More PHP Internals: References

By request, a quick post on using PHP references in extensions.

To start, here’s an example of references in PHP we’ll be translating into C:

<?php
 
// just for displaying output
function display($x) {
    echo "x is $x\n";
}
 
// pass in an argument by making a copy of it
function not_by_ref($arg) {
    echo "called not_by_ref($arg)\n";
    $arg = 2;
}
 
// pass in an argument by reference
function by_ref(&$arg) {
    echo "called by_ref($arg)\n";
    $arg = 3;
}
 
$x = 1;
display($x);
 
not_by_ref($x);
display($x);
 
// when x is passed by reference, the function can change the value
by_ref($x);
display($x);
 
?>

This will print:

x is 1
called not_by_ref(1)
x is 1
called by_ref(1)
x is 3

If you want your C extension’s function to officially have a signature with ampersands in it, you have to declare to PHP that you want to pass in refs as arguments. Remember how we declared functions in this struct?

zend_function_entry rlyeh_functions[] = {
  PHP_FE(cthulhu, NULL)
  { NULL, NULL, NULL }
};

The second argument to PHP_FE, NULL, can optional be the argument spec. For example, let’s say we’re implementing by_ref() in C. We would add this to php_rlyeh.c:

// the 1 indicates pass-by-reference
ZEND_BEGIN_ARG_INFO(arginfo_by_ref, 1)
ZEND_END_ARG_INFO();
 
zend_function_entry rlyeh_functions[] = {
  PHP_FE(cthulhu, NULL)
  PHP_FE(by_ref, arginfo_by_ref)
  { NULL, NULL, NULL }
};
 
PHP_FUNCTION(by_ref) {
  zval *zptr = 0;
 
  if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "z", &zptr) == FAILURE) {
    return;
  }
 
  php_printf("called (the c version of) by_ref(%d)\n", (int)Z_LVAL_P(zptr));
  ZVAL_LONG(zptr, 3);
}

Suppose we also add not_by_ref(). This might look something like:

ZEND_BEGIN_ARG_INFO(arginfo_not_by_ref, 0)
ZEND_END_ARG_INFO();
 
zend_function_entry rlyeh_functions[] = {
  PHP_FE(cthulhu, NULL)
  PHP_FE(by_ref, arginfo_by_ref)
  PHP_FE(not_by_ref, arginfo_not_by_ref)
  { NULL, NULL, NULL }
};
 
PHP_FUNCTION(not_by_ref) {
  zval *zptr = 0, *copy = 0;
 
  if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "z", &zptr) == FAILURE) {
    return;
  }
 
  php_printf("called (the c version of) not_by_ref(%d)\n", (int)Z_LVAL_P(zptr));
  ZVAL_LONG(zptr, 2);
}

However, if we try running this, we’ll get:

x is 1
called (the c version of) not_by_ref(1)
x is 2
called (the c version of) by_ref(2)
x is 3

What happened? not_by_ref used our variable like a reference!

This is really weird and annoying behavior (if anyone knows why PHP does this, please comment below).

To work around it, if you want non-reference behavior, you have to manually make a copy of the argument.

Our not_by_ref() function becomes:

PHP_FUNCTION(not_by_ref) {
  zval *zptr = 0, *copy = 0;
 
  if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "z", &zptr) == FAILURE) {
    return;
  }
 
  // make a copy                                                                                                                                                          
  MAKE_STD_ZVAL(copy);
  memcpy(copy, zptr, sizeof(zval));
 
  // set refcount to 1, as we're only using "copy" in this function                                                                                                         
  Z_SET_REFCOUNT_P(copy, 1);
 
  php_printf("called (the c version of) not_by_ref(%d)\n", (int)Z_LVAL_P(copy));
  ZVAL_LONG(copy, 2);
 
  zval_ptr_dtor(&copy);
}

Note that we set the refcount of copy to 1. This is because the refcount for zptr is 2: 1 ref from the calling function + 1 ref from the not_by_ref function. However, we don’t want the copy of zptr to have a refcount of 2, because it’s only being used by the current function.

Also note that memcpy-ing the zval only works because this is a scalar: if this were an array or object, we’d have to use PHP API functions to make a deep copy of the original.

If we run our PHP program again, it gives us:

x is 1
called (the c version of) not_by_ref(1)
x is 1
called (the c version of) by_ref(1)
x is 3

Okay, this is pretty good… but we’re actually missing a case. What happens if we pass in a reference to not_by_ref()? In PHP, this looks like:

function not_by_ref($arg) {
   $arg = 2;
}
 
$x = 1;
not_by_ref(&$x);
display($x);

…which displays “x is 2”. Unfortunately, we’ve overridden this behavior in our not_by_ref() C function, so we have to special case: if this is a reference, change its value, otherwise make a copy and change the copy’s value.

PHP_FUNCTION(not_by_ref) {
  zval *zptr = 0, *copy = 0;
 
  if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "z", &zptr) == FAILURE) {
    return;
  }
 
  // NEW CODE
  if (Z_ISREF_P(zptr)) {
    // if this is a reference, make copy point to zptr
    copy = zptr;
 
    // adding a reference so we can indiscriminately delete copy later
    zval_add_ref(&zptr);
  }
  // OLD CODE
  else {
    // make a copy                                                                                                                                  
    MAKE_STD_ZVAL(copy);
    memcpy(copy, zptr, sizeof(zval));
 
    // set refcount to 1, as we're only using "copy" in this function                                                                                                       
    Z_SET_REFCOUNT_P(copy, 1);
  }
 
  php_printf("called (the c version of) not_by_ref(%d)\n", (int)Z_LVAL_P(copy));
  ZVAL_LONG(copy, 2);
 
  zval_ptr_dtor(&copy);
}

Now it’ll behave “properly.”

There may be a better way to do this, please leave a comment if you know of one. However, as far as I know, this is the only way to emulate the PHP reference behavior.

If you would like to read more about PHP references, Derick Rethans wrote a great article on it for PHP Architect.

kristina chodorow's blog