More PHP Internals: References

By request, a quick post on using PHP references in extensions.

To start, here’s an example of references in PHP we’ll be translating into C:

<?php
 
// just for displaying output
function display($x) {
    echo "x is $x\n";
}
 
// pass in an argument by making a copy of it
function not_by_ref($arg) {
    echo "called not_by_ref($arg)\n";
    $arg = 2;
}
 
// pass in an argument by reference
function by_ref(&$arg) {
    echo "called by_ref($arg)\n";
    $arg = 3;
}
 
$x = 1;
display($x);
 
not_by_ref($x);
display($x);
 
// when x is passed by reference, the function can change the value
by_ref($x);
display($x);
 
?>

This will print:

x is 1
called not_by_ref(1)
x is 1
called by_ref(1)
x is 3

If you want your C extension’s function to officially have a signature with ampersands in it, you have to declare to PHP that you want to pass in refs as arguments. Remember how we declared functions in this struct?

zend_function_entry rlyeh_functions[] = {
  PHP_FE(cthulhu, NULL)
  { NULL, NULL, NULL }
};

The second argument to PHP_FE, NULL, can optionally be the argument spec. For example, let’s say we’re implementing by_ref() in C. We would add this to php_rlyeh.c:

// the 1 is pass_rest_by_reference: every argument not listed explicitly is passed by reference
ZEND_BEGIN_ARG_INFO(arginfo_by_ref, 1)
ZEND_END_ARG_INFO()
 
zend_function_entry rlyeh_functions[] = {
  PHP_FE(cthulhu, NULL)
  PHP_FE(by_ref, arginfo_by_ref)
  { NULL, NULL, NULL }
};
 
PHP_FUNCTION(by_ref) {
  zval *zptr = 0;
 
  if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "z", &zptr) == FAILURE) {
    return;
  }
 
  php_printf("called (the c version of) by_ref(%d)\n", (int)Z_LVAL_P(zptr));
  ZVAL_LONG(zptr, 3);
}

Suppose we also add not_by_ref(). This might look something like:

ZEND_BEGIN_ARG_INFO(arginfo_not_by_ref, 0)
ZEND_END_ARG_INFO()
 
zend_function_entry rlyeh_functions[] = {
  PHP_FE(cthulhu, NULL)
  PHP_FE(by_ref, arginfo_by_ref)
  PHP_FE(not_by_ref, arginfo_not_by_ref)
  { NULL, NULL, NULL }
};
 
PHP_FUNCTION(not_by_ref) {
  zval *zptr = 0;
 
  if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "z", &zptr) == FAILURE) {
    return;
  }
 
  php_printf("called (the c version of) not_by_ref(%d)\n", (int)Z_LVAL_P(zptr));
  ZVAL_LONG(zptr, 2);
}

However, if we try running this, we’ll get:

x is 1
called (the c version of) not_by_ref(1)
x is 2
called (the c version of) by_ref(2)
x is 3

What happened? not_by_ref used our variable like a reference!

This is really weird and annoying behavior (if anyone knows why PHP does this, please comment below).

To work around it, if you want non-reference behavior, you have to manually make a copy of the argument.
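One way to picture what seems to be going on (the comments at the end of this post point the same way): PHP doesn't copy a value when it's passed as an argument. Caller and callee share one zval and the engine just bumps its refcount, so a write through zptr lands on the caller's variable. Here's a minimal sketch of that idea in plain C. It's a toy model, not Zend Engine code: toy_val, toy_share() and every other name in it are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a zval (hypothetical, not the real struct):
 * an integer payload plus a reference count. */
typedef struct {
    long value;
    int refcount;
} toy_val;

/* Allocate a value held by exactly one owner. */
toy_val *toy_new(long value) {
    toy_val *v = malloc(sizeof *v);
    assert(v != NULL);
    v->value = value;
    v->refcount = 1;
    return v;
}

/* "Passing" a value: no copy is made, the callee gets the same
 * struct and the refcount goes up by one. */
toy_val *toy_share(toy_val *v) {
    assert(v->refcount > 0);
    v->refcount++;
    return v;
}

/* Drop one holder; free the struct when nobody holds it anymore. */
void toy_release(toy_val *v) {
    assert(v->refcount > 0);
    if (--v->refcount == 0) {
        free(v);
    }
}

/* Like the first C version of not_by_ref(): writes to the shared struct. */
void toy_callee_in_place(toy_val *arg) {
    arg->value = 2;
}

/* Like a version that copies first: writes to a private copy, then drops it. */
void toy_callee_with_copy(toy_val *arg) {
    toy_val *copy = toy_new(arg->value);
    copy->value = 2;
    toy_release(copy);
}

/* Plays the caller: sets x to 1, calls the callee, reports x afterwards. */
long toy_caller_sees(void (*callee)(toy_val *)) {
    toy_val *x = toy_new(1);
    toy_val *arg = toy_share(x);   /* refcount is now 2 */
    long seen;

    callee(arg);
    toy_release(arg);              /* the call is over, refcount back to 1 */
    seen = x->value;
    toy_release(x);
    return seen;
}
```

Calling toy_caller_sees(toy_callee_in_place) returns 2, the same surprise as above, while toy_caller_sees(toy_callee_with_copy) returns 1.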

Our not_by_ref() function becomes:

PHP_FUNCTION(not_by_ref) {
  zval *zptr = 0, *copy = 0;
 
  if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "z", &zptr) == FAILURE) {
    return;
  }
 
  // make a copy
  MAKE_STD_ZVAL(copy);
  memcpy(copy, zptr, sizeof(zval));
 
  // set refcount to 1, as we're only using "copy" in this function
  Z_SET_REFCOUNT_P(copy, 1);
 
  php_printf("called (the c version of) not_by_ref(%d)\n", (int)Z_LVAL_P(copy));
  ZVAL_LONG(copy, 2);
 
  zval_ptr_dtor(&copy);
}

Note that we set the refcount of copy to 1. memcpy copies the whole zval, refcount field included, and the refcount for zptr is 2: 1 ref from the calling function + 1 ref from the not_by_ref function. We don’t want the copy to have a refcount of 2, because it’s only being used by the current function.

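Also note that memcpy-ing the zval only works because the value is an integer, which is stored directly inside the zval. A string, array, or object zval refers to data stored elsewhere, so for those we’d have to use PHP API functions (such as zval_copy_ctor()) to make a proper copy of the original.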
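To make the memcpy caveat concrete, here's a small sketch in plain C of what goes wrong once the value owns heap data. Again, this is a toy model under my own assumptions: toy_str_val and the toy_* functions are hypothetical names, not part of the PHP API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a string zval (hypothetical): the struct only
 * holds a pointer, the characters live in a separate heap buffer. */
typedef struct {
    char *str;
    size_t len;
    int refcount;
} toy_str_val;

/* Initialize v with its own heap copy of s. */
void toy_str_init(toy_str_val *v, const char *s) {
    v->len = strlen(s);
    v->str = malloc(v->len + 1);
    assert(v->str != NULL);
    memcpy(v->str, s, v->len + 1);
    v->refcount = 1;
}

/* Shallow copy: memcpy duplicates the struct, so dst->str points at
 * the SAME buffer as src->str. Freeing both would be a double free. */
void toy_shallow_copy(toy_str_val *dst, const toy_str_val *src) {
    memcpy(dst, src, sizeof *dst);
    dst->refcount = 1;
}

/* Deep copy: duplicate the struct, then give dst its own buffer. */
void toy_deep_copy(toy_str_val *dst, const toy_str_val *src) {
    memcpy(dst, src, sizeof *dst);
    dst->str = malloc(src->len + 1);
    assert(dst->str != NULL);
    memcpy(dst->str, src->str, src->len + 1);
    dst->refcount = 1;
}

/* Free the buffer owned by v. Only call this once per buffer. */
void toy_str_destroy(toy_str_val *v) {
    free(v->str);
    v->str = NULL;
    v->len = 0;
}
```

After toy_shallow_copy(), both structs hold the same str pointer, so a write through one shows up in the other and destroying both would free the same buffer twice. After toy_deep_copy(), each struct owns its own buffer and can be changed or destroyed independently.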

If we run our PHP program again, it gives us:

x is 1
called (the c version of) not_by_ref(1)
x is 1
called (the c version of) by_ref(1)
x is 3

Okay, this is pretty good… but we’re actually missing a case. What happens if we pass in a reference to not_by_ref()? In PHP, this looks like:

function not_by_ref($arg) {
   $arg = 2;
}
 
$x = 1;
not_by_ref(&$x);
display($x);

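…which displays “x is 2” (call-time pass-by-reference like this was deprecated in PHP 5.3 and removed in 5.4, so it only applies to older versions). Unfortunately, we’ve overridden this behavior in our not_by_ref() C function, so we have to add a special case: if the argument is a reference, change its value; otherwise, make a copy and change the copy’s value.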

PHP_FUNCTION(not_by_ref) {
  zval *zptr = 0, *copy = 0;
 
  if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "z", &zptr) == FAILURE) {
    return;
  }
 
  // NEW CODE
  if (Z_ISREF_P(zptr)) {
    // if this is a reference, make copy point to zptr
    copy = zptr;
 
    // adding a reference so we can indiscriminately delete copy later
    zval_add_ref(&zptr);
  }
  // OLD CODE
  else {
    // make a copy
    MAKE_STD_ZVAL(copy);
    memcpy(copy, zptr, sizeof(zval));
 
    // set refcount to 1, as we're only using "copy" in this function
    Z_SET_REFCOUNT_P(copy, 1);
  }
 
  php_printf("called (the c version of) not_by_ref(%d)\n", (int)Z_LVAL_P(copy));
  ZVAL_LONG(copy, 2);
 
  zval_ptr_dtor(&copy);
}

Now it’ll behave “properly.”
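The "reference: write through, otherwise: copy" decision can be boiled down to one helper. Below is a sketch of that logic in plain C, with a made-up toy_zv struct and toy_* functions standing in for the real zval machinery (all of these names are hypothetical). The Zend API also has a SEPARATE_ZVAL_IF_NOT_REF() macro that looks aimed at this same case, but I haven't tried it here, so treat that as a pointer rather than a recipe.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a zval (hypothetical): payload, refcount, and a
 * flag saying whether the value is bound as a reference. */
typedef struct {
    long value;
    int refcount;
    int is_ref;
} toy_zv;

/* Allocate a value held by exactly one owner. */
toy_zv *toy_new(long value, int is_ref) {
    toy_zv *v = malloc(sizeof *v);
    assert(v != NULL);
    v->value = value;
    v->refcount = 1;
    v->is_ref = is_ref;
    return v;
}

/* Drop one holder; free the struct when nobody holds it anymore. */
void toy_release(toy_zv *v) {
    assert(v->refcount > 0);
    if (--v->refcount == 0) {
        free(v);
    }
}

/* Return the value the callee is allowed to write to.
 * Reference: the shared struct itself, with one extra refcount so the
 * caller of this helper can always release the result.
 * Not a reference: a fresh private copy with refcount 1. */
toy_zv *toy_writable(toy_zv *arg) {
    assert(arg != NULL && arg->refcount > 0);
    if (arg->is_ref) {
        arg->refcount++;
        return arg;
    }
    return toy_new(arg->value, 0);
}

/* Toy counterpart of the final not_by_ref(): write 2, then release
 * whatever toy_writable() handed back, without caring which case it was. */
void toy_not_by_ref(toy_zv *arg) {
    toy_zv *target = toy_writable(arg);
    target->value = 2;
    toy_release(target);
}
```

With a plain value that starts at 1, toy_not_by_ref() leaves it at 1; with a value flagged as a reference, it ends up as 2. In both cases the original's refcount is the same after the call as before it.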

There may be a better way to do this; please leave a comment if you know of one. However, as far as I know, this is the only way to emulate the PHP reference behavior.

If you would like to read more about PHP references, Derick Rethans wrote a great article on it for PHP Architect.

  • AM

    i’ve read a book a year ago i think “Extending and Embedding php” . I’ll read it again, because i’m sure i found the answer to your question there, but i don’t remember now…..i’ll re-read it and post it!

  • am

    Found the answer here:
    http://www.php.net/manual/en/internals2.ze1.zendapi.php
    sections:
    Dealing with Arguments Passed by Reference and
    Assuring Write Safety for Other Parameters

    Basically, because PHP uses references and copy-on-write, you have to isolate your value yourself when params are sent normally (without a PHP reference)

    Thanks for the great posts!
     

kristina chodorow's blog