The Truth About PHP Variables

I wanted to write this post to clear up what seems to be a common misunderstanding in PHP: that using references when passing around large variables is a good way to save memory. To explain this fully I will need to explain how PHP handles variables internally. I hope that you will find this interesting and useful, and that it helps dispel some myths around references and memory management in PHP. First off, let's cover the basics…

Basic References in PHP

(Note: If you are already familiar with references in PHP then feel free to skip this section)

In PHP it is possible to assign variables by value or by reference. The former method is the most common, and should look very familiar to you:

<?php
//Example 1: Assigning variables by value (the 'standard' way)
$var1 = 'hello!';
$var2 = $var1;
$var2 = 'goodbye!';
echo $var1; // Produces: hello!
echo "<br />\n";
echo $var2; //Produces: goodbye!
?>

This should be no surprise to you, just simple assigning of variables in PHP. The next example is very similar, but we assign $var2 by reference rather than by value.

<?php
//Example 2: Assigning variables by reference
$var1 = 'hello!';
$var2 =& $var1; // Notice the ampersand. This means $var2
                // is a reference to $var1
$var2 = 'goodbye!'; // because $var2 is a reference to $var1,
                    // both variables now have the value 'goodbye!'
echo $var1; // Produces: goodbye!
echo "<br />\n";
echo $var2; //Produces: goodbye!
?>

This may be more surprising to some of you, so I will explain what is happening. The first step is no different: we simply initialise $var1 with a value of ‘hello!’. However, in the next step we assign $var1 to $var2 using the ‘=&’ operator, which makes $var2 a reference to $var1 rather than a copy of its contents. This means that both variables point to the same data in memory, so any change made through either variable is visible through the other.
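The same ampersand can be put on a function parameter, which is where the memory question in this post usually comes up. Here is a minimal sketch; appendByValue() and appendByRef() are hypothetical functions that exist only to show the difference:

```php
<?php
// Hypothetical helpers contrasting by-value and by-reference parameters.
function appendByValue($s) {
    $s .= ' world';   // changes the function's own copy only
    return $s;
}

function appendByRef(&$s) {
    $s .= ' world';   // changes the caller's variable directly
}

$a = 'hello';
$b = appendByValue($a);
echo $a, "\n"; // hello
echo $b, "\n"; // hello world

appendByRef($a);
echo $a, "\n"; // hello world
```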

For more information on this I would recommend reading the References Explained section of the PHP Manual as it covers this topic in much more detail.

How PHP Handles Variables Internally (using zvals!)

While the above explanation of references is sufficient for a general understanding, it is often useful to understand how PHP handles variable assignment internally. This is where we introduce the concept of the zval.

A zval is the internal structure PHP uses to store a variable's value. Each zval contains various pieces of information; the ones we will be focusing on here are:

  • The actual data stored within the zval (in our example this would be either ‘hello!’ or ‘goodbye!’)
  • An is_ref Boolean flag
  • A ref_count counter

The zval also knows the type of data it contains, but this is not especially relevant here so it has been omitted from the above list.

The first item in our list, the actual data, does not require much explanation. The second item (is_ref) indicates whether the zval is a reference, i.e. whether the variables using it address it by reference rather than by value; the implications of this are addressed shortly. The third item (ref_count) stores the number of variables that currently point to this zval. Each time one of those variables is unset() or reassigned, ref_count goes down by one, and once it reaches zero PHP knows that no variable uses the zval any more, so it can destroy the zval and free up the memory it was using.
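Strings give no visible sign of when they are freed, but objects do: PHP calls an object's __destruct() method as soon as the last variable using it goes away. Objects are tracked slightly differently inside the engine, so treat this as an analogy for the ref_count rule rather than a view of the zval itself; the Tracker class below is hypothetical.

```php
<?php
// Hypothetical class used only to observe the moment PHP frees a value.
class Tracker {
    public static $log = array();

    public function __destruct() {
        Tracker::$log[] = 'freed';
    }
}

$a = new Tracker();
$b = $a;                          // a second variable, same value; nothing is copied
unset($a);                        // one variable fewer, but $b still uses the value
Tracker::$log[] = 'after unset($a)';
unset($b);                        // last variable gone: the value is freed immediately
Tracker::$log[] = 'after unset($b)';

echo implode("\n", Tracker::$log), "\n";
```

The log reads "after unset($a)", then "freed", then "after unset($b)": nothing is released until the last variable pointing at the value has gone.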

Now this bit is important: you may think that ref_count is only used when dealing with a reference (i.e. when is_ref=true), but this is not the case. ref_count is maintained regardless of the value of is_ref. So what does this mean?

Being A Little Bit Clever

This is where, as the headline suggests, PHP is a little bit clever. When you assign a variable by value (as in Example 1), PHP does not create a new zval; it simply points both variables at the same zval and increases that zval’s ref_count by one. “Wait!” I hear you cry, “Isn’t that passing by reference?” Well, although it sounds the same, all PHP is doing is postponing any copying until it really has to, and because is_ref is still false it knows that a copy must be made as soon as either variable is changed. “Hmm, so how does it work?” OK, here is an example:

<?php
//Example 3a: Assigning variables by value (but with more detail)

//Here our zval is created for $var1.
$var1 = 'hello!';
//Our zval now has ref_count=1, is_ref=false

//We now assign $var1 to $var2
$var2 = $var1;
//Our zval now has ref_count=2, is_ref=false

debug_zval_dump($var2); //Produces: string(6) "hello!" refcount(3)
//(Why refcount(3)? See "An important note on debug_zval_dump()")

//We now assign a new value to $var2. So what happens to our zval?
$var2 = 'goodbye!';
//Read on to find out...

?>

An important note on debug_zval_dump(): php.net says this function “dumps a string representation of an internal Zend value to output.” This is true, but because the variable has to be passed to the function, calling it adds one more reference to the zval, so (in these examples) you can subtract one from the refcount shown in the output.

In the above example we see how both $var1 and $var2 refer to the same zval (as can be seen by the call to debug_zval_dump()). So what happens on the last line when we assign a new value to $var2? Does $var1 change too? Of course the answer is no, but why?

When we assign ‘goodbye!’ to $var2 in the example above, PHP examines the is_ref value of the underlying zval. If is_ref is false (as it is in this example), PHP knows that it may only change the zval’s value in place if ref_count is 1, since only then will the change not affect any other variables. In our example, however, ref_count is 2, so PHP is not allowed to change the zval’s value; instead it lowers the original zval’s ref_count by one and creates a new zval for $var2. This is illustrated by the finished example below:

<?php
//Example 3b: Assigning variables by value (the complete example)

//Here our zval is created for $var1.
$var1 = 'hello!';
//Our zval now has value='hello!', ref_count=1, is_ref=false

//We now assign $var1 to $var2
$var2 = $var1;
//Our zval now has value='hello!', ref_count=2, is_ref=false

debug_zval_dump($var2); //Produces: string(6) "hello!" refcount(3)
//(Why refcount(3)? See "An important note on debug_zval_dump()")

//We now assign a new value to $var2
$var2 = 'goodbye!';
//We now have two zvals:
//   The first: value='hello!', ref_count=1, is_ref=false
//   The second: value='goodbye!', ref_count=1, is_ref=false

?>

So we can see that, in the case of passing-by-value, PHP only copies data if a value is changed.
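This copy-on-write behaviour can also be observed from PHP code with memory_get_usage(). A minimal sketch follows, reusing the names $v1 and $v2; the exact byte counts depend on your PHP version and build, so only the rough size of each jump is meaningful.

```php
<?php
// Observing copy-on-write with memory_get_usage().
// Exact numbers vary by PHP version/build; look at the size of each jump.
$before = memory_get_usage();
$v1 = str_repeat('0', 100000);   // ~100kb of dummy data
$afterCreate = memory_get_usage();

$v2 = $v1;                        // assign by value: the data is shared, not copied
$afterAssign = memory_get_usage();

$v2[0] = '1';                     // first write to $v2: now PHP must copy the data
$afterWrite = memory_get_usage();

echo 'create: ', $afterCreate - $before, " bytes\n";
echo 'assign: ', $afterAssign - $afterCreate, " bytes\n";
echo 'write:  ', $afterWrite - $afterAssign, " bytes\n";
```

Expect roughly 100kb for the first line, close to nothing for the second, and roughly another 100kb for the third, while $v1 still holds its original data.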

For the sake of completeness, here is an example where we pass by reference:

<?php
//Example 4: Assigning variables by reference (the complete example)

//Here our zval is created for $var1.
$var1 = 'hello!';
//Our zval now has value='hello!', ref_count=1, is_ref=false

//We now assign $var1 to $var2
$var2 =& $var1;
//Our zval now has value='hello!', ref_count=2, is_ref=true

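//Note: call-time pass-by-reference such as debug_zval_dump(&$var2) is a
//fatal error from PHP 5.4 onwards, so this example only runs on PHP 5.3 and earlier
debug_zval_dump(&$var2); //Produces: &string(6) "hello!" refcount(3)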
//(Why refcount(3)? See "An important note on debug_zval_dump()")

//We now assign a new value to $var2
$var2 = 'goodbye!';
//We still have one zval, but with a
//new value: value='goodbye!', ref_count=2, is_ref=true

debug_zval_dump(&$var1); //Produces: &string(8) "goodbye!" refcount(3)
debug_zval_dump(&$var2); //Produces: &string(8) "goodbye!" refcount(3)
//(Why refcount(3)? See "An important note on debug_zval_dump()")

?>

As expected, we can see that the zval for both $var1 and $var2 has changed to a value of ‘goodbye!’ and has a ref_count of 2.

A Little More Complex

So now we know how PHP handles values and references, and isn’t it all wonderfully exciting? “Oh yes! Please tell me more!” I hear you say? OK then…

There is one last thing to mention in this area, which I think is especially relevant to those of you who love to (ahem) save memory by passing around references – what happens when values and references meet.

You may have noticed that the zval’s is_ref flag does not permit a zval to be both a reference and a value at the same time (as it is either true or false). On the face of it this is probably for the best, as I suspect it could lead to all kinds of strangeness internally. However, a consequence is that if you are using a variable by value in several places (i.e. the variable’s underlying zval has a ref_count greater than 1) and then pass it by reference (for example, to a function), PHP has to copy the value into an entirely new zval in order to set the is_ref flag to true. The reverse also holds: assigning a reference by value forces a copy, because the new variable cannot share a zval whose is_ref flag is set. The following example illustrates how this can result in substantially increased memory usage:

<?php
//Example 5: Showing how mixing references and values can lead
//           to increased memory consumption

memory_show_usage(); //Zero bytes

$v1 = str_repeat('0', 100000);//Generate 100kb of dummy data
memory_show_usage(); //100kb

$v2 = $v1;
//We now have two variables pointing to a zval in the form:
//   is_ref=false, ref_count=2
memory_show_usage(); //100kb

$r1 =& $v2; //We now assign our value by reference
memory_show_usage(); //200kb
//PHP has now had to separate $v2 onto a second zval in the form:
//   is_ref=true, ref_count=2 ($v2 and $r1)
//while the first zval keeps only $v1: is_ref=false, ref_count=1

$v3 = $r1; //We now assign the second zval by value
memory_show_usage(); //300kb
//PHP has now had to create a third zval in the form:
//   is_ref=false, ref_count=1

$v4 = $v3; //Now assign by value
memory_show_usage(); //300kb (no increase)
//Our third zval now has a ref_count of 2

//Both $v3 and $v4 now have the same zval, which may only be
//passed by value as it has a ref_count greater than one

$r2 =& $v3; //So now we assign $v3 by reference
memory_show_usage(); //400kb
//Here PHP has been forced to create a fourth zval with yet
//another copy of the data. The new zval is in the form:
//    is_ref=true, ref_count=2 ($v3 and $r2)
//while the third zval keeps only $v4: is_ref=false, ref_count=1

//Simple function to show memory use from a baseline
function memory_show_usage(){
    static $baseline = null;
    if(is_null($baseline)){
        //Initialise to get an accurate memory use value
        $baseline = 1;
        $baseline = memory_get_usage();
    }

    echo (memory_get_usage() - $baseline) . " bytes\n";
}

?>

Although this example only assigns variables directly, the same principles apply when performing function calls where parameters are passed by reference. You can see that, unless the developer is completely consistent, passing variables by reference can easily lead to increased memory usage.
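The copying rules behind Example 5 can be replayed with a toy model. This is only a sketch of the rules described in this post, not how the engine is written (real zvals are C structures): ToyZval, assignByValue() and assignByRef() are hypothetical, and the model is simplified by assuming the destination variable is always new.

```php
<?php
// Toy model of the zval rules described above. Hypothetical names throughout;
// assumes every destination variable is new (no old zval to release).
class ToyZval {
    public $value;
    public $refCount = 1;
    public $isRef = false;
    public static $copies = 0;    // how many times the data was duplicated

    public function __construct($value) {
        $this->value = $value;
    }
}

// Models: $dst = $src;
function assignByValue(array &$vars, $dst, $src) {
    $z = $vars[$src];
    if ($z->isRef) {
        // A reference zval cannot also be shared by value: copy the data.
        $vars[$dst] = new ToyZval($z->value);
        ToyZval::$copies++;
    } else {
        // Share the zval and bump its counter; no data is copied.
        $z->refCount++;
        $vars[$dst] = $z;
    }
}

// Models: $dst =& $src;
function assignByRef(array &$vars, $dst, $src) {
    $z = $vars[$src];
    if (!$z->isRef && $z->refCount > 1) {
        // Shared by value elsewhere: move $src onto its own copy first.
        $z->refCount--;
        $z = new ToyZval($z->value);
        ToyZval::$copies++;
        $vars[$src] = $z;
    }
    $z->isRef = true;
    $z->refCount++;
    $vars[$dst] = $z;
}

// Replay Example 5.
$vars = array('v1' => new ToyZval('data'));
assignByValue($vars, 'v2', 'v1');  // shared
assignByRef($vars, 'r1', 'v2');    // copy #1
assignByValue($vars, 'v3', 'r1');  // copy #2
assignByValue($vars, 'v4', 'v3');  // shared
assignByRef($vars, 'r2', 'v3');    // copy #3

echo ToyZval::$copies, "\n"; // 3
```

Three copies plus the original give the four blocks of data behind the 400kb reading at the end of Example 5.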

Conclusion

If your concern is to conserve memory, then it is best to simply pass data by value, as PHP is smart enough not to copy it until it is actually modified. If you really must pass a value by reference, make sure that you do so consistently, as this will avoid consuming many times more memory (and CPU cycles) than is necessary. Alternatively, you could wrap your data in an object: PHP5 (but not PHP4) passes objects around by handle, so assigning an object or passing it to a function does not copy the object’s data.
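A minimal sketch of the object approach; the Buffer class and appendTo() function are hypothetical. Because the function receives a handle to the caller's object, it can change that object without any & and without the object's data being copied when it is passed:

```php
<?php
// Hypothetical wrapper class holding a (potentially large) value.
class Buffer {
    public $data = '';
}

// No & needed: the parameter is a handle to the caller's object.
function appendTo(Buffer $buf, $text) {
    $buf->data .= $text;
}

$buffer = new Buffer();
$buffer->data = 'hello';
appendTo($buffer, ' world');
echo $buffer->data, "\n"; // hello world
```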

As a side note, I would like to point out that modifying function parameters as a side effect (which may be your intention if you are passing by reference) is generally discouraged, as it can make some bugs very hard to track down (a similar argument to that against global variables).

Further Reading

References in PHP: An In-depth look (PDF) – An excellent article by Derick Rethans for php|architect magazine.

References Explained – The official explanation of PHP references.

debug_zval_dump() – Documentation for the (sometimes unexpected) workings of this function.

PHP Internals Mailing List (Archive) – I highly recommend this list to any professional developer.

6 comments so far

  1. Brad on

    Wow, I suppose the tutorial is nice. Not what I had been searching for anyway. But, I was surprised to see that you are based in England because there weren’t any tell tale vocabulary choices. For what it is worth, which may not be very much…

    • adamcharnock on

      Hi Brad. Thanks, I’m pleased you liked it :) I try to keep the language reasonably neutral where I can, as well as avoiding topics like cricket, cucumber sandwiches, and the queen :p

      I am trying to cook up some more blog posts at the moment, so let me know if there is anything you want to hear about!

  2. Marco on

    Thank you for your article. I was debating on how to load language data from a flat file, and had been trying to decide how to store the variables. Would I just use constants, normal variables, an array, a class? Well, I ended up using – functions.

    I ran your memory test on loading up say 10 variables full of text, along with loading up 10 functions that simply return the text, and I was shocked to see the difference in memory usage. I just write it like function MyLangString(){ return “My welcome string in English, etc.”; } Then just call the function when I need to either display it or pass it to a variable for later display.

    It’s also handy for passing variables that are in the body of my text – such as a first name, or a website name. function MyLangString(){ return “Hello $name, welcome string in English, etc.”; }
    echo MyLangString(“Marco”);

  3. Marco on

    The function should be:
    MyLangString($name){ return “Hello $name, welcome string in English, etc.”; }

  4. Rich on

    Wow, thanks so much. I’ve been doing lots of that “ahem” saving memory. Doh! Very enlightening and excellently written. Thank you.

  5. styx on

    The results from the last example are very interesting. I didn’t know about such a memory increase when using address. Thanks for the post.

