The Truth About PHP Variables
I wanted to write this post to clear up what seems to be a
common misunderstanding in PHP - that using references when passing around
large variables is a good way to save memory. To explain this fully I will need to
explain how PHP handles variables internally. I hope that you will find this
interesting and useful, and that it helps dispel some myths around references
and memory management in PHP. First off, let's cover the basics...
Basic References in PHP
(Note: If you are already familiar with references in PHP
then feel free to skip this section)
In PHP it is possible to assign variables by value or by
reference. The former method is the most common, and should look very familiar to
you:
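A minimal sketch of assignment by value. The names $var1 and $var2 and the two strings follow the rest of this post; the exact listing is an assumption.

```php
<?php
// Assign by value: $var2 starts out with the same value as $var1.
$var1 = 'hello!';
$var2 = $var1;

// Changing $var2 afterwards leaves $var1 untouched.
$var2 = 'goodbye!';

echo $var1, "\n"; // hello!
echo $var2, "\n"; // goodbye!
```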
This should be no surprise to you; it is just simple variable
assignment in PHP. The next example is very similar, but we assign $var2 by
reference rather than by value.
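A minimal sketch of assignment by reference, again assuming the names $var1 and $var2 used throughout this post:

```php
<?php
// Initialise $var1 exactly as before.
$var1 = 'hello!';

// Assign by reference: $var2 becomes another name for the same data.
$var2 =& $var1;

// A write through either name is visible through both.
$var2 = 'goodbye!';

echo $var1, "\n"; // goodbye!
echo $var2, "\n"; // goodbye!
```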
This may be more surprising to some of you, so I will
explain what is happening. The first step is no different: we simply initialise
$var1 with a value of 'hello!'. However, in the next step we assign $var1 to $var2
using the '=&' operator, which causes a reference
to $var1 to be passed, rather than the actual contents of $var1. This means
that both variables point to the same data in memory, so any changes to either
variable will affect the other.
For more information on this I would recommend reading the References Explained
section of the PHP Manual as
it covers this topic in much more detail.
How PHP Handles Variables Internally (using zvals!)
While the above explanation of references is sufficient for
a general understanding, it is often useful to understand how PHP handles
variable assignment internally. This is where we introduce the concept of the zval.
The zval is an internal PHP structure used for
storing variables. Each zval contains various pieces of information; the
ones we will focus on here are as follows:
- The actual data stored within the zval - In our example this would be either 'hello!' or 'goodbye!'
- An is_ref Boolean flag
- A ref_count counter
The zval also knows the type of data it contains, but this
is not especially relevant here so it has been omitted from the above list.
The first item in our list, the actual data, does not
require much explanation. The second item on this list (is_ref) indicates if
variables should address this zval by value or by reference, the implications
of which are addressed shortly. The third item (ref_count) stores the number of
variables that currently address this zval. If ref_count ever reaches zero (for
example, if you call unset()) then PHP assumes that it can remove the zval and
free up the memory it was using.
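To make the role of ref_count concrete, here is a small sketch (my own illustration, using the same variable names) of what unset() does when two variables share a value: it removes one name, and the data survives for as long as another variable still addresses it.

```php
<?php
$var1 = 'hello!';
$var2 = $var1;      // two variables now address the same value

unset($var1);       // removes the name $var1; one fewer variable addresses the data

var_dump(isset($var1)); // bool(false)
echo $var2, "\n";       // hello!

unset($var2);       // nothing addresses the data any more, so PHP may free it
var_dump(isset($var2)); // bool(false)
```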
Now this bit is important: You may think that the ref_count value is only
used when dealing with a reference (i.e. when is_ref=true), but
this is not the case. The ref_count field is used regardless of the
value of is_ref. So what does this mean?
Being A Little Bit Clever
This is where, as the headline suggests, PHP is a little bit
clever. When you assign a variable by value (such as in example 1) it does not
create a new zval, it simply points both variables at the same zval and
increases that zval's ref_count by one. "Wait!" I hear you cry, "Isn't that
passing by reference?" Well, although it sounds the same, all PHP is doing
is postponing any copying until it really has to - and it knows that it is
dealing with a value rather than a reference because is_ref is still false.
"Hmm, so how does it work?" Ok, here is an example:
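A sketch of such an example, assuming the same $var1 and $var2 as before. The refcount that debug_zval_dump() prints depends on the PHP version (this post describes the PHP 5 engine, and later versions may report literal strings differently), so no output is shown here.

```php
<?php
$var1 = 'hello!';
$var2 = $var1;            // by value: per the text, no copy is made yet

// Inspect the value that $var1 (and, for now, $var2) addresses.
debug_zval_dump($var1);

// The last line: assign a new value to $var2.
$var2 = 'goodbye!';
```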
An important note on debug_zval_dump(): php.net says this function
"dumps a string representation of an internal Zend value to output." This is
true, but calling this function inherently causes another reference to the
variable to be created, so you can (in these examples) subtract one from the
ref_count value given in the output.
In the above example we see how both $var1 and $var2 refer
to the same zval (as can be seen by the call to debug_zval_dump()). So what happens on the
last line when we assign a new value to $var2? Does $var1 change too? Of course
the answer is no, but why?
When we assign 'goodbye!' to $var2 in the example above, PHP
examines the is_ref value of the underlying zval. If is_ref is false (as it is
in this example) PHP knows that it can only change the value of the zval if the
ref_count is 1 (as the change will not affect any other variables). However, in
our example the ref_count is 2, therefore PHP realises that it is not allowed
to change the zval's value and so creates
another zval with which $var2 is then associated. This is illustrated by the finished
example below:
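One way the finished example might look, as a sketch with the same assumed names. Again, the refcount figures printed vary between PHP versions, so only the values are annotated.

```php
<?php
$var1 = 'hello!';
$var2 = $var1;            // both variables share one value
debug_zval_dump($var1);   // dump while the value is still shared

$var2 = 'goodbye!';       // the write gives $var2 its own value

debug_zval_dump($var1);   // still holds "hello!"
debug_zval_dump($var2);   // now holds "goodbye!"
```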
So we can see that, in the case of passing-by-value, PHP
only copies data if a value is changed.
For the sake of completeness, here is an example where we
pass by reference:
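A sketch of the by-reference case with the same assumed names. Note that debug_zval_dump() receives its argument by value, so the refcount it reports for a reference can differ between PHP versions and may not match the figure quoted in this post; the values themselves are the part to watch.

```php
<?php
$var1 = 'hello!';
$var2 =& $var1;           // by reference: both names address the same data

$var2 = 'goodbye!';       // the write goes to the shared data, no new value is created

debug_zval_dump($var1);   // holds "goodbye!"
debug_zval_dump($var2);   // holds "goodbye!"
```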
As expected, we can see that the zval for both $var1 and $var2
has changed to a value of 'goodbye!' and has a ref_count of 2.
A Little More Complex
So now we know how PHP handles values and references, and
isn't it all wonderfully exciting? "Oh yes! Please tell me more!" I hear you
say. Ok then...
There is one last thing to mention in this area, which I
think is especially relevant to those of you who love to (ahem) save memory by
passing around references - what happens when values and references meet.
You may have noticed that the zval's is_ref flag does not permit a zval to be both a
reference and a value at the same time (as it is either true or false). On the
face of it this is probably for the best as I suspect it could lead to all
kinds of strangeness from an internal perspective. However, a result of this is
that if you are using a variable by value in several places (i.e. the variable's
underlying zval has a ref_count greater than 1) and then pass it by reference (for
example, to a function), PHP will have
to copy the value into an entirely new zval in order to set the is_ref flag
to true. The following example illustrates how this can result in substantially
increased memory usage:
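A sketch of how this could be measured, using memory_get_usage() and a roughly 1 MB string. The $report helper and the variable $var3 are my own names for illustration. The behaviour described in this post is that of the PHP 5 engine; PHP 7 reworked how references are stored, so the jump may not appear at the same step there. Run it on your own version rather than trusting any particular figure.

```php
<?php
// Hypothetical helper: prints a label and the current memory usage.
$report = function ($label) {
    printf("%-32s %d bytes\n", $label, memory_get_usage());
};

$report('start');

$var1 = str_repeat('x', 1024 * 1024);   // roughly 1 MB of data
$report('after creating $var1');

$var2 = $var1;                           // by value: the data is shared, not copied
$report('after $var2 = $var1');

$var3 =& $var1;                          // by reference: per the text, the shared
                                         // value has to be separated at this point
$report('after $var3 =& $var1');
```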
Although this example only assigns variables directly, the
same principles apply when performing function calls where parameters are
passed by reference. You can see that, unless the developer is completely
consistent, passing variables by reference can easily lead to increased memory
usage.
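The function-call case can be sketched in the same way. Both helper functions below are hypothetical and only read their argument; the only difference is how the parameter is declared. As before, whether and when the extra copy shows up depends on the PHP version, so the script prints the measured differences instead of assuming them.

```php
<?php
// Hypothetical helpers: both simply return the length of their argument.
function length_by_value($text)
{
    return strlen($text);
}

function length_by_reference(&$text)
{
    return strlen($text);
}

$var1 = str_repeat('x', 1024 * 1024);   // roughly 1 MB of data
$var2 = $var1;                           // used by value elsewhere: data is shared

$before = memory_get_usage();
$lengthByValue = length_by_value($var1);
$afterValue = memory_get_usage();
$lengthByReference = length_by_reference($var1);
$afterReference = memory_get_usage();

printf("by value:     %+d bytes\n", $afterValue - $before);
printf("by reference: %+d bytes\n", $afterReference - $afterValue);
```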
Conclusion
If your concern is to conserve memory then it is best to
simply pass data by value, as PHP is smart enough to conserve
memory automatically. If you really must pass a value by reference then make sure
that it is done consistently, as this will avoid consuming many times more
memory (and CPU cycles) than is necessary. Alternatively you could wrap your
data in an object, as PHP5 (but not PHP4) passes objects by handle by default,
which behaves much like passing by reference.
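A small sketch of the object approach. The Payload class and the append_marker() function are hypothetical names chosen for illustration; the point is only that the function and the caller end up working with the same object without any '&' in sight.

```php
<?php
// Hypothetical wrapper class holding the data we want to pass around.
class Payload
{
    public $data;

    public function __construct($data)
    {
        $this->data = $data;
    }
}

// Hypothetical function: note the parameter is NOT declared with '&'.
function append_marker(Payload $payload)
{
    $payload->data .= '!';
}

$payload = new Payload('hello');
append_marker($payload);

echo $payload->data, "\n"; // hello!
```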
As a side note I would like to point out that side-effecting
function parameters (which may be your intention if you are passing by
reference) is generally discouraged as it can make some bugs very hard to track
down (a similar argument to that against global variables).
Further Reading
References in PHP: An In-depth look (PDF) -
An excellent article by Derick Rethans for the
PHP Architect magazine.
References Explained -
The official explanation of PHP references.
debug_zval_dump() -
Documentation for the (sometimes unexpected) workings of this function.
PHP Internals Mailing List
(Archive) -
I highly recommend this list to any professional developer.