Effective In-Function Caching With PHP5


At one stage or another most programmers have written some simple in-function caching. If you don’t know what I mean by in-function caching, here is a simple example:


<?php
function getUserId($username){
	static $cache = array();
	if(!isset($cache[$username])){
		//Some DB intensive code for getting the user ID
		//and putting it into $userId;

		$cache[$username] = $userId;
	}

	return $cache[$username];
}
?>

So you can call this function several times for the same username but only have to talk to the database once.

This method generally works well in situations where:

  1. The data you are reading is not going to be changing. If the data can change then it is possible that the cache will end up not being an accurate representation of the real data.
  2. You are only calling the function with a small number of values. If you were calling the function for a wide range of values then the cache could begin to consume a large amount of memory.

To illustrate the first point, imagine that you call getUserId() for the user ‘adam’ and then delete that user. If you call getUserId() again with the username ‘adam’, you get the cached user ID back rather than a value of false (or an error).
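
The stale read is easy to reproduce without a real database. The following is only a sketch: the FakeUsersTable class, its $rows and $lookups properties and its findId() method are hypothetical stand-ins for the users table and the "DB intensive code", and the user ID 42 is made up.

```php
<?php
//Hypothetical stand-in for the users table
class FakeUsersTable {
	//username => user ID
	public static $rows = array('adam' => 42);
	//Counts how many times we "talk to the database"
	public static $lookups = 0;

	public static function findId($username){
		self::$lookups++;
		//Return false when the user does not exist
		return isset(self::$rows[$username]) ? self::$rows[$username] : false;
	}
}

function getUserId($username){
	static $cache = array();
	if(!isset($cache[$username])){
		$cache[$username] = FakeUsersTable::findId($username);
	}
	return $cache[$username];
}

$first = getUserId('adam');  //Hits the fake table
$second = getUserId('adam'); //Served from the cache

//Delete the user, then look them up again
unset(FakeUsersTable::$rows['adam']);
$afterDelete = getUserId('adam');

//All three calls return 42 and the table was only consulted once,
//so the third result is stale
var_dump($first, $second, $afterDelete, FakeUsersTable::$lookups);
?>
```

The third call should have returned false, but the function has no way of knowing that the row has gone.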

I recently encountered the second point in the above list as I am in the process of writing a library that (among other things) parses Apache HTTPD log files into a database table. For this I had to convert IP addresses into country codes as well as parse user-agent strings for hundreds of thousands of records. Clearly caching would be useful here, but I could not use the above method as the script would quickly gobble up more and more memory.

A Class for In-Function Caching

In response to this problem I created the following PriorityCache class:

<?php

class Stapi_Utils_PriorityCache {

	/**
	 * The maximum size of the cache
	 * @var int
	 */
	protected $maxSize;
	/**
	 * An associative array of our cached data
	 * @var array
	 */
	protected $data = array();
	/**
	 * An indexed array of this cache's dependencies
	 * @var array
	 */
	protected $dependencies = array();
	/**
	 * An array of all the caches produced by factory()
	 * @var array
	 */
	protected static $caches = array();

	/**
	 * Generate a new PriorityCache object
	 *
	 * @param int $maxSize The maximum number of records this cache should store
	 * @param array $dependencies An array of dependencies for the new cache
	 * @return Stapi_Utils_PriorityCache
	 */
	public static function factory($maxSize = 100, $dependencies = array()){
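		//Only initialise the registry once, otherwise every call
		//would forget the caches created by earlier calls
		if(!is_array(self::$caches)){
			self::$caches = array();
		}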
		$newCache = new Stapi_Utils_PriorityCache($maxSize, $dependencies);
		self::$caches[] = $newCache;
		return $newCache;
	}

	/**
	 * Reset all the caches which are dependent upon $dependencyName
	 *
	 * @param string $dependencyName
	 */
	public static function resetDependants($dependencyName){
		foreach (self::$caches as $cache){
			if($cache->hasDependency($dependencyName)){
				$cache->reset();
			}
		}
	}

	/**
	 * Construct a new PriorityCache object.
	 *
	 * You cannot instantiate this class directly. Use
	 * the factory() method instead
	 *
	 * @param int $maxSize The maximum number of records this cache should store
	 * @param array $dependencies An array of dependencies for the new cache
	 */
	protected function __construct($maxSize, $dependencies){
		$this->maxSize = $maxSize;
		$this->dependencies = $dependencies;
	}

	/**
	 * Put an entry to the top of the cache
	 *
	 * @param string $key The key of the cache entry
	 * @return mixed The value for $key
	 */
	protected function touchValue($key){
		$value = $this->data[$key];
		unset($this->data[$key]);
		$this->data = array($key => $value) + $this->data;
		return $value;
	}

	/**
	 * Get a value from the cache
	 *
	 * @param string $key The key of the cache entry
	 * @return mixed The value for $key, or null if the key was not found
	 */
	public function getValue($key){
		if(isset($this->data[$key])){
			return $this->touchValue($key);
		}else{
			return null;
		}
	}

	/**
	 * Set a value in the cache
	 *
	 * This will also cause the cache entry to be touched
	 * (even if it already exists). This will cause an
	 * old cache entry to be deleted if the cache is full
	 * and if this is a new value.
	 *
	 * @param string $key The key of the cache entry
	 * @param mixed $value The value of the cache entry
	 * @return mixed Returns $value
	 */
	public function setValue($key, $value){
		$this->data[$key] = $value;
		$this->touchValue($key);
		if(count($this->data) > $this->maxSize){
			array_pop($this->data);
		}
		return $value;
	}

	/**
	 * Determine if the specified key exists in the cache
	 *
	 * @param string $key
	 * @return boolean
	 */
	public function keyExists($key){
		return isset($this->data[$key]);
	}

	/**
	 * Returns true if the cache is dependent upon $dependencyName
	 *
	 * @return boolean
	 */
	public function hasDependency($dependencyName){
		return in_array($dependencyName, $this->dependencies);
	}

	/**
	 * Empties the cache
	 */
	public function reset(){
		$this->data = array();
	}
}

?>

You can see that this cache addresses both of the problems with our simplistic example:

  1. You give your new cache a series of arbitrary string dependencies. In our example we might use the string ‘table:users’, and then call Stapi_Utils_PriorityCache::resetDependants('table:users'); whenever we modify the users table.
  2. The cache will start dropping its least recently used values once it fills up. This means we can now use our cache for a wide range of values without worrying about rampaging memory usage.
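
To make the eviction order concrete, here is a minimal sketch of the same idea using a plain array. The lruPut() function is a hypothetical helper, not part of the class; it combines what touchValue() and setValue() do above (array union to move an entry to the front, array_pop() to drop the entry at the back).

```php
<?php
//Hypothetical helper: store $value under $key at the front of $data,
//then trim $data to at most $maxSize entries
function lruPut(array $data, $key, $value, $maxSize){
	unset($data[$key]);
	//The array union operator keeps the left-hand entries first
	$data = array($key => $value) + $data;
	if(count($data) > $maxSize){
		//The last element is the least recently touched one
		array_pop($data);
	}
	return $data;
}

$data = array();
$data = lruPut($data, 'a', 1, 2);
$data = lruPut($data, 'b', 2, 2);
$data = lruPut($data, 'a', 1, 2); //Touch 'a' again, so 'b' is now the oldest
$data = lruPut($data, 'c', 3, 2); //The cache is full, so 'b' is dropped

//The remaining keys are 'c' then 'a'
print_r(array_keys($data));
?>
```

Because reads also touch an entry in the real class, a value that keeps being requested stays in the cache no matter how long ago it was first stored.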

There are several ways in which you could extend this class. For example, you could probably save memory by passing long key values through md5() (or crc32(), which is shorter still but far more likely to give two different keys the same hash). You could also look into making the cached data persistent across requests, but if you need this level of caching then you should probably look at Zend_Cache.
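
As a rough sketch of the key-hashing idea, assuming the keys are long strings such as user-agent headers: hashedCacheKey() is a hypothetical helper and the user-agent string is just a sample value. Hashing only saves memory when the original keys are longer than the 32 characters that md5() returns.

```php
<?php
//Hypothetical helper: derive a fixed-length cache key from an arbitrary string
function hashedCacheKey($key){
	return md5($key);
}

//Sample value only
$userAgent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0 Safari/537.36';
$cacheKey = hashedCacheKey($userAgent);

//md5() always returns 32 hexadecimal characters, however long the input is
echo strlen($userAgent), ' -> ', strlen($cacheKey), "\n";
?>
```

You would then pass the hashed key, rather than the raw string, to getValue() and setValue().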

I will leave you with an example of using the above class with a trivial getMd5 function:

<?php

require_once 'PriorityCache.php';

for ($i=1; $i<250; $i++){
	//15 possible values for $sampleValue (A to O)
	$sampleValue = chr(rand(65, 79));

	getMd5($sampleValue); //Call our function

	//Reset the cache every 50 iterations
	if($i % 50 == 0){
		Stapi_Utils_PriorityCache::resetDependants('test-dep');
		echo "\n";
	}
}

function getMd5($string){
	static $cache = null;

	//We need to initialise the cache on the first call
	if(is_null($cache)){
		$cache = Stapi_Utils_PriorityCache::factory(25, array('test-dep'));
	}

	//See if we have the value cached. If so, return it.
	//getValue() returns null on a miss, so compare against null explicitly
	$cachedValue = $cache->getValue($string);
	if($cachedValue !== null){
		echo '.';
		return $cachedValue;
	}

	//Otherwise we need to calculate the value
	echo '!';
	$value = md5($string);

	//And cache/return it (setValue() returns $value to make our code shorter)
	return $cache->setValue($string, $value);
}

?>
