Multilevel Caching Strategy
Posted by John Kleijn • Saturday, May 24. 2008 • Category: PHPWhether you're running your application as daemon or your using a caching server, homebrew (in that case you probably want to use the shmop extension, which will save you the trouble of creating a PHP script daemon) or third party such as Memcached, there is a limit to the amount of memory you can use.
You may want to cache some items practically indefinitely, like keeping a visitor's session for 6 months. Keeping the session in memory for that time is of course ludicrous. Also, should your daemon handling caching crash, or need a reboot, bye bye data. The solution to this is to implement multilevel caching strategy.
Take a look at the following sequence diagram, which illustrates getting an (old) item from cache.

If you know your Design Patterns, you'll recognize a tiny Chain of Responsibility pattern in this example. In essence, we compose a chain of two objects, breaking on the first object that can handle the request. If you look near the bottom of the diagram, you'll notice that after retrieving an item, it is moved to the top of the chain. This done under the assumption that the item will be used more often in this time frame. The item is placed at the top of the stack of the first cache, and will continue to placed there by the top level cache, each time it gets accessed. The stack has a fixed length, so adding new items to it will pop off the least frequently used. The handler will then put it in the next cache in the chain. This is a crude implementation of the LFU (Least Frequently Used) strategy, which doesn't work.
Hold on, did he say "which doesn't work"? Yes, I did. The problem with this approach is that once the top level is full, every new addition to it will cause both a get and a put operation. With a three level (or more) cache, you risk cascading put operations. There is no solution to this in a single-threaded environment (that I can think of), other than being lenient with the amount of space the top level cache can use, and let garbage collection 'enforce' the limit. That means we no longer have a hard limit, but it also means our caching mechanism is as fast as it possibly can be. It's still risky, especially running an application daemon. The following implementation should be regarded as 'food for thought', nothing more, nothing less.
To illustrate the massive difference between caching in memory and caching in files, here are some test results of the two:
Average time to complete 'get' (500 subsequent tests)
Kept in memory: 0.00009136 - 100%
File based: 0.00022914 - 250,81%
Note that in this test, there is only a little a overhead from unserializing with file based caching. Both classes use a simple meta object to store caching data, the test data is just a simple integer. In a real life scenario, the data itself has to be serialized too. So the difference is likely to be even more. Also note that when using a 'remote' caching daemon, the difference is likely to be less, since it too has to serialize its data, and has network overhead.
I found it pretty hard to produce some solid test results using different methods of fetching items (see 'get' method in Cache supertype). But I can say that all methods produced a result a little slower than 'memory only' caching, yet considerably faster than 'file only' caching. The results ranged between 0.00009 and 0.000017, not including garbage collection. The higher numbers are a result of items being 'pulled'.
A WORD OF WARNING: I'm about to share some hardly tested (and not fully documented) code with you. It could very well fail horribly under the wrong conditions
Both classes have the same superclass, here it is:
/**
* Backbone_Cache_Abstract
*
* Abstract class for cache implementations.
*
* Implements the Chain of Responsibility pattern.
*
* @package Backbone_Cache
*
*/
abstract class Backbone_Cache_Abstract {
/**
* Next level caching object
*
* @var Backbone_Cache_Abstract
*/
private $successor;
/**
* Add successor.
*
* @param Backbone_Cache_Abstract $cache
*/
public function addSuccessor(Backbone_Cache_Abstract $cache)
{
$this->successor = $cache;
}
/**
* Get string from cache by id
*
* @param scalar $id
* @param bool $pull Whether to set an item retrieved from a successor on this cache.
*/
public function get($id, $pull = true, $softPull = false)
{
if(null === ($item = $this->doGet($id)))
{
if($this->successor !== null)
{
if(null !== ($item = $this->successor->get($id, false)))
{
if($pull)
{
if(!$softPull || !$this->isFull())
{
$this->set($item);
}
}
}
}
}
return $item;
}
/**
* Set the item on this cache.
* This operation causes objects to get marked as old,
* to be picked up by garbage collection.
*
* @param Backbone_Cache_Item $item
*/
public function set(Backbone_Cache_Item $item)
{
return $this->doSet($item);
}
/**
* Try to remove the item
*
* @param int $id
* @return bool
*/
public function remove($id)
{
if(!$this->doRemove($id))
{
return $this->successor->remove($id);
}
}
/**
* Transfer an item down the chain.
*
*/
public function bleed(Backbone_Cache_Item $item)
{
if($this->successor === null)
{
throw new Backbone_Cache_Exception('Nowhere to bleed too.');
}
$this->successor->set($item);
$this->doRemove($item->getId());
}
protected abstract function doGet($id);
protected abstract function doSet(Backbone_Cache_Item $item);
protected abstract function doRemove($id);
protected abstract function doRemove($id);
protected abstract function touch($id);
public abstract function flush();
public abstract function GC($limit);
public abstract function isFull();
}
You'll notice the 'bleed' method (catchy name, don't you think?). This method can be used by an implementation's garbage collector to transfer excess items to the next level cache, if present. I would've preferred to use a Template Method for garbage collection, and integrate the 'bleed' method into that, but the implementations of 'GC' are too different for this to be done sensibly. This mechanism may be a prime candidate for future refactoring.
This is a simple file based caching class (you did read the warning, right?):
<?php
class Backbone_Cache_SimpleFile extends Backbone_Cache_Abstract
{
private $dirName;
public function __construct($dir, $perms = 777)
{
$this->dirName = $dir;
if(!file_exists($this->dirName))
{
if(!mkdir($this->dirName, $perms))
{
throw new Backbone_Cache_Exception('Could not create cache dir');
}
}
}
protected function doGet($id)
{
$file = $this->getFileName($id);
if(file_exists($file))
{
return unserialize(file_get_contents($file));
}
}
protected function doSet(Backbone_Cache_Item $item)
{
file_put_contents($this->getFileName($item->getId()), $item->squash());
}
protected function doRemove($id)
{
unlink($this->getFileName($id));
}
private function getFileName($id)
{
return $this->dirName.DIRECTORY_SEPARATOR.$id;
}
public function isFull(){
return false;
}
/**
* Unlink all files and remove directory.
*
* @return bool
*/
public function flush(){
$this->GC(0);
}
public function touch($id)
{
$file = $this->getFileName($id);
if(file_exists($file))
{
touch($file);
return true;
}
return false;
}
/**
* Garbage collection. Removes expired items, as well as items aged beyond specified.
*
* @param $maxAge Maximum time since last access in seconds.
*/
public function GC($maxAge)
{
$dh = opendir($this->dirName);
while(($file = readdir($dh)) !== false)
{
if(!in_array($file, array('.', '..')))
{
if($maxAge === 0 || time() - fileatime($file) <= $maxAge)
{
unlink($this->dirName.DIRECTORY_SEPARATOR.$file);
}
else
{
$item = unserialize(file_get_contents($file));
if(!$item->isLive())
{
unlink($this->dirName.DIRECTORY_SEPARATOR.$file);
}
}
}
}
closedir($dh);
rmdir($this->dirName);
}
}
This is designed to be used as bottom level cache. Hence, 'isFull' always returns false, and the garbage collector does not use 'bleed'. To clean up disk space, just periodically use the garbage collector with a specified $maxAge to remove items, expired or not.
This is the meta object the Caching package uses to describe caching data:
<?php
class Backbone_Cache_Item {
/**
* Expiration UNIX timestamp.
*
* Defaults to epoch (0)
*
* @var int
*/
private $expires = 0;
/**
* The data to cache.
*
* @var mixed
*/
private $data;
/**
* Item identifier.
*
* @var scalar
*/
private $id;
/**
* Last time the data was accessed as UNIX timestamp.
*
* @var int
*/
private $accessTime;
/**
* Constructor.
*
* Sets id, data and expiration datetime.
*
* @param scalar $id
* @param mixed $data
* @param int $expires
*/
public function __construct($id, $data, $expires)
{
$this->id = $id;
$this->expires = $expires;
$this->data = $data;
$this->accessTime = time();
}
/**
* Checks current timestamp against expiration datetime.
*
* @return bool
*/
public function isLive()
{
return $this->expires >= time();
}
/**
* Return identifier
*
* @return scalar
*/
public function getId()
{
return $this->id;
}
/**
* Returns the internal data.
*
* @return mixed
*/
public function getData()
{
$this->accessTime = time();
return $this->data;
}
/**
* Returns the last access time of the internal data.
*
* @return int
*/
public function getAccessTime()
{
return $this->accessTime;
}
/**
* Returns a serialized representation of this object.
*
* @return string
*/
public function squash()
{
return serialize($this);
}
}



ShareThis