Wednesday, May 18, 2011

An Introduction to memcached

"Caching" is a term you've probably heard mentioned before in various places (including this site). The idea behind caching is to store a copy of some piece of data so you can reuse it later without jumping through whatever hoops you had to go through the first time to get it. There are different things you can cache (queries, objects, etc.) and different media in which you can store the cache (files, database, memory). Whichever way you do it, the main goal of caching is to increase the performance of your site or application. In many cases caching is used to reduce the amount of interaction with the database, which increases performance and decreases the load on your server.

I would like to talk about my personal favorite method of caching: memcached. I'll show you how memcached works, how to install it, and how to use it to help your site/application run faster and scale better. According to the memcached site, "memcached is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load." In plain English, this means memcached is an application that you can use to take advantage of spare memory on any number of machines to cache pretty much anything you want (with a few exceptions) and retrieve it very quickly. Memcached was originally developed by Danga Interactive to help speed up LiveJournal. Some of memcached's great features are that it runs on a number of platforms (Linux, BSD, Windows), is VERY fast, and has a number of client APIs already written, so you'll more than likely find a library for any type of project you're working on. We'll focus on the PHP API in this article.


Before I get too far, I want to mention a couple of alternatives that may fit your particular situation.

Local Database Query Cache: Your database may have its own native query caching, which you don't have to do much to use. The only drawback is that if a table is updated, every cached query for that table is thrown out.

The PHP APC extension: The APC extension is an opcode cache for your PHP scripts, but it also provides a data cache similar to memcached. The biggest problem with APC is that you can only access the local APC cache, so it cannot be shared between machines. There are other distributed caching systems, such as MCache, but I have no personal experience with any of them, so I cannot comment on the advantages or disadvantages of using another tool.

Installation

The first thing we need is an instance of memcached to store our data in. Unix/Linux folks can download the source from here and follow the instructions for installation on this page. Some distributions (like Ubuntu and CentOS) have memcached in their repositories, so you can use the native package installer like apt or yum to install memcached. Windows users can find binaries and installation instructions at http://jehiah.cz/projects/memcached-win32/. Any other questions you may have about memcached may well be answered in the project's excellent FAQ.

Now that we have memcached up and running, we need a way to talk to it. This is where the client APIs come in. We'll be using the PHP API, so I'll show you how to install the PHP memcache extension. The easiest way to do this is with the 'pecl' command: 'pecl install memcache'. This will download, compile, and install the extension. All you then need to do is add the line 'extension=memcache.so' to your php.ini file. You may also need to find the memcache.so file and move it into the extensions/ directory (usually located at /usr/local/lib/php/extensions) by hand. Other options for installing PHP extensions, and instructions for Windows users, can be found at http://us2.php.net/manual/en/install.pecl.php.
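Once php.ini has been edited, it is worth confirming that PHP actually picked up the extension before writing any caching code. The following is a minimal sketch; the helper name extension_status() is made up for this example, while extension_loaded() and php_ini_loaded_file() are standard PHP functions.

```php
<?php
// Hypothetical helper: returns a human-readable status line for an extension.
// 'memcache' is the extension name installed by "pecl install memcache".
function extension_status($name)
{
    return extension_loaded($name)
        ? "$name extension is loaded"
        : "$name extension is NOT loaded; check the extension= line in php.ini";
}

echo extension_status('memcache'), PHP_EOL;

// php_ini_loaded_file() reports which php.ini PHP read (false if none),
// which helps when the line was added to the wrong file.
echo 'php.ini in use: ', php_ini_loaded_file() ?: '(none)', PHP_EOL;
```

Keep in mind that the command-line PHP and the web server's PHP may read different php.ini files, so a check run from the shell does not necessarily reflect what the web server sees.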

Implementation

Now that all the pieces are in place, let's integrate memcached into our application. First thing we need to do is to connect to our memcached server:

PHP:

<?php
$memcache = new Memcache;
$memcache->connect('localhost', 11211) or die("Could not connect");
?>


This assumes that memcached is running on the local machine with the default settings. You would usually make this connection at the same point where you open your database connection, at the beginning of your application. If you want to use more than one memcached server, the extension provides $memcache->addServer() for building a pool: call it once per server, passing in the host name and port number. Now that we've got a connection, let's look at this section of code:
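When more than one server is in use, it is the client library, not the memcached servers, that decides which server holds a given key. That is why every web server should be configured with the same list of memcached servers. The sketch below is a deliberately simplified illustration of that idea and not the extension's actual algorithm: the function pick_server() and the host names are made up for the example.

```php
<?php
// Hypothetical pool; the host names are placeholders.
$servers = array('cache1.example.com:11211', 'cache2.example.com:11211');

// Simplified illustration of client-side key distribution:
// hash the key, then map the hash onto one of the servers.
function pick_server($key, array $servers)
{
    // Mask the sign bit so the hash is non-negative on 32-bit PHP builds too.
    $hash = crc32($key) & 0x7fffffff;
    return $servers[$hash % count($servers)];
}

$key = md5('query' . 'select * from pages where page_id=1');
echo $key, ' => ', pick_server($key, $servers), PHP_EOL;
```

The same key always maps to the same server as long as the list does not change. With a plain modulo scheme like this one, adding or removing a server remaps most keys, which shows up as a burst of cache misses.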

PHP:

<?php
$sql = "select * from pages where page_id=1";
$qry = mysql_query($sql) or die(mysql_error()." : $sql");
$result = mysql_fetch_object($qry);
$content = $result->content;
?>


This fetches the 'content' field from our pages table. If the data in the content field does not change very often, it is a good candidate for caching. Here's one possible way to integrate memcached into our little section of code:

PHP:

<?php
// write the query
$sql = "select * from pages where page_id=1";
// create an index key for memcache
$key = md5('query'.$sql);
// look up the value in memcache; get() returns false on a miss
$result = $memcache->get($key);
// check whether we got something back
if ($result === false) {
    // fetch from the database
    $qry = mysql_query($sql) or die(mysql_error()." : $sql");
    if (mysql_num_rows($qry) > 0) {
        $result = mysql_fetch_object($qry);
        // store in memcache for 3600 seconds, uncompressed
        $memcache->set($key, $result, 0, 3600);
    }
}
// $result is still false if the query returned no rows
$content = $result ? $result->content : null;
?>


A bit more involved, but we are now using memcached! The above code first checks whether what we are looking for is already in memcached; if it isn't, we fetch it from the database and use the result to populate the cache. In this example, I stored the entire $result object in the cache and set its expiration to 3600 seconds (1 hour). The third argument to set() is a flag that controls whether the data is compressed; pass MEMCACHE_COMPRESSED instead of 0 to compress it with zlib. Depending on your needs, you can store strings, numbers, objects, and arrays in memcached. Anything that is serializable in PHP can be cached, which means resources such as database connections and file handles won't work.
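To see why "serializable" is the dividing line, the short sketch below runs a value through PHP's serialize() and unserialize(), which is roughly what happens to a non-scalar value on its way into and out of the cache. The $row object is a made-up stand-in for what mysql_fetch_object() returns, and php://memory is used only so the example needs no real file.

```php
<?php
// A plain data object, similar to a row returned by mysql_fetch_object().
$row = new stdClass();
$row->page_id = 1;
$row->content = 'Hello, world';

// Plain data survives a serialize/unserialize round trip intact.
$copy = unserialize(serialize($row));
var_dump($copy == $row);          // bool(true)

// A file handle is a resource; it does not come back as a usable handle.
$fh = fopen('php://memory', 'r+');
$restored = unserialize(serialize($fh));
var_dump(is_resource($restored)); // bool(false)
fclose($fh);
```

The practical rule is to cache the data you read through a connection or handle, never the connection or handle itself.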

Now that we're pulling data from memcached, what happens if the data in the database is updated? We can compensate for this in two ways. The easiest is to set a fairly short expiration on the cached data, but you'll have to live with a little lag between the time you update the database and the time the change appears in the cache. The other way is to update the cache on the fly any time an update or delete occurs. This involves a bit more work, as you may have to update many places in the cache depending on how many queries could possibly touch the data, but this is only necessary when doing query caching as in my example rather than just straight content caching.
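The second approach, updating the cache whenever the data changes, might look something like the sketch below for straight content caching. Everything in it is illustrative: ArrayCache is a made-up in-memory stand-in that mimics only the get(), set(), and delete() methods of a Memcache object (and ignores expiry) so the example runs without a server, the $db array stands in for the pages table, and the one-key-per-page naming scheme is an assumption rather than something the code above uses.

```php
<?php
// Hypothetical in-memory stand-in for a Memcache object.
// Only get/set/delete are mimicked; the expiry argument is ignored.
class ArrayCache
{
    private $items = array();

    public function get($key)
    {
        return array_key_exists($key, $this->items) ? $this->items[$key] : false;
    }

    public function set($key, $value, $flag = 0, $expire = 0)
    {
        $this->items[$key] = $value;
        return true;
    }

    public function delete($key)
    {
        unset($this->items[$key]);
        return true;
    }
}

// Hypothetical "database": page_id => content.
$db = array(1 => 'old content');
$cache = new ArrayCache();

// One cache key per page, so an update has exactly one place to refresh.
function page_key($pageId)
{
    return 'page_content_' . $pageId;
}

// Read path: try the cache first, fall back to the database and populate.
function get_page_content($cache, array $db, $pageId)
{
    $content = $cache->get(page_key($pageId));
    if ($content === false) {
        $content = $db[$pageId];
        $cache->set(page_key($pageId), $content, 0, 3600);
    }
    return $content;
}

// Write path: update the database, then refresh the cached copy right away.
function update_page_content($cache, array &$db, $pageId, $content)
{
    $db[$pageId] = $content;
    $cache->set(page_key($pageId), $content, 0, 3600);
}

echo get_page_content($cache, $db, 1), PHP_EOL;   // old content
update_page_content($cache, $db, 1, 'new content');
echo get_page_content($cache, $db, 1), PHP_EOL;   // new content
```

A variation is to call delete() in the write path instead of set(), and let the next read repopulate the cache. That is simpler when several cached values depend on the same row, at the cost of one extra database read after each update.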

Memcached affords us endless possibilities (query caching, content caching, session storage) and great flexibility. It's an excellent option for increasing performance and scalability on any website without requiring a lot of additional resources.
