Wednesday, May 18, 2011
An Introduction to memcached
Using Nginx as a load balancer
Here’s a look at how nginx does basic load balancing :
upstream yoursite { server yoursite1.yoursite.com; server yoursite2.yoursite.com; } server { server_name www.yoursite.com; location / { proxy_pass http://yoursite; } } This configuration will send 50% of the requests for www.yoursite.com to yoursite1.yoursite.com and the other 50% to yoursite2.yoursite.com.
ip_hash
You can specify the ip_hash directive that guarantees the client request will always be transferred to the same server.
If this server is considered inoperative, then the request of this client will be transferred to another server.
upstream yoursite { ip_hash; server yoursite1.yoursite.com; server yoursite2.yoursite.com; } down
If one of the servers must be removed for some time, you must mark that server as down.
upstream yoursite { ip_hash; server yoursite1.yoursite.com down; server yoursite2.yoursite.com;} weight
When there’s no weight set, the weight is equal to one.
upstream yoursite { server yoursite1.yoursite.com weight=4; server yoursite2.yoursite.com; } This configuration will send 80% of the requests to yoursite1.yoursite.com and the other 20% to yoursite2.yoursite.com.
note: It’s not possible to combine ip_hash and weight directives.
max_fails and fail_timeout
max_fails is a directive defining the number of unsuccessful attempts in the time period defined by fail_timeout before the server is considered inoperative. If not set, the number of attempts is one. A value of 0 turns off this check.
If fail_timeout is not set the time is 10 seconds.
upstream yoursite { server yoursite1.yoursite.com; server yoursite2.yoursite.com max_fails=3 fail_timeout=30s; } In this configuration nginx will consider yoursite2.yoursite.com as inoperative if a request fails 3 times with a 30s timeout.
backup
If the non-backup servers are all down or busy, the server(s) with the backupdirective will be used.
upstream yoursite { server yoursite1.yoursite.com max_fails=3; server yoursite2.yoursite.com max_fails=3; server yoursite3.yoursite.com backup; } This configuration will send 50% of the requests for www.yoursite.com to yoursite1.yoursite.com and the other 50% to yoursite2.yoursite.com.
If yoursite1.yoursite.com and yoursite2.yoursite.com both fails 3 times the requests will be send to yoursite3.yoursite.com.
10 baby steps to install Memcached Server and access it with PHP
Troubleshooting Response Time Problems – Why You Cannot Trust Your System Metrics
Production Monitoring is about ensuring the stability and health of our system, that also includes the application. A lot of times we encounter production systems that concentrate on System Monitoring, under the assumption that a stable system leads to stable and healthy applications. So let’s see what System Monitoring can tell us about our Application.
Let’s take a very simple two tier Web Application:
This is a simple multi-tier eCommerce solution. Users are concerned about bad performance when they do a search. Let's see what we can find out about it if performance is not satisfactory. We start by looking at a couple of simple metrics.
CPU Utilization
The best known operating system metric is CPU utilization, but it is also the most misunderstood. This metric tells us how much time the CPU spent executing code in the last interval and how much more it could execute theoretically. Like all other utilization measures it tells us something about the capacity, but not about health, stability or even performance. Simply put: 99% CPU utilization can either be optimal or indicate impeding disaster depending on the application.
The CPU charts show no shortage on either tier
Let's look at our setup. We see that the CPU utilization is well below 100%, so we do have capacity left. But does that mean the machine or the application can be considered healthy? Let’s look at another measure that is better suited for the job, the Load Average (System\Processor QueueLength on Windows ). The Load Average tells us how many threads or processes are currently executed or waiting to get CPU time.
Unix Top Output: load average: 1.31, 1.13, 1.10
Linux systems display three sliding load averages for the last one, five and 15 minutes. The output above shows that in the last minute there were on average 1.3 processes that needed a CPU core at the same time.
If the Load Average is higher than the number of cores in the system we should either see near 100% CPU utilization, or the system has to wait for other resources and cannot max out the CPU. Examples would be Swapping or other I/O related tasks. So the Load Average tells us if we should trust the CPU usage on the one hand and if the machine is overloaded on the other. It does not tell us how well the application itself is performing, but whether the shortage of CPU might impact it negatively. If we do notice a problem we can identify the application that is causing the issue, but not why it is causing it.
In our case we see that neither the load average nor the CPU usage shines any light on our performance issue. If it were to show high CPU utilization or a high load average we could assume that the shortage in CPU is a problem, but we could not be certain.
Memory Usage
Used memory is monitored because the lack of memory will lead to system instability. An important fact to note is that Unix and Linux operating systems will most always show close to 100% memory utilization over time. They fill the memory up with buffers and caches which get discarded, as opposed to swapped out, if that memory is needed otherwise. In order to get the "real" memory usage we need subtract these. In Linux we can do by using the free command.
Memory Usage on the two systems, neither is suffering memory problems
If we do not have enough memory we can try to identify which application consumes the most by looking at the resident memory usage of a process. Once identified we will have to use other means to identify why the process uses up the memory and whether this is ok. When we look towards memory regarding Java/.NET performance we have to make sure that the application itself is never swapped out. This is especially important because Java accesses all its memory in a random-access fashion and if a portion were to be swapped out it would have serve performance penalties. We can monitor this via swapping measures on the process itself. So what we can learn here is whether the shortage of memory has a negative impact on application performance. As this is not the case, we are tempted to ignore memory as the issue.
We could look at other measures like network or disk, but in all cases the same thing would be true, the shortage of a resource might have impact, but we cannot say for sure. And if we don't find a shortage it does not necessarily mean that everything is fine.
Database
An especially good example of this problem is the database. Very often the database is considered the source of all performance problems, at least by the application people. From a DBA's and operations point of view the database is often running fine though. Their reasoning is simple enough, the database is not running out of any resources, there are no especially long running or CPU consuming statements or processes running and most statements execute quite fast. So the database can not be the problem.
Let's look at this from an application point of view
Looking At The Application
As users are reporting performance problems the first thing that we do is to look at the response time and its distribution within our system.
The overall distribution in our system does not show any particular bottleneck
At first glance we don't see anything particularly interesting when looking at the whole system. As users are complaining about specific requests lets go ahead and look at these in particular:
The response time distribution of the specific request shows a bottleneck in the backend and a lot of database calls for each and every search request
We see that the majority of the response time lies in the backend and the database layer. That the database contributes a major portion to the response time does not mean however that the DBA was wrong. We see that every single search executes 416 statements on average! That means that every statement is executing in under one millisecond and this is fast enough from the database point of view. The problem really lies within the application and its usage of the database. Let's look at the backend next.
The Heap Usage and GC activity chart shows a lot of GC runs, but does it have negative impact?
Looking at the JVM we immediately see that it does execute a lot of garbage collection (the red spikes), as you would probably see in every monitoring tool. Although this gives us a strong suspicion, we do not know how this is affecting our users. So let's look at that impact:
These are the runtime suspensions that directly impact the search. It is considerable but still amounts to only 10% of the response time
A single transaction is hit by garbage collection several times and if we do the math we find out that garbage collection contributes 10% to the response time. While that is considerable it would not have made sense to spend a lot of time on tuning it just now. Even if we get it down to half it would only have saved us 5% of the response time. So while monitoring garbage collection is important, we should always analyze the impact before we jump to conclusions.
So let's take a deeper look at where that particular transaction is spending time on the backend. To do this we need to have application centric monitoring in place which we can then use to isolate the root cause.
The detailed response time distribution of the search within the backend shows two main problems: too many EJB calls and a very slow doPost method
With the right measure points within our application we immediately see the root causes of the response time problem. At first we see that the WebService call done by the search takes up a large portion of the response time. It is also the largest CPU hotspot within that call. So while the host is not suffering CPU problems, we are in fact consuming a lot of it in that particular transaction. Secondly we see that an awful lot of EJB calls are done which in turn leads to the many database calls that we have already noticed.
That means we have identified a small memory-related issue; although there are no memory problems noticeable if we were to look only at system monitoring. We also found that we have a CPU hotspot, but the machine itself does not have a CPU problem. And finally we found that the biggest issue is squarely within the application; too many database and EJB calls, which we cannot see on a system monitoring level at all.
Conclusion
System metrics do a very good job at describing the environment, after all that is what they are meant for. If the environment itself has resource shortages we can almost assume that this has a negative impact on the applications, but we cannot be sure. If there is no obvious shortage this does not, however, imply that the application is running smoothly. A healthy and stable environment does not guarantee a healthy, stable and performing application.
Similar to the system, the application needs to be monitored in detail and with application-specific metrics in order to ensure its health and stability. There is no universal rule as to what these metrics are, but they should enable us to describe the health, stability and performance of the application itself.
Nginx and Memcached, a 400% boost!
If web architectures, performance, or scalability are topics you would like to keep on top of (who doesn't!), then chances are, you've heard of Nginx("engine x"). Originally developed by Igor Sysoev for rambler.ru (second largest Russian web-site), it is a high-performance HTTP server / reverse proxy known for its stability, performance, and ease of use. The great track record, a lot of great modules, and an active development community have rightfully earned it a steady uptick of users, and most recently, a notable mention in the Netcraft report.
Memcached module - an easy 4x speed multiplier
Memcached, the darling of every web-developer, is capable of turning almost any application into a speed-demon. Benchmarking one of my own Rails applications resulted in ~850 req/s on commodity, non-optimized hardware - more than enough in the case of this application. However, what if we took Mongrel out of the equation? Nginx, by default, comes prepackaged with the Memcached module, which allows us to bypass the Mongrel servers and talk to Memcached directly. Same hardware, and a quick test later: ~3,550 req/s, or almost a 400% improvement! Not bad for a five minute tweak!
Think smart, forget cache invalidations
The only snag in our scheme for easy performance gains comes with the fact that more often than not, our application servers contain additional caching policies (read invalidations / authentication), and MIME type logic. The former, as recently documented by Tobias Lütke and Geoffrey Grosenbach, if properly thought through can be solved with some clever URL rewriting policies and automatic TTL timeouts. When implemented correctly, we could simply set the memcached key to be the full request URL, allowing us to completely bypass our app. servers.
MIME-type logic
MIME type magic can be as easy as complex as we wish. If you only serve one content type ('text/html', for example), the solution is simple:
location /dynamic_request { # Set default type to text/html default_type text/html; # ... }
Dynamic argument types, just for fun
However, if we want to serve multiple content-types, or perhaps even parameterize the request type in a query string, we've got some extra work to do. Not unlike any other HTTP server, Nginx checks the filetype extension at the end of every request path to determine the correct content-type header, a solution which unfortunately breaks down in majority of modern, URL friendly web-applications:
1. GET /dynamic_request.js - Content-Type = text/javascript
2. GET /dynamic_request - Content-Type = ?
3. GET /dynamic_request?format=js - Content-Type = ?
Case 1 is easily solved by Nginx directly. Case 2 is tricky, but can be solved via a 'default_type' line in the config as document above. And case 3 will require some additional logic - namely, we can hardcode a rule to rewrite our dynamic query string parameters to automagically add an extension to the path of each incoming request:
location /dynamic_request { # append an extenstion for proper MIME type detection if ($args ~* format=json) { rewrite ^/dynamic_request/?(.*)$ /dynamic_request.js$1 break; } if ($args ~* format=xml) { rewrite ^/dynamic_request/?(.*)$ /dynamic_request.xml$1 break; } memcached_pass 127.0.0.1:11211; error_page 404 = @dynamic_request; }
That should do the trick! Cache invalidations are handled, MIME types are served correctly, and our app. servers are bypassed in 95%+ of the cases. Instead, Nginx talks directly to Memcached and only proxies the cache misses - an easy 400% performance boost!