Since we moved to Slicehost, I've had to play the role of system administrator because we don't have a real one. One problem I've run into is the long string of legacy applications I have to support. Some of them I wrote and some I inherited, and for many reasons they're often organized and run in sub-optimal ways.

Separating your static and dynamic content is a good habit to get into when you're building scalable web applications. Static content is highly portable because it can live without context: you can serve it from anywhere and nobody knows the difference. When your site starts to get heavy traffic, you can easily offload your static content to a CDN if you host it in an easy-to-separate way, under a URL like static.domain.com or domain.com/static.
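If you're starting from scratch, that separation can be as simple as giving static files their own virtual host. A quick sketch of the idea in nginx terms (the hostname and docroot here are made up for illustration):

server {
    listen 80;
    # hypothetical host and path; the point is that static files get
    # their own hostname, which can later be pointed at a CDN
    server_name static.domain.com;
    root /var/www/static;

    # nothing here is generated per-request, so cache aggressively
    expires 30d;
}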
the problem
But my applications blur that line: static and dynamic content live together, sometimes with no distinction at all. This is where nginx shines. Before I move on, I'll mention that I know Apache can do the same thing, but it's less turnkey on Ubuntu because I'd need a second Apache instance to make it work, and even then I think I can squeeze better performance out of this setup.
Let me first describe the setup I'm starting with. I have one 2048 MiB slice from Slicehost running the Apache web server. Apache is configured with mpm_prefork because most of my applications are not thread-safe. I run my applications under mod_cgid, mod_fcgid, mod_passenger, and mod_python, among others, and I have applications written in Perl, PHP, Ruby, and Python. This setup makes for fairly large Apache processes, because all of those modules, plus many other typical ones, are loaded into every running process (even if that process is only serving static content). In fact, the reason I started looking at optimization is that during periods of heavy traffic, the server was running out of memory and swapping (a very bad thing for a web server).
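For a sense of what that looks like, here's a rough sketch of the relevant pieces of the old Apache config; the prefork numbers and module paths are illustrative, not my actual values:

# mpm_prefork: one full, single-threaded process per request
<IfModule mpm_prefork_module>
    StartServers          5
    MinSpareServers       5
    MaxSpareServers      10
    MaxClients          150
</IfModule>

# every module is loaded into every child process, even the
# children that only ever serve static files
LoadModule cgid_module      /usr/lib/apache2/modules/mod_cgid.so
LoadModule fcgid_module     /usr/lib/apache2/modules/mod_fcgid.so
LoadModule passenger_module /usr/lib/apache2/modules/mod_passenger.so
LoadModule python_module    /usr/lib/apache2/modules/mod_python.so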
the solution
After the transition, all of my static content is served by nginx. nginx is bound to ports 80 and 443 and handles every incoming request, checking the docroot to see whether the requested file exists on the filesystem. If it does, nginx serves it directly; if not, nginx acts as a reverse proxy and forwards the request to Apache, which now listens only on port 8080. Here's the heart and soul of my setup:
location / {
    # pass the original client address through to the backend
    proxy_set_header X-Real-IP $remote_addr;

    # serve the file directly if it exists on disk
    if (-f $request_filename) {
        break;
    }

    # serve a directory's index.html if one exists
    if (-f $request_filename/index.html) {
        rewrite (.*) $1/index.html break;
    }

    # otherwise hand the request off to Apache on port 8080
    if (!-f $request_filename) {
        proxy_pass http://fresnobeehive.com:8080;
        break;
    }
}
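For context, that location block sits inside an otherwise ordinary server block (one per site, with a matching block on 443 for SSL), while Apache's own virtual hosts now listen on 8080 instead of 80. A stripped-down sketch, with a made-up hostname and docroot:

server {
    listen 80;
    # hypothetical names; the real config has one server block
    # per site, plus an SSL variant listening on 443
    server_name www.example.com;
    root /var/www/example;

    location / {
        # ... the proxy/static logic shown above ...
    }
}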
After some initial testing, I've found that in many cases my new setup is actually a little slower than using Apache to serve everything. The difference, however, is in scalability. Since nginx is only serving static content, there are no thread-safety concerns and a handful of worker processes can handle all the requests. I have 4 worker processes that can handle 1024 simultaneous connections, and each worker generally uses no more than a few MiB of memory, which is the most substantial difference from the Apache-only setup. Previously, under normal load, the server would be using almost all of its 2048 MiB of memory; now I see it humming along regularly with 600-700 MiB free. Swapping is now rare, and I'm even considering using some of that extra memory to run memcached, which should blow my old setup out of the water performance-wise.
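In nginx terms those numbers map to two directives at the top of nginx.conf; a minimal sketch with everything else left at its defaults:

# one worker per core is the usual rule of thumb; I run 4
worker_processes  4;

events {
    # maximum simultaneous connections per worker
    worker_connections  1024;
}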