A reverse proxy that caches and deflates.

One of the projects we’ve been working on recently has a lot of dynamic, data-driven content that doesn’t change much. Knowing this, we made sure the backend was setting cache control headers appropriately and then turned on caching in Apache using mod_disk_cache. So far so good: pages were speedier, CPU load was lighter, and we were happy. Then tritchey started insulting us.

I didn’t take it personally; I already knew these were good ideas and I saw this as an opportunity. Surely we could match his feat; I should only need to add one line to the config and enable a module. Ah, the foolishness of youth.

I turned on mod_deflate, cleared the cache, and hit refresh. Hrm, nothing happened. Firebug[1] reported that the page was definitely being delivered zipped, but the cache was never being filled. Apache would successfully zip and cache the static JavaScript and CSS files, but the HTML coming through the reverse proxy was never put in the disk cache and required a fresh request to the backend every time.
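
A server-side way to check the same thing Firebug shows is a dedicated access log. The following is only a sketch: the `DeflateFilterNote` lines follow the mod_deflate docs, while the format name `cachedebug` and the log path are made up for illustration and should be adjusted to the local setup.

```apache
# Record mod_deflate's byte counts and compression ratio as request notes.
DeflateFilterNote Input instream
DeflateFilterNote Output outstream
DeflateFilterNote Ratio ratio

# Log the request line, final status, the Cache-Control and Content-Encoding
# response headers, and the deflate notes recorded above.
# "cachedebug" and the log path are hypothetical names.
LogFormat "\"%r\" %>s cc=\"%{Cache-Control}o\" enc=\"%{Content-Encoding}o\" %{outstream}n/%{instream}n (%{ratio}n%%)" cachedebug
CustomLog /var/log/apache2/cachedebug.log cachedebug
```

Comparing this log against the backend's own request log is probably the simplest way to tell whether a page really came out of the disk cache: a cached page should appear here without a matching hit on the backend.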

I saw some references to AddOutputFilterByType not working with reverse proxies because it couldn’t correctly identify the content type, but that seemed to have been fixed and deprecated at the same time. The suggestion was to use mod_filter instead, but the docs on mod_deflate still pointed to the old style and there weren’t many good examples of this type of setup with mod_filter (good example coming below). Someone else claimed that Apache was stripping off the cache control headers set by the backend, but my logging showed that wasn’t my problem. I largely gave up and left it at caching without deflate, since that still gave me better performance.
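
For reference, the old-style approach that the mod_deflate docs pointed to is a one-liner along these lines. The list of MIME types is an assumption chosen to match the types discussed in this post, not a copy of any config that was actually deployed.

```apache
# Old style: attach the DEFLATE output filter to responses by MIME type.
# The MIME type list is illustrative only.
AddOutputFilterByType DEFLATE text/html text/css application/x-javascript
```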

At the same time we were trying to bring up a new server for this to be hosted on, using Ubuntu JeOS (8.04) rather than the Debian Etch we had been using. Once we got everything deployed on the new box, I decided to give caching with deflate another try. It works great! The difference for us seems to reside primarily somewhere between the Apache 2.2.3 we had been using on Debian Etch and the 2.2.8 that comes with Hardy Heron. To me this somewhat justified our decision to give Ubuntu a shot over Debian. I’ve been a fan of Debian for a while, but found quite a few cases where I hit a bug that had been fixed in a newer version that wasn’t in stable. Backports were sometimes available; more frequently the fix would only be available in testing, which through libc6 would require everything to be upgraded at once.

Here’s an example of the config:

    # we are actually using mod_rewrite to implement the reverse proxy
    RewriteEngine On
    # don't be a forward proxy, just allow the reverse proxy
    ProxyRequests Off

    #see if a static file exists in the webroot first and serve it from there.
    RewriteCond /var/www/gainesville-green.com/current/www/%{REQUEST_FILENAME} -f
    RewriteRule ^(.+) /var/www/gainesville-green.com/current/www/$1 [L]

    #if not forward it to the lisp process listening locally
    RewriteRule ^/(.*)$ http://127.0.0.1:3434/$1 [P]

    #set up caching, enabled for the entire site.
    CacheRoot /var/cache/apache2/mod_disk_cache/gainesville-green.com
    CacheEnable disk /

    #Declare a filter named gzipping
    #The 2nd parameter is the filter type, which sets where in the
    # output filter chain the filter runs. CONTENT_SET is the stage
    # that mod_deflate itself registers at.
    FilterDeclare gzipping CONTENT_SET
    #in filter gzipping use deflate when content type equals text/html
    FilterProvider gzipping deflate Content-Type text/html
    FilterProvider gzipping deflate Content-Type text/css
    #'$' here is a substring match, so this matches both
    # text/javascript and application/x-javascript
    FilterProvider gzipping deflate Content-Type $javascript
    #insert the filter into the chain, by default at the end.
    FilterChain gzipping
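
The config above only works if the modules behind each directive are loaded. As a rough sketch, these are the Apache 2.2 modules involved; the `/usr/lib/apache2/modules/` path is an assumption based on the usual Debian/Ubuntu layout, where the same result is normally reached by enabling each module through the distribution's own tooling rather than writing these lines by hand.

```apache
# Modules the reverse proxy + cache + deflate setup relies on (Apache 2.2 names).
# Paths assume a Debian/Ubuntu-style layout; adjust for other systems.
LoadModule rewrite_module    /usr/lib/apache2/modules/mod_rewrite.so
LoadModule proxy_module      /usr/lib/apache2/modules/mod_proxy.so
LoadModule proxy_http_module /usr/lib/apache2/modules/mod_proxy_http.so
LoadModule cache_module      /usr/lib/apache2/modules/mod_cache.so
LoadModule disk_cache_module /usr/lib/apache2/modules/mod_disk_cache.so
LoadModule filter_module     /usr/lib/apache2/modules/mod_filter.so
LoadModule deflate_module    /usr/lib/apache2/modules/mod_deflate.so
```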

[1] I’m using the Firebug beta, which has a lot of nice improvements to an already great extension. Also, Clear Cache Button is a nice Firefox extension that aided in the testing here.