How To Optimize Your Site With HTTP Caching (betterexplained.com)
85 points by zerop on Dec 9, 2011 | 29 comments



This is a good writeup, though I'm surprised no one has mentioned the holy grail of caching with HTTP. That, of course, is good old RFC 2616: http://www.ietf.org/rfc/rfc2616.txt

There's an entire section in there devoted just to caching in HTTP; it's well worth reading in its entirety.


http://tools.ietf.org/html/rfc2616 adds errata, the RFCs that update it, and hyperlinks (it is a web, after all).

Direct link to the Caching in HTTP section: http://tools.ietf.org/html/rfc2616#section-13


This is a great article for an introduction to HTTP caching. It's well-written and even covers how to set up caching in Apache.


Thanks for the kind words (I'm the author). My main, ever-evolving goal when writing tutorials is to try to write what I'd like to see:

1) Explain the underlying concept

2) Show variations

3) Explain how to do it yourself

4) Show how to verify you did it correctly

5) Meta: be as concise as possible, maximize bang for the buck


I've always enjoyed your writing, kalid. I especially liked http://betterexplained.com/articles/a-visual-intuitive-guide.... Thanks so much for your site!


Thanks, I appreciate it!


I really enjoyed your writing style. It didn't push its (successful!) attempts at humor, and used them sparingly enough that they were a joy to read.


Thanks! It's easy to get self-conscious about putting humor into a tech article :)


And it links to another great article on gzip compression.


Be careful with ETags if you're serving content from a web farm - http://developer.yahoo.com/performance/rules.html#etags
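
For example (untested sketch, assuming mod_headers is enabled), the usual fix per that Yahoo rule is to turn ETags off in Apache:

  # Untested sketch: Apache's default ETag includes the file's inode,
  # which differs per server, so a farm can defeat caching entirely.
  FileETag None
  Header unset ETag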


In the same spirit, there's a lot to gain and very little to lose if you activate ETags and serve content from just one source.


A pretty basic write-up. Static caching is standard these days, and the article doesn't help with speeding up today's dynamic sites.


Article from 2007


Yep, I was surprised to see it on HN too :).

I think one of the meta-takeaways is that understanding the fundamentals of web caching can help with your general CS knowledge ("There are only two hard problems in Computer Science: cache invalidation and naming things." -- Phil Karlton).

Looking at Apache, we see a few strategies:

  * Include last-modified metadata
  * Include content metadata (an ETag, e.g. an MD5 hash of the content)
  * Include an explicit expiration date
  * Include a max-age
  * Include metadata about who can cache (public/private/no-cache, e.g. the user's browser can cache but shared proxies cannot)
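
As a rough sketch of those strategies in Apache config (untested; assumes mod_expires and mod_headers are enabled):

  # Untested sketch -- assumes mod_expires and mod_headers are loaded.
  ExpiresActive On

  # Explicit expiration date / max-age, set per content type:
  ExpiresByType image/png "access plus 1 year"
  ExpiresByType text/css  "access plus 1 month"

  # Who may cache: "public" allows shared proxies; "private" would
  # restrict caching to the user's browser. Using "append" so we don't
  # clobber the max-age that mod_expires already sets.
  Header append Cache-Control "public"

  # Last-Modified (and, by default, an ETag) is emitted automatically
  # for static files.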
These approaches could be used when designing data flows with Memcache, Redis, etc.


The best way I've seen for dealing with cache expiry, which the article does not talk about, is to use version numbers on assets. We found this to be especially important with JavaScript, CSS, etc. -- if all of that stuff doesn't expire at the same time, it can hose the layout of your site.

Also, there may be many layers of caching between you and the user: not only HTTP caching in the browser, but also any CDNs (Akamai, etc.) and sometimes even caching reverse proxies inside corporate networks.

At my previous job, we handled the versioning with deployment-time rewriting of the asset references in the base page to include the version number (as tagged by the build software with branch name + build number).
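
As a toy illustration of that rewrite step (not the actual tooling we used; the function name and pattern here are made up), stamping each asset URL with a build id can be as simple as:

  import re

  # Toy sketch of deployment-time asset versioning. build_id would come
  # from the build system (e.g. branch name + build number).
  def stamp_assets(html, build_id):
      # Append ?build=<id> to CSS/JS/image references in the page.
      pattern = r'((?:src|href)="[^"?]+\.(?:css|js|png|jpg|gif))"'
      return re.sub(pattern, r'\g<1>?build=%s"' % build_id, html)

  page = '<link href="/css/site.css" rel="stylesheet">'
  print(stamp_assets(page, "trunk-123"))
  # -> <link href="/css/site.css?build=trunk-123" rel="stylesheet">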

That said, enabling browser side caching was a huge win for page speed on the site.


One thing I don't understand. If the server has asked the client to cache an image for a year, and the image is indeed updated in that time, is there some way of telling the client to download that image anyway?

I'd take it to Google, but I have no idea how I'd ask that in Google query form.


This is actually referenced in the article. The client can send the Last-Modified date back via If-Modified-Since, and the server will either return a 304 (Not Modified) or the modified image if it is newer.
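
Roughly, the revalidation exchange looks like this (illustrative request and response, not a real capture):

  GET /images/logo.png HTTP/1.1
  Host: example.com
  If-Modified-Since: Fri, 09 Dec 2011 10:00:00 GMT

  HTTP/1.1 304 Not Modified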


I read that, but if you say "this image won't change for exactly one year" and the client doesn't even request that resource from the server any more, how do you start that dialogue again?

pork has suggested adding a junk parameter to the end of the GET request to bust the cache; I'll need to read into this. I'm interested in optimizing web speed as much as possible, and caching has always been something I've understood poorly.


Yep, that's the problem with long expiration dates -- the client may never check again (which is what we wanted, right?). The workaround is to request a new URL, which restarts the process.

Separately, the easiest way to get started with all these optimizations is to run Google's PageSpeed check online:

https://developers.google.com/pagespeed/

and follow the recommendations, most important to least.


I've actually been playing around with this stuff all day, pretty much since my last comment above. I've enabled smarter caching on my website, replaced multiple image requests with a single spritesheet, optimized my images, and cleaned up my CSS file to remove unused code. Google's PageSpeed has been an invaluable tool, as has webpagetest.org, which breaks down the data in an intelligent way.

Turns out Google Analytics is actually doubling my page load time, but the data is too valuable to give up.

Anyway, thanks for the tips.


Since HTTP is stateless, you can't "tell" the client anything without it first asking. The usual way to bust the cache is to add a junk parameter to the end of the GET request.


I see. I assume you would change the HTML to say <img src="image.png?cache=no"> or something like that to force the browser to redownload it? What if the HTML page itself were cached for a year? Is there an Apache setting that can give a global "no caching" command, or something like that?


Yep, exactly -- not only can the images be cached, but the HTML too!

The ideal way to do it is to have the "loader file" (index.html) cached only with a Last-Modified date, so the client becomes aware as soon as it changes. The client requests the file each time, and is returned either the full file or a simple Not Modified response.

Within the file, you have references to permanently-cached, versioned resources (<img src="/images/foo.png?build=123" />). If the cache expiration is far enough away, the browser won't even issue the request to check for a new version.

Some browsers won't cache resources with query params, so you might use rewrite rules to change foo.png?build=123 into foo.123.png instead. This rewriting is done automatically for you by the Google PageSpeed module for Apache.
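
A typical (untested) mod_rewrite sketch of that mapping:

  # Untested sketch: serve versioned names like foo.123.png from foo.png
  # on disk -- the number segment exists only to bust caches.
  RewriteEngine On
  RewriteRule ^(.+)\.\d+\.(js|css|png|jpg|gif)$ $1.$2 [L]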


It sounds like a much deeper subject than I first appreciated. I'll definitely read up on this. Using URL rewriting with caching is interesting; I've not seen that before.

I work with one lady who complains about a slow-loading jQuery slideshow, and smarter caching may very well be the solution (at least after the first load).


I've experimented and managed to shave 2 seconds off a client's website upon reloads - that's significant! Still playing with it, but I've already learned a lot.


Good information doesn't necessarily have an expiration date, but I concede your point that it might not be the most up-to-date source out there.


2008+ version -- throw your assets into CloudFront (or another CDN of choice), let CDN handle the caching. ;)


I know you say that in jest, but I would say that it is still good to know the underlying technology. It is much easier to debug issues in the future once you understand them.


Concur - I think that once you've built up a certain amount of experience working in frameworks like Rails, you can only go so far before you hit a wall. At that point, it's necessary to start learning the protocols and the lower-level stuff to advance your understanding and your craft.



