How To Optimize Your Site With HTTP Caching (betterexplained.com)
85 points by zerop on Dec 9, 2011 | 29 comments



This is a good writeup, though I'm surprised no one has mentioned the holy grail of caching with HTTP. That of course is good old RFC 2616: http://www.ietf.org/rfc/rfc2616.txt

There's an entire section in there devoted just to caching in HTTP. Very well worth reading in its entirety.


The version at http://tools.ietf.org/html/rfc2616 adds the errata, links to the RFCs that update it, and hyperlinks (it is a web, after all).

Direct link to Caching in HTTP http://tools.ietf.org/html/rfc2616#section-13


This is a great article for an introduction to HTTP caching. It's well-written and even covers how to set up caching in Apache.


Thanks for the kind words (I'm the author). My main, ever-evolving goal when writing tutorials is to try to write what I'd like to see:

1) Explain the underlying concept

2) Show variations

3) Explain how to do it yourself

4) Show how to verify you did it correctly

5) Meta: be as concise as possible, maximize bang for the buck


I've always enjoyed your writing, kalid. I especially liked http://betterexplained.com/articles/a-visual-intuitive-guide.... Thanks so much for your site!


Thanks, I appreciate it!


I really enjoyed your writing style. It didn't force its (successful!) attempts at humor, and used them sparingly enough that they were a joy to read.


Thanks! It's easy to get self-conscious about putting humor into a tech article :)


And it links to another great article on gzip compression.


Be careful with ETags if you're serving content from a web-farm - http://developer.yahoo.com/performance/rules.html#etags
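The Yahoo rule's point is that the default ETags generated by Apache and IIS include server-specific data (Apache's include the file's inode), so the same file served from two machines gets two different tags and revalidation keeps failing. If you want to keep ETags on a farm, one option is to derive the tag from the content alone. A minimal Python sketch, assuming your application builds the response itself; `content_etag` and `conditional_get` are hypothetical helpers, not part of any framework:

```python
import hashlib
from typing import Optional, Tuple


def content_etag(body: bytes) -> str:
    """Strong ETag computed only from the response bytes.

    Every server in the farm holding an identical copy of the file produces
    the same tag, unlike tags built from per-server data such as inodes.
    """
    return '"' + hashlib.md5(body).hexdigest() + '"'


def conditional_get(body: bytes, if_none_match: Optional[str]) -> Tuple[int, bytes]:
    """Return (status, body): 304 with an empty body if the client's copy matches."""
    etag = content_etag(body)
    if if_none_match is not None:
        # If-None-Match may list several tags separated by commas, or be "*".
        # This sketch compares strings exactly and ignores weak (W/) tags.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            return 304, b""
    return 200, body
```

Because the tag depends only on the bytes, a request that hits server A first and server B next still gets a 304 when the file is unchanged.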


In the same spirit, there's a lot to gain and very little to lose if you activate ETags and serve content from just one source.


A pretty basic write-up. Static caching is standard these days; the article doesn't help with speeding up today's dynamic sites.


Article from 2007


Yep, I was surprised to see it on HN too :).

I think one of the meta-takeaways is that understanding the fundamentals of web caching can help with your general CS knowledge ("There are only two hard problems in Computer Science: cache invalidation and naming things." -- Phil Karlton).

Looking at Apache, we see a few strategies:

  * Include last-modified metadata
  * Include content metadata (ETag/MD5 of content)
  * Include an explicit expiration date
  * Include a max-age
  * Include metadata about who can cache (public/private/no-cache; e.g. private means the user's browser can cache but shared proxies cannot)
These approaches could be used when designing data flows with Memcache, Redis, etc.
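The strategies in the list map onto standard HTTP/1.1 response headers. As a rough sketch (Python, standard library only), here is one way to emit them for a single response; `caching_headers` and its `audience` parameter are hypothetical names, and a real server would usually send only the headers it needs rather than all of them at once:

```python
import hashlib
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from typing import Dict


def caching_headers(body: bytes, last_modified: datetime, max_age: int,
                    audience: str = "public") -> Dict[str, str]:
    """Build response headers for the strategies listed above.

    `last_modified` must be a timezone-aware UTC datetime.
    `audience` is "public" (any cache, including proxies) or "private"
    (only the user's own browser).
    """
    if audience not in ("public", "private"):
        raise ValueError("audience must be 'public' or 'private'")
    now = datetime.now(timezone.utc)
    return {
        # Last-modified metadata; HTTP dates are always written in GMT.
        "Last-Modified": format_datetime(last_modified, usegmt=True),
        # Content metadata: a quoted MD5 of the body used as the ETag.
        "ETag": '"' + hashlib.md5(body).hexdigest() + '"',
        # Explicit expiration date (the older HTTP/1.0 mechanism).
        "Expires": format_datetime(now + timedelta(seconds=max_age), usegmt=True),
        # max-age plus who may cache. For the no-caching case you would send
        # "no-cache" or "no-store" here instead.
        "Cache-Control": f"{audience}, max-age={max_age}",
    }
```

The same three ingredients (a modification time, a content hash, a time-to-live) are what you would store next to an entry in Memcache or Redis to decide when it has gone stale.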


The best way I've seen for dealing with cache expiry, which the article does not talk about, is to use version numbers on assets. We found this to be especially important with javascript, css, etc -- if all of that stuff doesn't expire at the same time, it can hose the layout of your site.

Also, there may be many layers of caching between you and the user: not only HTTP caching in the browser, but also any CDNs (Akamai, etc.) and sometimes even caching proxies on corporate networks.

At my previous job, we handled the versioning with deployment-time rewriting of the asset references in the base page to include the version number (as tagged by the build software with branch name + build number).

That said, enabling browser-side caching was a huge win for page speed on the site.
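A deployment-time rewrite like the one described could look roughly like this. It is a sketch under assumptions: the regex, the `?v=` parameter and the `main-567` tag format are all hypothetical, and a production script would more likely use a real HTML parser than a regular expression.

```python
import re

# Hypothetical pattern: a src/href attribute pointing at a local (root-relative)
# .js, .css or image file that does not already carry a query string.
ASSET_REF = re.compile(r'(src|href)="(/[^"?]+\.(?:js|css|png|gif|jpe?g))"')


def version_assets(html: str, build_tag: str) -> str:
    """Append ?v=<build_tag> to every matching asset reference in the page."""
    return ASSET_REF.sub(lambda m: f'{m.group(1)}="{m.group(2)}?v={build_tag}"', html)
```

Since every reference gets the same tag at deploy time, all the JS and CSS for a release "expire" together, which is exactly the layout problem the comment describes. Running it twice is harmless, because already-versioned URLs no longer match the pattern.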


One thing I don't understand. If the server has asked the client to cache an image for a year, and the image is indeed updated in that time, is there some way of telling the client to download that image anyway?

I'd take it to Google, but I have no idea how I'd ask that in Google query form.


This is actually referenced in the article. You can use the Last-Modified date and the server will either return a 304 (Not Modified) or the modified image if it is newer.
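Roughly, the exchange works like this: the browser sends back the date it was given in an `If-Modified-Since` header, and the server replies 304 with no body if the file hasn't changed since then. A minimal Python sketch of the server side, assuming `last_modified` is a timezone-aware UTC datetime; `respond` is a hypothetical handler, not a framework API:

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Optional, Tuple


def respond(last_modified: datetime, body: bytes,
            if_modified_since: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Answer a GET, honouring If-Modified-Since. Returns (status, headers, body)."""
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable date: ignore it and send the full response
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
            # HTTP dates have one-second resolution, so drop microseconds first.
            if last_modified.replace(microsecond=0) <= since:
                return 304, headers, b""  # Not Modified: the client's copy is current
    return 200, headers, body
```

Note that this code only runs when the browser actually asks, which is the catch raised in the next comment.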


I read that, but if you say "this image won't change for exactly one year" and the client doesn't even request that resource from the server any more, how do you start that dialogue again?

pork has suggested adding a junk parameter to the end of the GET request to bust the cache; I'll need to read into this. I'm interested in optimizing web speed as much as possible, and this sort of thing, caching especially, is something I've always understood poorly.


Yep, that's the problem with long expiration dates -- the client may never check again (that's what we wanted, right?). The workaround is to request a new url which restarts the process.
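A toy model makes the catch visible. While an entry is younger than its `max-age`, the cache answers on its own and the server never hears about it; only a URL it has never seen (or an expired entry) goes to the network. This is a simplified Python sketch, not how any particular browser is implemented: `BrowserCache` is hypothetical, the `max_age` argument stands in for the value the server would send, and a real browser would revalidate an expired entry with a conditional request rather than refetch it blindly.

```python
from typing import Dict, Tuple


class BrowserCache:
    """Toy model of a browser cache that honours Cache-Control: max-age."""

    def __init__(self) -> None:
        # url -> (time the response was stored, max-age it was sent with)
        self.entries: Dict[str, Tuple[float, int]] = {}
        self.requests_sent = 0

    def fetch(self, url: str, now: float, max_age: int) -> str:
        entry = self.entries.get(url)
        if entry is not None:
            stored_at, entry_max_age = entry
            if now - stored_at < entry_max_age:
                return "served from cache"  # fresh: the server is never contacted
        # Unknown URL or expired entry: go to the network and store the result.
        self.requests_sent += 1
        self.entries[url] = (now, max_age)
        return "fetched from server"
```

With a one-year max-age, asking for `/logo.png` again a month later never leaves the browser, while `/logo.png?v=2` is a brand-new URL and is fetched immediately.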

Separately, the easiest way to get started with all these optimizations is to run the page speed check online:

https://developers.google.com/pagespeed/

and follow the recommendations, most important to least.


I've actually been playing around with this stuff all day, pretty much since my last comment above. I've enabled smarter caching on my website, replaced multiple image requests with a single spritesheet, optimized my images, and cleaned up my CSS file to remove unused code. Google's PageSpeed has been an invaluable tool, as well as webpagetest.org, which breaks down the data in an intelligent way.

Turns out Google Analytics is actually doubling my page load time, but the data is too valuable to give up.

Anyway, thanks for the tips.


In HTTP, since it's stateless, you don't "tell" the client anything without it first asking. The usual way to bust the cache is to add a junk parameter to the end of a GET request.
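Adding or replacing such a parameter can be done safely with the standard library, without string-pasting `?` and `&` by hand. A Python sketch; the parameter name `v` is an arbitrary, hypothetical choice:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def bust_cache(url: str, token: str) -> str:
    """Return url with a cache-busting ?v=<token> parameter added or replaced."""
    parts = urlsplit(url)
    # Keep any existing parameters except an old "v", then append the new one.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "v"]
    query.append(("v", token))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

The value matters: if it only changes when the file changes (a build number or content hash), each version is still cached once, whereas a fresh random value on every page view would defeat caching entirely.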


I see. I assume you would change the html code to say <img src="image.png?cache=no"> or something like that to force the browser to redownload it? What if the html page itself were cached for a year? Is there an Apache setting that can give a global "no caching" command, or something like that?


Yep, exactly -- not only can the images be cached, but the HTML too!

The ideal way to do it is to have the "loader file" (index.html) cached only with a last-modified date, so as soon as it changes the client is aware. The client requests the file each time, and gets back either the full file or a simple Not Modified response.

Within the file, you have references to permanently-cached, versioned resources (<img src="/images/foo.png?build=123" />). If the cache expiration is far enough away, the browser won't even issue the request to check for a new version.
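The two-tier policy can be sketched as a function that picks a `Cache-Control` value per URL. Assumptions: versioned assets are recognised by a `build=<digits>` query parameter as in the example, and `cache_policy` is a hypothetical name. Sending only `Last-Modified` for the loader page, as described, also works, but browsers may then apply heuristic freshness and skip the check for a while; `no-cache` (store it, but revalidate before every use) is one way to make the ask-every-time behaviour explicit.

```python
import re
from typing import Dict

ONE_YEAR = 365 * 24 * 3600  # seconds

# Hypothetical convention: a versioned asset carries build=<digits> in its query.
VERSIONED = re.compile(r"[?&]build=\d+(&|$)")


def cache_policy(url: str) -> Dict[str, str]:
    """Choose caching headers for a URL under the loader-file scheme."""
    if VERSIONED.search(url):
        # The URL changes whenever the content does, so cache it "forever".
        return {"Cache-Control": f"public, max-age={ONE_YEAR}"}
    # Loader pages and anything unversioned: keep a copy, but check with the
    # server (If-Modified-Since, answered by 200 or 304) before each use.
    return {"Cache-Control": "no-cache"}
```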

Some caches (notably some proxy servers) don't cache URLs with query strings, so you might use rewriting rules to change foo.png to foo.123.png instead. Google's Page Speed module for Apache (mod_pagespeed) can do this kind of rewriting for you automatically.
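For the filename variant, the build writes `foo.123.png` into the HTML and the server maps it back to the file on disk (in Apache that reverse step would typically be a mod_rewrite rule). Both directions in a Python sketch, with a hypothetical `<name>.<digits>.<ext>` naming scheme:

```python
import re

# Hypothetical naming scheme: <name>.<build digits>.<ext>, e.g. foo.123.png.
VERSIONED_NAME = re.compile(r"^(?P<stem>.+)\.(?P<build>\d+)\.(?P<ext>[A-Za-z0-9]+)$")


def versioned_path(path: str, build: int) -> str:
    """foo.png -> foo.<build>.png, used when writing URLs into the HTML."""
    stem, dot, ext = path.rpartition(".")
    if not dot or "/" in ext:
        raise ValueError(f"no file extension in {path!r}")
    return f"{stem}.{build}.{ext}"


def physical_path(path: str) -> str:
    """foo.123.png -> foo.png, the mapping a server-side rewrite would apply."""
    m = VERSIONED_NAME.match(path)
    return f"{m['stem']}.{m['ext']}" if m else path
```

One caveat with this particular scheme: a real file whose name already ends in `.<digits>.<ext>` (say `jquery.1.7.js`) would be mangled by `physical_path`, so a more distinctive marker is safer in practice.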


It sounds like a much deeper subject than I first appreciated. I'll definitely read up on this. Using URL rewriting with caching is interesting; I've not seen that before.

I work with one lady who complains about a slow-loading jQuery slideshow, and smarter caching may very well be the solution (at least after the first load).


I've experimented and managed to shave 2 seconds off a client's website upon reloads - that's significant! Still playing with it, but I've already learned a lot.


Good information doesn't necessarily have an expiration date, but I concede your point that it might not be the most up-to-date source out there.


2008+ version -- throw your assets into CloudFront (or another CDN of choice), let CDN handle the caching. ;)


I know you say that in jest, but I would say that it is still good to know the underlying technology. It is much easier to debug issues in the future once you understand them.


Concur - I think that, once you have a certain amount of experience built up working in frameworks like Rails, you can only go so far before you hit a wall. At that point, it's necessary to start learning the protocols and the lower level stuff to advance your understanding and your craft.





