Paul Buchheit: Make your site faster and cheaper to operate in one easy step (paulbuchheit.blogspot.com)
186 points by paul on April 17, 2009 | 59 comments



For AJAX-based web applications, here is what I'd suggest to make things very zippy for the user:

1. Concatenate your JS and CSS files. Don't send several separate files over the wire to the browser - the browser can only make 2 connections at a time. Be careful about JS dependencies - load order matters in JS.

2. Minify the JS and CSS. Use Dojo's ShrinkSafe or the YUI Compressor to do this. They strip out whitespace and so on, making the code smaller before it's even compressed (in JS, every byte counts).

3. Now gzip the above. (Paul's article talks only about gzipping - if you do the above 2 steps as well, you'd improve the performance a lot more).

Write an Ant script to automate all of the above on code commit and you're done (there's a rough sketch of the idea at the end of this comment). Also try loading other elements in the background, or only after a tab is clicked - the important thing is to show something to the user almost instantly. I did this for Alertle.com, which was a 100% AJAX web app (no page refresh at all), and the initial payload sent to the browser went from 700k to about 20k using the steps above :)


4. Figure out browser caching - what to cache, how, and for how long. Frequently changing code files shouldn't be cached for long, but images and other static files usually only need to be downloaded to the user's browser once. Stuff to Google: ETags and Last-Modified.

Improving site speed is a broad topic, and there's plenty of room for improvement on the server side too, like caching query results and using prepared statements to optimize the SQL.
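
To make steps 1-3 concrete, here's a rough shell sketch of the kind of thing the Ant script automates (file names and the compressor jar version are placeholders, not anything from a real project):

  # 1. concatenate (order matters because of JS dependencies)
  cat base.js widgets.js app.js > combined.js
  # 2. minify with the YUI Compressor (use whatever jar version you have)
  java -jar yuicompressor-2.4.2.jar combined.js -o combined.min.js
  java -jar yuicompressor-2.4.2.jar styles.css -o styles.min.css
  # 3. pre-gzip so the server can hand out compressed copies directly
  gzip -9 -c combined.min.js > combined.min.js.gz
  gzip -9 -c styles.min.css > styles.min.css.gz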


> the browser can only make 2 connections at a time

This is slowly becoming untrue: decent browsers like Opera and Firefox have defaulted to 8 for a while now, and IE8 defaults to 6.

Although all your points are still valid.


While it's true that IE6 and IE7 can only handle 2 concurrent persistent connections per server, IE8 can handle 6.

Firefox 3 by default handles max 8 persistent connections per server, and max 15 connections per server in total (persistent and non-persistent).

This goes against RFC 2616, but I guess the capacity of both servers and clients has increased enough over the last 10 years to warrant such changes in default behavior across browsers.


The RFC2616 requirement was always a bad idea for users; for years I missed the Netscape feature that let you set this parameter to whatever you wanted; I left it at 20. (Was that up to 0.91N? I forget.) It helped out server software that made concurrent connections expensive, though.


Another good argument in favor of doing so is that establishing a connection has a non-zero time cost. Further, your server may not have the workers to spare (so that extra connection is going to sit idle until the server's queue clears).


That's only one of the needed steps. Adjust your headers so that static items are not reloaded frequently. Use versioned URLs (e.g., /scripts/main.js?234) and update the versions only when you change the scripts or CSS. YSlow goes a long way in helping with that kind of stuff.

The same applies for S3 uploads. You can pass cache headers in the upload request which will later be used on all downloads.
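
A quick way to check what a given asset is actually being served with (the URL is just an example):

  curl -sI 'http://example.com/scripts/main.js?234' | grep -iE 'cache-control|expires|etag|last-modified'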


Do you know of something that can modify Django templates containing CSS & JS imports so that the linked version gets updated with each checked-in change to the files?


django-compress does this very well (integrates with YUI Compressor, and several other compression filters). I use it on all my django sites.


Gzipping is especially important for large JavaScript heavy apps. 280slides loads about 5x faster gzipped than not.

If you're really worried about the performance hit of gzipping, you can cache gzipped versions of static resources.


How do you deal with the time it takes to parse a large Javascript file? Past 500KB or so it can take several seconds.


Are you minifying your javascript?

500KB is pretty huge. Does every user need that? Perhaps you can use a smaller bootstrap script to pull down only what's needed when it's needed.


Sure -- it's basically half of YUI. Unminified it's 2.2MB. I've found that the time it takes to parse and load (not download) the javascript can be significant. I get away with showing content quickly and loading scripts in the background while the user is reading.


Check out YSlow and Steve Souders' tips if you're interested in speeding up your front end.

http://developer.yahoo.com/yslow/
http://developer.yahoo.com/performance/rules.html


Blog entries like this make me feel truly educated. I could visit my grandparents in Weston, MA and regurgitate, "the highest merit we ascribe to Moses, Plato, and Milton is, that they set at naught books and traditions" and my grandmother would reply, "Ah, someone who is educated!" Whereas if I had said, "GZip encode your hypertext," she would be the uneducated one.

So education is relative. Perhaps if I give in and move to the Bay Area my education will be richer than I would ever have imagined.


I don't think I understand your comment, but it was pretty entertaining, nonetheless.


It was casual, but truthful in sentiment. You can see it as ironic in that I was educated by the Internet and not by being in the Bay Area. Yet being in the Bay Area, making the right friends, asking the right questions, and exchanging ideas may sometimes prove more useful than the Internet. (After all, the Internet is still by your side if you need it.)


I'm about to skedaddle from the Bay area, and can't say it's really worth the expense of living here. You know when people start spouting off about the supposed virtues of living in an area they're probably in real estate; at the very least, they likely have financial interest in property. At least the Internet does one thing very well, and that is expose greed and stupidity. With such a high concentration of greed (high rents) in an area, it's not surprising to find such a high concentration of stupidity. There are at least 49 other states far more deserving of my tax dollars.


It's getting kind of lame being among people who want to divide their attention. I imagine that in the Bay Area I would find at least a few souls willing to spend Friday night programming hard core rather than carousing.


Honest question: are you on drugs, or not a native English speaker?


Coffee. I know English well enough to understand the term "ad hominem."


Which is Latin :-)


Which was the joke, right?


Heh, sorry, I didn't mean offense - I just found your train of thought a little opaque, but certainly in an interesting way!

Either way, I don't think knowing to GZip your pages counts as "education" any more than knowing how to put a horse shoe on a hoof or how to facilitate a corporate merger does.


He's (she's??) a little conflicted about whether to move to the Bay Area or not. He feels he's educated by classical standards but still has technical knowledge. He has experience learning technical/IT knowledge from the Internet, and values such knowledge, but thinks that by being in the Bay Area such knowledge can be accrued faster. He doesn't like the greed he finds in the Bay Area, and is basically trying to justify the loss of potential technical knowledge from leaving (or not going at all) by weighing it against the classical knowledge his grandmother values more than he does.

Tip: stay out of the Bay Area but not too far away, maybe San Jose, and keep in touch with a few buddies, but not with those you truly don't like.


Mibbit uses a custom webserver... Instead of gzipping things on the fly, I decided to just look for a .gz version on the filesystem, and use that if it's there.

e.g. a request for 'index.html' looks for both 'index.html' and 'index.html.gz'. If the .gz is there, it uses that and sets the headers accordingly.

Works incredibly well, and the deploy scripts just gzip things when they're pushed out to production.
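
For anyone wanting to copy the approach, the deploy-time half is just something along these lines (paths and extensions are made up for the example):

  find htdocs -type f \( -name '*.html' -o -name '*.js' -o -name '*.css' \) \
    -exec sh -c 'gzip -9 -c "$1" > "$1.gz"' _ {} \;

The server-side half is then just a check for the .gz file before falling back to the original.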


I don't know if this applies to your site or not, but if you have any dynamic http requests (non-cacheable) that approach doesn't work very well. You're much better off doing it on the fly with nginx as paul suggests; the impact on CPU is not noticeable even under very heavy load.


Yeah, dynamic stuff is a whole different kettle of fish. The dynamic content on Mibbit is usually very small - maybe 100-200 bytes. The HTTP request headers are usually bigger than that (and they can't be compressed).

For Mibbit, I'd like to eventually do my own compression which will beat the socks off gzip, as it'll be session based rather than per request.

But yeah, I can see use cases where you're sending dynamic stuff which will benefit from gzip and where pre-caching on disk doesn't really make sense.


Out of curiosity what was the reason for using a custom webserver?


Scalability, full control over everything, it's not rocket science...

Mibbit uses some cool Comet-like stuff, and I'm 99% confident the Mibbit webserver is better than anything else at doing this.


Wow, better than anything else? That's a pretty tall claim! What leads you to believe that?

I'm constantly amazed by the people who think writing HTTP servers is really hard.


It handles 2,000 HTTP requests a second on a single 1.4GB VPS server, which I think is good. It'll likely go to 10k/s or more... probably until the bandwidth is saturated.

Writing an HTTP server is easy. But making one that handles long lived connections (10s of thousands), and scales well, doesn't eat memory, doesn't eat CPU, etc etc is harder.

Having said that, it's easy enough that I think it's often worth doing if the webserver is integral to your success (It is with Mibbit).


That's pretty impressive! How much memory does it need per connection? I think that's the primary metric I'd use to assess Cometworthiness (once all the basic stuff is taken care of, e.g. delivering 1000 outbound events per second takes the same amount of CPU regardless of whether there is only one outbound connection or 100 000 of them).

My main point was that it would be hard to tell if someone somewhere had an HTTP implementation that could handle 250 000 concurrent HTTP connections per 64MB of RAM while yours only handled, say, 5000 per 64MB. (I suspect the former number is achievable; I know the latter is.)

I can confirm from experience that writing an HTTP server that scales well without eating memory or CPU is dramatically harder than just writing an HTTP server. However, it's very easy to do better than apache-mpm-prefork. (No need, though; lighttpd and nginx should both do better at that, I have heard that perlbal does too, and I suspect from experience that twisted.web does as well, although I haven't measured it.)


a2enmod deflate

/etc/init.d/apache2 restart
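
A quick sanity check that it took effect (assumes the stock deflate config, which compresses text/html; it should print Content-Encoding: gzip):

  curl -s -D - -o /dev/null -H 'Accept-Encoding: gzip' http://localhost/ | grep -i content-encoding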


Thanks. For some reason, this critical bit of information was missing from the AppJet app, the article, and the comments. I had to spend some of my time actually looking for it.


I don't use apache, but apparently it may be a little more complicated than that if you really want it to work: http://www.nerdblog.com/2009/04/my-modgzip-settings-deflatec...



Wow, thanks! For fifteen minutes I've been scratching my head after I read all the docs on deflate. Nowhere was that command mentioned and when I saw your comment I had an "Of course!" moment. Thanks for that.


His speed estimates are wrong, as he should have tested gzipping many small files instead of gzipping one large file. There is a significant difference.


Why? If you're setting up your website right, you should be serving 1 HTML file + 1 JS file + 1 CSS file + 1 image for each request, and the latter 3 should all be cached indefinitely. (At least for static chrome - if you've got thumbnails, videos, or flash games you can't exactly sprite those.) His test file was 146K, which seems on the high side for HTML only, but I'd imagine serving 1 small file is faster than serving 1 large file.


1 image for each request? Are you putting all the images for each page into one image file, then cropping the needed region for each image on your page?



Have you actually made any measurements?

Smaller pages are actually faster (both individually and collectively), though gzip is already fast enough that the difference is irrelevant.


Nobody has mentioned the latency issue when gzipping. If you have to construct the whole file before gzipping, in situations where the file is large or dribbles out as the server processes the data, this could mean a significant slowdown. In virtually all situations, I agree gzipping is good, just like I always leave write-caching on my hard drive turned on so that the slowest part of my system can run at the fastest possible speed. There just might be consequences you do not intend. To address the obvious replies: yes, your server should not dribble out content. And, yes, if you are using a framework that spits out the entire page at once already, you will incur no additional latency on top of the gzip/gunzip time.


Incorrect. Gzip streams just fine, so there is no latency issue. Google search, for example, writes the top of the page before the search is complete. (and I assure you, they use gzip)


I'm sure I had this problem before, but maybe it was with an older server or browser? In any case, thanks for the correction.


Thanks for saying that - I was gonna pipe up but wasn't sure if it was public information. :-)


For large static files you can cache pre-gzipped versions and use content negotiation to serve them to clients that support gzip.


One scenario where using gzip might not be a good idea is when serving content less than 2-4 kB, like some thumbnails.


Gzipping images is generally not a good idea. PNG/GIF/JPEG are already highly compressed and will probably grow in size if you attempt to gzip them.


PNG is often not well compressed unless you've gone out of your way to optimize it with pngcrush (http://pmt.sourceforge.net/pngcrush/) or something similar.


I think this might be better? http://optipng.sourceforge.net/


Yeah, it's pretty good. pngout is good too. But I found some really good compression with pngnq. It's lossy, but can help you with some larger optimisations.

I had an image that pngout got to 218K (221K for optipng). pngnq chopped that thing to 74K. When flipping between the images you could see some pixels change, but only if you looked really hard.

(Running pngout on that image brought it down to 71K. Nice.)
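
For reference, the invocations look roughly like this (flags differ between versions, so treat it as a sketch):

  optipng -o7 image.png                        # lossless, rewrites the file in place
  pngcrush -brute image.png image-crushed.png  # lossless, tries lots of filter/strategy combos
  pngnq -n 256 image.png                       # lossy quantization, writes image-nq8.png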


Can Apache/nginx be configured to not gzip images or files smaller than a certain size?


When you serve thumbnails in an already compressed format like PNG, JPEG, or GIF, it doesn't make sense to gzip them anyway.


True about the image part - images shouldn't be gzipped; bad examples on my part. The point still stands for files smaller than 2-4 kB.


Umm, I am confused. Is this about compressing the web page itself? I think what takes most of the time to load on my website is the advertisers' stuff rather than the content itself. So perhaps this is about large websites like Amazon which have large databases?


I think what he means is just enabling compressed transport of data from the server to the client, regardless of what you are transporting. The size of your page doesn't change the nature of the benefit, which is always less data going over the wire.
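
An easy way to see the difference for your own page (the URL is a placeholder; the second number will only be smaller if the server actually gzips the response):

  curl -s -o /dev/null -w '%{size_download}\n' http://example.com/
  curl -s -o /dev/null -w '%{size_download}\n' -H 'Accept-Encoding: gzip' http://example.com/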


GP is right about advertisers though :-(



