New multi-page HTTP compression proposal from Google

axod · on Sept 9, 2008

Support for something like this would be a step in the right direction, but I think there are a couple of simpler ways to improve HTTP:

A similar peeve of mine is HTTP headers.

If a browser opens a connection to a web server, and the connection is keep-alive, the browser will send several requests down than one connection.

But for every single request, it'll send out it's full headers. That's really wasteful and idiotic. Send full headers when the connection is opened, there is no need to repeat every single time.

Also if the connection is keep-alive, it'd be reasonably simple to have gzip compression over the full data - not per request. This would achieve the same as the google proposal, but in a better way IMHO.

The HTTP headers can add up quite a bit if you're using XMLHttpRequest or similar. Also if the data is small, compression isn't worthwhile. HTTP header spam is a PITA.

So if I had my way:

* Headers only sent once at the start of a connection, not per request. Send them if they change - eg a new cookie has been set since the last request :/

* A new transfer-type to specify that the data is gzipped as one - instead of gzipped per request.

Those 2 simple changes to HTTP would make things so much better.

tlrobinson · on Sept 10, 2008

I don't know, these seem like optimizations in the wrong places.

1. Headers aren't always the same for every request. Etags and other cache headers, content-types, etc can/will all be different. They're also comparatively small to most content. And of course you can control which headers you send, so if it's really a problem just send the bare minimum.

edit: I'm talking about the response headers, reading that again it looks like you're talking about the request. In that case you're probably right, there's a lot of duplication, especially user agent and cookies.

2. You don't know every resource you're going to request up front: you download whateverpage.html, decode and parse it, then make all the requests for scripts, images, stylesheets etc. I don't see this working very well.

coderrr · on Sept 10, 2008

Those are interesting changes.

It's true most headers don't change. One I can think of that usually changes between resource types is Accept. Usually it will be slightly different between <img> <script> <link> and <iframe>, but this probably wouldn't make much of a difference if you allow only to send changed headers. I'd be curious to see how much bandwidth you save with this. You also might want to allow for header removal as you do header change. I can't think of a scenario where not removing a header would cause a problem, but there could potentially be one.

For the gzip as a whole instead of per request, there's one reason I can't see many browsers taking advantage of that. Most browsers will make requests like, write request, read response, write request, read response. Instead of write, write, write, read, read, read. So I'm not sure how you could unzip everything together unless you wait to display the items till the entire connection is finished. Also, this would require the client to give an indication when it is done writing requests to the stream, so that all the data can be fetched from the server and then zipped together. Which would require a much bigger change to the protocol.

Is there anything I'm missing?

axod · on Sept 10, 2008

The main gain with headers would be for comet like applications. In Mibbit/Meebo etc type applications you're sending a lot of small messages, interspersed with HTTP header spam. Often the data is smaller than the HTTP headers.

For gzip, I don't see an issue. The only change that would be needed would be for the gzip state to be saved between requests. For the browser, it would request object A, get the response, unzip it, display. Then it would request object B, unzip it using the previous gzip state, etc.

For the sender, likewise. So there would be no change in terms of timing. The only change would be that the gzip state would be carried over to the next request. (It's possible I'm remembering wrong and gzip can't do this - if so a different compression method that can be compressed/decompressed individually, but using a running shared dictionary/state would be needed).

tlrobinson · on Sept 10, 2008

Comet optimizes for latency, with the big improvement of avoiding the latency of opening a TCP connection and sending the request.

With both long polling and streaming you could probably send the headers long before the actual data is ready to be sent as well.

ardit33 · on Sept 10, 2008

I read the whole thing, and I just don't like it. The beauty of HTTP headers, cookies, and elements is their simplicity (or primitivness). They are easy to implement.

This proposal will introduce a huge complexity to the HTTP spec. If you have implemented caching in a client, it is so easy for things to go wrong, even if the clients are right, the server, content managers could mess this up roaly really fast.

The other thing I don't like, is that when using raw sockets, and try to implement HTTP over it, (many reasons to do this, especially in mobile), now you have to deal with more complexities.

As somebody mentioned above, eliminating duplicate http headers, and addressing the duplicity issue in the markup language itself (i.e HTML5 or XHTML2), and not the transport protocol.

ardit33 · on Sept 10, 2008

here is somebody's counterpoint:

"It seems to me that AJAX can be used to solve this problem in a simpler manner. Take Gmail for example--it downloads the whole UI once and then uses AJAX to get the state-specific data. The example from the PPT showed a 40% reduction in the number of bytes transmitted when using SDCH (beyond what GZIP provided) for google SERPs. I bet you could do about that well just by AJAXifying the SERPs (making them more like GMail) + using regular HTTP cache controls + using a compact, application-specific data format for the dynamic parts of the page + GZIP. Maybe Google's AJAX Search API already does that? In fact, you might not even need AJAX for this; maybe IFRAMEs are enough.

I also noticed that this proposal makes the request and response HTTP headers larger in an effort to make entity bodies smaller. It seems over time there is an trend of increasingly large HTTP headers as applications stuff more and more metadata into them, where it is not all that unusual for a GET request to require more than one packet now, especially when longish URI-encoded IRIs are used in the message header. Firefox cut down on the request headers it sends [2] specifically to increase the chances that GET requests are small enough to fit in one packet. Since HTTP headers are naturally highly repetitive (especially for resources from the same server), a mechanism that could compress them would be ideal. Perhaps this could be recast as transport-level compression so that it could be deployed as a TLS/IPV6/IPSEC compression scheme.

Regards, Brian "

litewulf · on Sept 10, 2008

I assume the main argument against this idea is the burden it places on the Javascript engine. Its the same reason people use gzip and not packer (well, assuming packer produces a smaller file, which happens sometimes).

Engines are getting faster, but they still really can't compete with native browser facilities.

jwilliams · on Sept 9, 2008

I haven't read the detail of the specification, but is a great idea.

The amount of similarity between pages of Markup (esp XML) or related pieces of JavaScript could be significant.

I found this Google PowerPoint that hints at some of the benefits http://209.85.141.104/search?q=cache:RIkP-5qZ4awJ:assets.en....

The PPT claims About 40 percent data reduction better than Gzip alone on Google search.

dmv · on Sept 9, 2008

Link (of a link) to the PDF: http://sdch.googlegroups.com/web/Shared_Dictionary_Compressi...

andrewf · on Sept 10, 2008

Can't be a coincidence that they started pushing this a week after Chrome arrived. I wonder what other proposals Google has coming?

bprater · on Sept 10, 2008

Curious as to how this compares to standard GZIP compression over the course of a hundred pages on a website.

andreyf · on Sept 10, 2008

Another post cites 40%.