Real-world HTTP/2: 400GB of images per day (99designs.com.au)
273 points by jhealy on July 15, 2016 | 70 comments



Founder of NuevoCloud here. If I read this right, you guys used Cloudflare for HTTP/2. So let me ask you this: when you did your comparison, were all of the images cached (i.e. x-cache: hit) at the edge?

The reason I ask is that Cloudflare, last I checked, still hasn't implemented HTTP/2's client portion. So when a file is not cached, it does this:

client <--http2--> edge node <--http 1.1--> origin server.

HTTP/2 is only used for the short hop between the client and the edge node; the edge node then uses HTTP/1.1 for the connection to the origin server, which may be thousands of miles away.

In other words, depending on the client location and the origin server location, your test may have used HTTP/1.1 for the majority of the distance.

If you guys want to rerun this test on our network, we use HTTP/2 everywhere. Your test would look like this on our network:

client <--http2--> edge node (closest to client) <--http2--> edge node (closest to server) <--http2--> origin server.

So even if your origin server doesn't support http2, it'll only use http 1.1 over the short hop between your server and the closest edge node.

You're welcome to email me if you want to discuss details you don't want to post here.

Edit: I should also mention that we use multiple HTTP/2 connections between our edge nodes and between the edge node and the origin server, removing that bottleneck. So only the client <--> edge node hop is a single HTTP/2 connection.


To the best of my knowledge you are correct about how CloudFlare works. For context, this data was collected over a period of about a month on real production pages with significant traffic.

The edges were well and truly primed.


Cloudflare doesn't manage caches on a per-account basis, though. Each PoP has a single LRU cache that's used for all customers. In other words, even if you've primed it, your files may have been pushed out of the cache by a larger customer.

In order to know this hasn't occurred, you really have to check the hit rate Cloudflare is reporting (for static files that rarely change, this should be at or near 100%)... and when you're doing side-by-side comparisons (like the speed index), you have to actually check the x-cache headers to verify that a cache miss hasn't occurred. Otherwise you can't rule out that a significant portion of traffic was actually sent over HTTP/1.1 because of cache misses.
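
For example, a quick spot check with Python's requests (a rough sketch; header names vary by CDN, CloudFlare reports CF-Cache-Status while many others use X-Cache, and the URLs below are placeholders):

    # Spot-check whether the edge served an asset from cache.
    # Header names vary by CDN: CloudFlare uses CF-Cache-Status,
    # many other CDNs use X-Cache. The URLs are placeholders.
    import requests

    ASSET_URLS = [
        "https://assets.example.com/images/design-1.jpg",
        "https://assets.example.com/images/design-2.jpg",
    ]

    for url in ASSET_URLS:
        resp = requests.head(url, allow_redirects=True)
        status = (resp.headers.get("CF-Cache-Status")
                  or resp.headers.get("X-Cache")
                  or "no cache header")
        print(url, "->", status)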


>Each PoP has a single LRU cache that's used for all customers.

Is this true for all tiers of paid accounts? Can someone from CF chime in here?


As recently mentioned by a CloudFlare employee in this post (https://news.ycombinator.com/item?id=11439582):

> We cache as much as possible, for as long as possible. The more requested a file, the more likely it is to be in the cache even if you're on the Free plan. Lots of logic is applied to this, more than could fit in this reply. But importantly; there's no difference in how much you can cache between the plans. Wherever it is possible, we make the Free plan have as much capability as the other plans.

This does not confirm the exact statement but at least points in this direction.


So does it matter to Cloudflare, or affect my site's performance, if I use HTTP/2 on the backend while behind Cloudflare?


You're not really using HTTP/2 then, since Cloudflare would never upgrade the connection to HTTP/2. The connections to your backend are all HTTP/1.1.


But origin hits are relatively rare, so it doesn't matter that much that it's not http2.


I did not do any real tests and I might be completely wrong etc. but it seems to me that http2 is going to perform poorly over wireless links like 3g.

With HTTP/1 you had N TCP connections. Given the way TCP slowly ramps up the bandwidth used and rapidly backs off when a packet is lost, if a packet was dropped (which happens quite a lot on 3G) the other TCP streams were not delayed or blocked, and could even use the leftover bandwidth yielded by the stream that lost the packet.

With HTTP/2, however, there's one TCP connection, so dropped packets will cause under-utilization of the bandwidth. On top of that, a dropped packet delays every frame after it in the kernel's receive buffer until it is retransmitted, whereas in the HTTP/1 case that data would be available at the app level right away.
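
A toy AIMD-style sketch illustrates the intuition (this is not a faithful TCP model and the loss rate is an assumed illustrative value): a loss on a single multiplexed connection halves all of the page's bandwidth, while with six connections each loss only halves a sixth of it and the aggregate ramps back up faster:

    # Toy AIMD sketch (not a faithful TCP model): compare the aggregate
    # congestion window of 1 multiplexed connection vs 6 parallel ones
    # on a lossy link. The per-connection loss probability is an assumed
    # illustrative value, not a measured 3G figure.
    import random

    def simulate(n_conns, loss_prob=0.05, rounds=200, seed=1):
        random.seed(seed)
        cwnd = [1.0] * n_conns
        total = 0.0
        for _ in range(rounds):
            for i in range(n_conns):
                if random.random() < loss_prob:
                    cwnd[i] = max(1.0, cwnd[i] / 2)  # multiplicative decrease
                else:
                    cwnd[i] += 1.0                   # additive increase
            total += sum(cwnd)
        return total / rounds  # average aggregate window per round trip

    print("1 connection :", round(simulate(1), 1))
    print("6 connections:", round(simulate(6), 1))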

HTTP/2 being implemented on top of TCP always seemed like a weird choice. It should have been UDP, IMO. That's why network accelerators like PacketZoom make so much sense. Note: I work at PacketZoom, I have not done any in-depth research on HTTP/2, and this is my opinion, not necessarily that of the company.


This is a real worry, but as with all things the actual behaviour of H2 on lossy networks is more complex than that.

TCP's congestion control algorithms don't work that well when you have many TCP streams competing for the same bandwidth. This is because, while packet loss is a property of the link rather than of an individual TCP stream, each packet loss event necessarily only affects one TCP stream. This means the others don't get true feedback about the lossiness of the link. This behaviour can lead to a situation where all of your TCP streams try to over-ramp.

A single stream generally behaves better on such a link: it's getting a much more complete picture of the world.

However, your HOL blocking concern is real. This is why QUIC is being worked on. In QUIC, each HTTP request/response is streamed independently over UDP, which gets the behaviour you're talking about here, while also maintaining an overall view of packet loss for rate limiting purposes.


> can lead to a situation where all of your TCP streams try to over-ramp. ... A single stream generally behaves better on such a link: it's getting a much more complete picture of the world

Multiple HTTP connections work better for exactly this reason: they 'steal' bandwidth from streaming video by 'not playing fair'. For example, 6 connections ramp bandwidth back up at 6x the rate of a single connection, or sometimes scale back only 1/6th of the streams at once.

...which is fine, because multiple parallel HTTP connections usually means a browser doing short-lived transfers for an active user, and that is not the bulk of the data on the network.


Agreed about the possibility of over-ramping on good links. But on wireless, packet drops can be quite frequent and unrelated to reaching the bandwidth limit.


I worry about TCP window scaling (and full TCP windows) when using only one TCP connection. There is a good reason download managers use multiple connections to download one file: depending on the latency, the maximum transfer rate is capped because only so many TCP packets can be in flight simultaneously. I wonder if nobody ever thought about that... HTTP/1.x solved it (more by chance) with multiple connections...
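
Back of the envelope (illustrative numbers, not measurements from the article): the throughput ceiling for a single connection is the window size divided by the round-trip time.

    # Throughput ceiling of a single TCP connection: window / RTT.
    # The window sizes and RTT below are illustrative.
    def max_throughput_mbps(window_bytes, rtt_seconds):
        return window_bytes * 8 / rtt_seconds / 1e6

    # 64 KB window (no window scaling) on a 300 ms mobile RTT:
    print(max_throughput_mbps(64 * 1024, 0.300))    # ~1.7 Mbps, regardless of link speed

    # window scaled up to 1 MB on the same RTT:
    print(max_throughput_mbps(1024 * 1024, 0.300))  # ~28 Mbps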


Has anybody ever used this for denial of service? Make a large request so the TCP window scales up then just stop sending ACKs. All that data has to remain in the server's memory for retransmit. Even with a really short timeout before connections are dropped you could probably tie up a lot of server memory.


Have you looked at QUIC? It seems like it addresses the problems with HTTP/2 over TCP as well as providing some additional benefits, like speeding up secure connection establishment.


Yes, I'm aware of it, though not as in-depth as I'd like to be. AFAIK it was still a bit unfinished, e.g. not having any congestion control story, which is very important. But yeah... it's what HTTP/2 probably should have been in the first place.


Hey all, I'm the author of that blog post. I'll be floating around for a while, happy to answer any questions.


I'm out y'all. Thanks for all the support. Feel free to follow up on Twitter at @xzyfer


Well done, a great combination of clear graphs + theories.


I don't think that the server is in charge of prioritisation here. The server can do it, but there is no reason to push this responsibility onto the server when the browser can do it much better (for example, the server can't know what's in the viewport).

I expect this will be quickly sorted out by more mature HTTP/2 implementations in browsers. Downloading every image at once is obviously a bad idea, and I expect such naive behaviour will soon be replaced by decent heuristics (even just downloading eight resources at once should be better in nearly all cases).


I think the real solution here is for the browser to be able to communicate some sort of priority to the server, without having to download a limited number of files at once.


Browsers do currently do this. H2 has two types of prioritisation: weighted and dependency-based.

All browsers implement weighted resource prioritisation and weigh resources by content type. This is a holdover from what they do for HTTP/1 connections.

Firefox has dependency-based resource prioritisation. https://bitsup.blogspot.com.au/2015/01/http2-dependency-prio...

The spec purposely leaves how these heuristics should work to the implementor. Things will change and implementations will diverge over time.

The server ultimately being in control means we can tell it which resources are important for specific pages, with complete knowledge of the page.
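
For the curious, both kinds of hints travel in the same 5-byte PRIORITY frame payload (RFC 7540, section 6.3): an exclusive bit, a 31-bit stream dependency, and a weight stored as weight minus one. A rough sketch of the wire encoding, hand-rolled here for illustration rather than taken from any particular H2 library:

    # Sketch of an HTTP/2 PRIORITY frame (RFC 7540, section 6.3).
    # Frame header: 24-bit length, type 0x2, flags 0x0, 31-bit stream id.
    # Payload: 1-bit exclusive flag + 31-bit stream dependency, then the
    # weight encoded as (weight - 1).
    import struct

    def priority_frame(stream_id, depends_on, weight, exclusive=False):
        assert 1 <= weight <= 256
        dependency = depends_on | (0x80000000 if exclusive else 0)
        payload = struct.pack(">IB", dependency, weight - 1)  # 5 bytes
        header = struct.pack(">BHBBI",
                             0, len(payload),         # 24-bit length (high byte, low 16 bits)
                             0x2,                     # type: PRIORITY
                             0x0,                     # flags
                             stream_id & 0x7FFFFFFF)  # 31-bit stream identifier
        return header + payload

    # Stream 5 (an image) depends on stream 3 (the CSS) with a low weight:
    print(priority_frame(stream_id=5, depends_on=3, weight=16).hex())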


Oh wow, that's cool. Do you know if servers currently support this? Would this mostly be useful on a network level or do you think it would also be useful for like trying to be more intelligent about scheduling?


It's hard to say for sure. Server implementations can vary wildly, so make sure to test any implementation closely. I know from talking to CloudFlare that their implementation respects browser hints. Their implementation is also open source.


One way to "solve" the time to visual completion would be to make all the images, but especially the larger images, progressive scan. For very large images, the difference in visual quality between 50% downloaded and 100% downloaded on most devices isn't noticeable, so the page would appear complete in half the time.
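
If you want to try it, Pillow can re-encode a JPEG as progressive in a couple of lines (the filenames and quality setting below are just illustrative):

    # Re-encode a JPEG as progressive with Pillow (illustrative settings).
    # Progressive scans let the browser paint a rough version of the whole
    # image early and refine it as more data arrives.
    from PIL import Image

    img = Image.open("design-large.jpg")  # placeholder filename
    img.save("design-large-progressive.jpg", "JPEG",
             progressive=True,  # emit progressive scans
             optimize=True,     # optimise Huffman tables (usually a bit smaller)
             quality=85)        # illustrative quality setting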


Totally. There are a bunch of ways to address the performance issue. As I alluded to at the end of the post, there are serious technology considerations when preprocessing so much image data.

We're currently looking at whether we can use IntersectionObserver for efficient lazy loading of images before they enter the viewport.


There's an excellent talk about doing exactly that, showing a working prototype & results measured:

https://www.youtube.com/watch?v=66JINbkBYqw


If there's a way to tell it not to render until x% is downloaded, sure. Otherwise slower connections see the low-quality versions for a while and it can be disconcerting. Either to some users or some PMs.


This is correct. Visual completion will not be achieved until all of the images within the viewport are fully downloaded.

However, progressive JPEGs could improve initial paint times. These are dynamic, so each page would have its own unique (although related) profile.


Doesn't that increase the file size though? They're looking at 3G load speeds, so any increase in file size is probably unwelcome.


A progressive JPEG is almost always smaller than a baseline (non-progressive) one.


OTOH, progressive JPEGs tend to require much more memory to decode. I do not have specific numbers to cite. Only going off of anecdotal usage of image programs over the years (e.g., Java photo uploaders that choked on progressive JPEGs).


This seems to bode badly for CDNs versus your own tuned servers, unless there's a way for origin websites to provide hints on response ordering?


There are discussions happening on how browsers can allow authors to provide resource prioritisation hints. I'm curious to see where they go.

We'd ideally like to be able to say "prioritise the 10 images in the viewport". You can hack it together relatively efficiently using IntersectionObserver now, but support isn't great.


CDNs are still going to have lower latency and higher bandwidth, and likely more ability to have long lived connections. Probably whatever mechanism develops to facilitate http/2 server pushed resources through a CDN will also include prioritization hints.


damn, CloudFront already doesn't expose many tunables, so yeah, this isn't going to work.

Starting to wish the AppCache manifest had actually been made to work and that it could be used as a queue somehow to prioritise important assets on a webpage.


ServiceWorkers are the app cache done right. https://github.com/w3c-webmob/ServiceWorkersDemos


My point is that you can't map Service Workers onto a simple manifest, i.e. a list of resources the httpd needs to push as a priority.

You "kindof" can with a Appcache "manifest". Stretch of the imagination, I know.


Did I read this right that http1 was with cdn A (unnamed?) and http2 was with cdn B (cloudflare)?

If so, you really can't draw any conclusions about the protocol difference when the PoP locations, network designs, and hardware and software configurations could easily have made the kinds of differences you're seeing.


You read it correctly.

By not moving our render-blocking assets like CSS, JS and fonts over to HTTP/2, we rule out performance changes due to improvements in head-of-line blocking.

Our images were always on a separate hostname, so the DNS lookup overhead is the same. We also did some initial benchmarking and found the new CDN to be more efficient than the old one.


Comparing two protocols using different providers, isn't that a bit like comparing apples and pears? And I have a doubt, which could be a bad assumption, about whether it runs on hardware you control or own, what exactly runs on it, and potentially which other parties use it.


Great writeup, and interesting seeing where http2 performed worse. Definitely going to refer to this as I update my backend to http2.


Thanks mate, really appreciate it.


I'm really looking forward to see how much HTTP/2 will increase performance for my Bitcoin payment channel server: https://github.com/runeksvendsen/restful-payment-channel-ser...

Just now I finished separating the front-end and back-end - by a RESTful protocol - and this roughly halved performance compared to using a native library (from ~2000 payments/second on my laptop to ~1000). I expect HTTP/2 to make a greater percentage-wise difference here, although I admit I really have no idea how much, say, ZeroMQ would have reduced performance, compared to cutting it in half using HTTP/1.x.

I expect HTTP/2 to make a much greater difference in high performance applications, where overhead becomes more important, which static file serving doesn't really hit. So I think RESTful backend servers will see a much more noticeable performance increase, especially since, if you use Chrome at least, as an end-user you already get many of the HTTP/2 latency benefits through SPDY.


Useful related project: http://www.grpc.io/ is an excellent layer on top of HTTP2 for comms between backend services. It's from Google, and used by Docker and Square among others. It even comes with a rest-focused gateway https://github.com/grpc-ecosystem/grpc-gateway


Thank you for the suggestion. What do you find excellent about this? What would I get in exchange for the added complexity? I must admit that I like the notion of "raw" HTTP, particularly because the server in question will be used primarily by other web services not written by me (it's basically a payment gateway), and everyone and their girlfriend knows HTTP.

I've worked once before with Protocol Buffers, and I can't say I enjoyed it. It ended up being a layer in the middle between the data and my parser, not really serving a purpose, since my parser (written in Haskell) is more strict than the Protocol Buffers specification allows. After that I see little value in Protocol Buffers over Haskell types which are (de)serialized to/from binary data. I don't find Protocol Buffers nearly verbose enough to define a protocol, so it becomes reduced to defining data types/structures, which other tools handle much better, in my opinion.


Aren't there more performant options than restful these days? Any particular reason why you didn't choose those?

Would that be a good case for websockets?


There are definitely more performant options. I considered ZeroMQ for a while, and almost decided on it, but went for HTTP/REST because of the built-in error handling, and request-response style (as far as I can see, I would have to implement all of this if I chose to use ZeroMQ).

I also chose HTTP because, at the end of the day, I still get almost ~1000 payments per second after the change (on a laptop). VISA handles 200k payments/second at peak, so that's peak VISA levels on 200 MacBook Pros.

I see I might have misspoken when I said "front-end". It's really the front-end (logic part) of the backend server, which now comprises two parts: a stateless logic ("front-end") part and a stateful database backend. So I haven't considered WebSockets.


H2's single long lived connection means it's a contender to replace websockets. As a bonus you get to use HTTP semantics.


Some solutions:

- Serve less data. The best speedup is when there's no more data to download; if clients' throughput is maxed out, decreasing page weight helps.

- Use async bootstrap JS code to load in other scripts once images are done loading or other page load events have fired.

- Load fewer images in parallel; use JS to load one row of images at a time.

- Use HTTP/2 push (which CloudFlare offers) to push some of the images/assets with any other response. Push images with the original HTML and you'll start getting the images to the browser before it even parses the HTML and starts any (prioritized) requests (a rough sketch of the Link-header approach is below).
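
On the push option: CloudFlare's implementation is, as far as I know, driven by a Link: rel=preload header on the origin response, so the origin just annotates the HTML response. A sketch with Flask (the framework and asset path are illustrative assumptions, not from the article):

    # Sketch: ask a push-capable edge (e.g. CloudFlare's HTTP/2 push) to
    # push an image alongside the HTML by emitting a Link: rel=preload
    # header. Flask and the asset path are illustrative assumptions.
    from flask import Flask, make_response, render_template

    app = Flask(__name__)

    @app.route("/designs")
    def designs():
        resp = make_response(render_template("designs.html"))
        resp.headers["Link"] = "</images/hero.jpg>; rel=preload; as=image"
        return resp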


Wouldn't the standard solution of lazy loading images (and prioritizing critical CSS) help? Since they are now trying to load everything on a big page, they should only be trying to load everything above the fold.


Yes, the basic approach is the same. There's limited bandwidth and they have too many image assets going through the pipe at the same time. They can easily control this by just loading a few at a time, whether that's the first page or row or whatever (probably based on testing to see what "feels" the fastest).

It's a tried and tested approach and much better than just sending everything in the HTML in a single blast. There are hundreds of image-based sites out there, they all do this as an optimization.


We've recently moved from AWS to Google Cloud Storage because of HTTP/2. We had a bottleneck where the browser was left waiting when serving multiple large files (8+ files at 10 MB+ each).

I'm wondering if 99designs looked at any sort of domain sharding to get around the timing issues. If I understand correctly, wouldn't this get around the priority queue issue? Your JS, fonts, etc. coming from a different address than your larger images would create completely separate connections.

I'm not completely sure this would get around the issues mentioned, but I'm curious if it was looked at as a solution.


The priority queue isn't the issue. In fact the priority queue is what kept our first paint times from tanking, because browsers prioritised render-blocking resources instead of images.

The issue was due to the variance in image size. An image that is significantly larger than the page average will load more slowly since all images get an equal share of bandwidth (priority). Adding sharding wouldn't help since the client only has a fixed amount of bandwidth to share and all images would still get the same share of it. Sharding could help if the bandwidth bottleneck was at the CDN, but that's rarely going to be the case.
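
A toy model makes this concrete (the numbers are illustrative, not from our data): with a fixed client link shared equally between all in-flight images, an image ten times the page average finishes well after the rest, and splitting the requests across extra hostnames changes nothing because the link speed is the same.

    # Toy model (illustrative numbers): a fixed client link shared equally
    # between all in-flight images. Sharding adds hostnames, not bandwidth,
    # so the oversized image still finishes last.
    def finish_times(sizes_kb, link_kb_per_s):
        remaining = dict(enumerate(sizes_kb))
        done, t = {}, 0.0
        while remaining:
            share = link_kb_per_s / len(remaining)  # equal share per image
            step = min(remaining.values()) / share  # time until the next image completes
            t += step
            remaining = {i: r - step * share for i, r in remaining.items()}
            for i in [i for i, r in remaining.items() if r <= 1e-9]:
                done[i] = t
                del remaining[i]
        return done

    sizes = [100] * 19 + [1000]                     # 19 average images plus one 10x outlier, in KB
    times = finish_times(sizes, link_kb_per_s=750)  # roughly a 6 Mbps link
    print("outlier done at %.1fs, the rest by %.1fs"
          % (times[19], max(times[i] for i in range(19))))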


Domain sharding is an anti-pattern for HTTP/2. The reason being that for another domain the browser needs to make an expensive TLS handshake; with HTTP/2 on the same domain, it doesn't. We've done tests and have even moved away from having a separate static domain.


Excellent and in-depth article. Thank you for sharing!

Hopefully we'll see a follow-up with future changes and tweaks from both webservers and browsers.


Thanks mate, glad you enjoyed it.


I thought that HTTP/2 didn't fix head-of-line blocking and this was why QUIC (https://www.chromium.org/quic) existed.

From the project page:

Key features of QUIC over existing TCP+TLS+HTTP2 include

* Dramatically reduced connection establishment time

* Improved congestion control

* Multiplexing without head of line blocking

* Forward error correction

* Connection migration


IMHO HTTP/2 partly solves HOL blocking. It will allow other streams to proceed if one stream is blocked due to flow control (the receiver doesn't read from the stream). E.g. if you have multiple parallel downloads over a single HTTP/2 connection, one blocked/paused stream won't block the others.

However, it can't let individual streams proceed when packets are lost, even if those packets only carry data for a single stream.


Thanks for posting your findings - very useful data. It would be interesting to see the Webpagetest waterfalls in greater detail if you're able to share that.

Are you planning to use your resource hints to enable server push at the CDN edge?


Server push at the edge is a problem atm. Current push semantics require that the HTML document say which resources to push. That's an issue if you're serving assets off a CDN domain.

Asset domains make less sense with H2 from a performance perspective, but there are still security concerns that need to be addressed.


Good point if you're using push for page content that varies, like images in the 99designs portfolio and gallery. That gets into dynamic caching territory.

As a first step, I'm focused on using push to cut latency between TTFB and processing of render-blocking static assets. Serving those from the same domain as the base page, it should be easy for the origin to supply the edge with the list of resources to push, either in the HTML or with the `link` response header. It also means my critical assets are no longer behind a separate DNS lookup.

In the design gallery, this type of push approach could help you regain control of loading priority and get your fonts loading before that wall of images.


The priority queue isn't the issue. In fact the priority queue is what kept our first paint times from tanking, because browsers prioritised render-blocking resources instead of images.

The issue was due to the variance of image size. An image that is significantly larger than the page average will be loaded slower since all images get an equal share of bandwidth (priority).

We could further improve first paint times by pushing render-blocking resources, but we'd need to be serving those resources off the 99designs domain (with current push implementations). This opens us up to a class of security issues we avoid by having an asset domain, e.g. certain types of reflected XSS and sending cookies with asset requests.

For now we'll wait for the webperf working group to address the limitations with server push semantics.


Interesting note on the impact of image size variation on queue, thanks for elaborating.

Serving those resources from the 99designs domain is worth a look. I considered the cookies and security trade-offs as well. I found H2 compressed cookies enough to perform better than a separate cookieless domain for static assets, due to the DNS savings. DNS times can be bad at high percentiles. Reflected XSS is addressed with a Content Security Policy. But I'm fortunate to have a user base that supports CSP well.


we also did a far less sophisticated HTTP/2 reality check: https://blog.fortrabbit.com/http2-reality-check

about the same result: real world performance boost was not soooo big.


And still no OpenSSL 1.0.2 nor ALPN on most distros such as Debian Jessie... kinda sucks


It’s available in jessie-backports since 2016-07-02, see https://packages.debian.org/jessie-backports/libssl-dev

Given that jessie is stable, OpenSSL will not be updated to a newer version, only security updates will be made available.

What I’m trying to say: you’re never going to get it on jessie, unless you enable backports, at which point you’ll have it readily available.

Hope that helps


oh nice, thanks for the hint. didn't check the jessie-backports within the last two weeks :-)


This is petty and inconsequential, but I really wish they used a section header called "Conclusion" instead of "Take Aways".


You're right.



