Founder of NuevoCloud here. If I read this right, you guys used Cloudflare for http2. So let me ask you this: when you did your comparison, were all of the images cached (i.e. x-cache: hit) at the edge?
The reason I ask is because Cloudflare, last I checked, still hasn't implemented http2's client portion. So when a file is not cached, it does this:
client <--http2--> edge node <--http 1.1--> origin server.
Http2 is only used for the short hop between the client and edge node. The edge node then uses http 1.1 for the connection to the origin server, which may be thousands of miles away.
In other words, depending on the client location and the origin server location, your test may have used http 1.1 for the majority of the distance.
If you guys want to rerun this test on our network, we use http2 everywhere... your test would look like this on our network:
client <--http2--> edge node (closest to client) <--http2--> edge node (closest to server) <--http2--> origin server.
So even if your origin server doesn't support http2, it'll only use http 1.1 over the short hop between your server and the closest edge node.
You're welcome to email me if you want to discuss details you don't want to post here.
Edit: I should also mention that we use multiple http2 connections between our edge nodes and between the edge node and origin server, removing that bottleneck. So only the client <--> edge node hop is a single http2 connection.
To the best of my knowledge you are correct about how CloudFlare works. For context, this data was collected over a period of about a month on real production pages with significant traffic.
Note, though, that Cloudflare doesn't manage caches on a per-account basis. Each PoP has a single LRU cache that's used for all customers. In other words, even if you've primed it, your files may have been pushed out of the cache by a larger customer.
In order to know this hasn't occurred, you really have to check the hit rate cloudflare is reporting (for static files that rarely change, this should be near or at 100%)... and when you're doing side-by-side comparisons (like the speed index), you have to actually check the x-cache headers to verify that a cache miss hasn't occurred. Otherwise, you wouldn't actually know that a significant portion of traffic isn't being sent over http 1.1 (because of cache misses).
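A minimal sketch of the header check described above, assuming you have already collected the response headers for each image request (for example from a HAR export). The header name `x-cache` is the one used in this thread; the actual header and its values vary by CDN, so treat both as assumptions. The function names are made up for illustration.

```python
def is_cache_hit(headers, header_name="x-cache"):
    """Return True if the (assumed) cache-status header reports a hit."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {k.lower(): v for k, v in headers.items()}
    value = normalised.get(header_name.lower(), "")
    return "hit" in value.lower()


def hit_rate(responses, header_name="x-cache"):
    """Fraction of responses served from the edge cache (0.0 if none given)."""
    if not responses:
        return 0.0
    hits = sum(1 for headers in responses if is_cache_hit(headers, header_name))
    return hits / len(responses)
```

Anything well below 1.0 for static images would suggest that a meaningful share of requests went back to the origin.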
> We cache as much as possible, for as long as possible. The more requested a file, the more likely it is to be in the cache even if you're on the Free plan. Lots of logic is applied to this, more than could fit in this reply. But importantly; there's no difference in how much you can cache between the plans. Wherever it is possible, we make the Free plan have as much capability as the other plans.
This does not confirm the exact statement but at least points in this direction.
I did not do any real tests and I might be completely wrong etc. but it seems to me that http2 is going to perform poorly over wireless links like 3g.
With http1 one had N TCP connections. Given the way TCP slowly increases the bandwidth used and rapidly decreases it when a packet is lost, if a packet was dropped on one connection (which will happen quite a lot on 3g), the other TCP streams were not delayed or blocked, and could even use the leftover bandwidth yielded by the stream that lost the packet.
With http2, however, there's one TCP connection, so dropped packets will cause under-utilization of the bandwidth. On top of that, a dropped packet will cause all frames after it to be delayed in the kernel's receive buffer until the dropped packet is retransmitted, while in the http1 case they would be available at the app level right away.
HTTP2 being implemented on top of TCP always seemed like a weird choice. It should have been UDP, IMO. That's why network accelerators like PacketZoom make so much sense. Note: I work at PacketZoom, I did not do any in-depth research on HTTP2, and this is my opinion, not necessarily that of the company.
This is a real worry, but as with all things the actual behaviour of H2 on lossy networks is more complex than that.
TCP's congestion control algorithms don't work that well when you have many TCP streams competing for the same bandwidth. This is because while packet loss is a property of the link, not an individual TCP stream, each packet loss event necessarily only affects one TCP stream. This means the others don't get the true feedback about the lossiness of the connection. This behaviour can lead to a situation where all of your TCP streams try to over-ramp.
A single stream generally behaves better on such a link: it's getting a much more complete picture of the world.
However, your HOL blocking concern is real. This is why QUIC is being worked on. In QUIC, each HTTP request/response is streamed independently over UDP, which gets the behaviour you're talking about here, while also maintaining an overall view of packet loss for rate limiting purposes.
> can lead to a situation where all of your TCP streams try to over-ramp. ... A single stream generally behaves better on such a link: it's getting a much more complete picture of the world
Multiple HTTP connections work better for exactly this reason, because they are 'stealing' bandwidth from streaming video by 'not playing fair'. For example, 6 connections ramping back up bandwidth at 6x the rate of a single connection or sometimes only scaling back on 1/6th of the streams at once.
...which is fine because multiple parallel HTTP connections are usually a browser doing short-lived data transfers for active users, and that is not the bulk of the data on the network.
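To put a number on the "only scaling back on 1/6th of the streams" point, here is a rough sketch. It assumes the total congestion window is split evenly across connections and that a loss halves only the connection that saw it; real TCP stacks are far more involved, and the function name is hypothetical.

```python
def aggregate_after_one_loss(total_window, connections):
    """Aggregate congestion window right after a single loss event.

    Assumes an even split of the window across connections and a simple
    multiplicative decrease (halving) on the one connection that lost a packet.
    """
    per_connection = total_window / connections
    # Only one connection halves its window; the others are unaffected.
    return total_window - per_connection / 2
```

With a total window of 60 segments, one connection drops to 30 after a loss, while six connections together only drop to 55.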
I worry about TCP window scaling (and full TCP windows) when only using one TCP connection. There is a good reason download managers use multiple connections to download one file: depending on the latency, the maximum transfer rate is capped because only so many TCP packets can be in flight simultaneously. I wonder if nobody ever thought about that... HTTP/1.x solved that (more by chance) with multiple connections...
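The cap mentioned above follows from the fact that a sender can have at most one window of data in flight per round trip. A back-of-the-envelope sketch (the function name and the example numbers are illustrative assumptions):

```python
def max_throughput_bps(window_bytes, rtt_seconds):
    """Upper bound on throughput for one TCP connection: window / RTT, in bits/s."""
    return window_bytes * 8 / rtt_seconds
```

For example, a 65535-byte window (the maximum without window scaling) over a 100 ms round trip gives `max_throughput_bps(65535, 0.1)`, roughly 5.2 Mbit/s per connection, no matter how fast the link is. Several parallel connections each get their own window, which is why download managers open more than one.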
Has anybody ever used this for denial of service? Make a large request so the TCP window scales up then just stop sending ACKs. All that data has to remain in the server's memory for retransmit. Even with a really short timeout before connections are dropped you could probably tie up a lot of server memory.
Have you looked at QUIC? It seems like it addresses the problems with HTTP2 over TCP as well as providing some additional benefits like speeding up the secure connection establishment.
Yes, I'm aware of it, though not as in-depth as I'd like. AFAIK it was still a bit unfinished, e.g. not having any congestion control story, which is very important. But yeah... it's what HTTP2 probably should have been in the first place.
I don't think that the server is in charge of prioritisation here. The server can do it, but there is no reason to push this responsibility onto the server when the browser can do it much better (for example the server can't know what's in the viewport).
I expect this will be quickly sorted out by more mature HTTP/2 implementations in browsers. Downloading every image at once is obviously a bad idea, and I expect such naive behaviour will soon be replaced by decent heuristics (even just downloading eight resources at once should be better in nearly all cases)
I think the real solution here is for the browser to be able to communicate some sort of priority to the server, without having to download a limited number of files at once.
Browsers currently do this. H2 has two types of prioritisation: weighted, and dependency.
All browsers implement weighted resource prioritisation and weigh resources by content type. This is a holdover from what they do for HTTP/1 connections.
The spec purposely leaves how these heuristics should work to the implementor. Things will change and implementations will diverge over time.
The server ultimately being in control means we can tell the server what resources are important for specific pages with absolute knowledge of the page.
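As a rough illustration of weighted prioritisation: in HTTP/2, sibling streams carry a weight from 1 to 256 and are meant to receive resources in proportion to it. The sketch below only does that arithmetic; the resource names and weights are made up, and how a real browser or server assigns and honours weights is implementation-specific.

```python
def bandwidth_shares(weights):
    """Map each stream name to its proportional share of available bandwidth."""
    for name, weight in weights.items():
        # HTTP/2 stream weights are integers in the range 1..256.
        if not 1 <= weight <= 256:
            raise ValueError(f"weight for {name!r} must be between 1 and 256")
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}
```

With hypothetical weights `{"css": 256, "image": 64}`, the stylesheet would be allotted 80% of the bandwidth and the image 20% while both are in flight.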
Oh wow, that's cool. Do you know if servers currently support this? Would this mostly be useful on a network level or do you think it would also be useful for like trying to be more intelligent about scheduling?
It's hard to say for sure. Server implementations can vary wildly, so make sure to test any implementation closely. I know from talking to CloudFlare that their implementation respects browser hints. Their implementation is also open source.
One way to "solve" the time to visual completion would be to make all the images, but especially the larger images, progressive scan. For very large images, the difference in visual quality between 50% downloaded and 100% downloaded on most devices isn't noticeable, so the page would appear complete in half the time.
Totally. There are a bunch of ways to address the performance issue. As I alluded to at the end of the post, there are serious technology considerations when preprocessing so much image data.
We're currently looking at whether we can use IntersectionObserver for efficient lazy loading of images before they enter the viewport.
If there's a way to tell it not to render until x% downloaded, sure. Otherwise slower connections see the low-q versions for a while and it can be disconcerting. Either to some users or some PMs.
OTOH, progressive JPEGs tend to require much more memory to decode. I do not have specific numbers to cite. Only going off of anecdotal usage of image programs over the years (e.g., Java photo uploaders that choked on progressive JPEGs).
There are discussions happening on how browsers can allow authors to provide resource prioritisation hints. I'm curious to see where it goes.
We'd ideally like to be able to say "prioritise 10 images in the viewport". You can hack it together relatively efficiently using IntersectionObserver now, but support isn't great.
CDNs are still going to have lower latency and higher bandwidth, and likely more ability to have long lived connections. Probably whatever mechanism develops to facilitate http/2 server pushed resources through a CDN will also include prioritization hints.
Did I read this right that http1 was with cdn A (unnamed?) and http2 was with cdn B (cloudflare)?
If so, you really can't draw any conclusions about the protocol difference when the pop locations, network designs, hardware and software configurations could easily have made the kinds of differences you're seeing.
By not moving our render-blocking assets like CSS, JS and fonts over to http/2, we rule out performance changes due to improvements to head-of-line blocking.
Our images were always on a separate hostname, so the DNS lookup overhead is the same. We also did some initial benchmarking and found the new CDN to be more efficient than the old one.
Comparing two protocols using different providers, isn't that a bit like comparing pears and apples? And I doubt, though this could be a bad assumption, that it is on hardware you control or own, or that you know exactly what runs on it and which other parties use it.
Just now I finished separating the front-end and back-end - by a RESTful protocol - and this roughly halved performance compared to using a native library (from ~2000 payments/second on my laptop to ~1000). I expect HTTP/2 to make a greater percentage-wise difference here, although I admit I really have no idea how much, say, ZeroMQ would have reduced performance, compared to cutting it in half using HTTP/1.x.
I expect HTTP/2 to make a much greater difference in high performance applications, where overhead becomes more important, which static file serving doesn't really hit. So I think RESTful backend servers will see a much more noticeable performance increase, especially since, if you use Chrome at least, as an end-user you already get many of the HTTP/2 latency benefits through SPDY.
Useful related project: http://www.grpc.io/ is an excellent layer on top of HTTP2 for comms between backend services. It's from Google, and used by Docker and Square among others. It even comes with a rest-focused gateway https://github.com/grpc-ecosystem/grpc-gateway
Thank you for the suggestion. What do you find excellent about this? What would I get in exchange for the added complexity? I must admit that I like the notion of "raw" HTTP, particularly because the server in question will be used primarily by other web services not written by me (it's basically a payment gateway), and everyone and their girlfriend knows HTTP.
I've worked once before with Protocol Buffers, and I can't say I enjoyed it. It ended up being a layer in the middle between the data and my parser, not really serving a purpose, since my parser (written in Haskell) is more strict than the Protocol Buffers specification allows. After that I see little value in Protocol Buffers over Haskell types which are (de)serialized to/from binary data. I don't find Protocol Buffers nearly verbose enough to define a protocol, so it becomes reduced to defining data types/structures, which other tools handle much better, in my opinion.
There are definitely more performant options. I considered ZeroMQ for a while, and almost decided on it, but went for HTTP/REST because of the built-in error handling, and request-response style (as far as I can see, I would have to implement all of this if I chose to use ZeroMQ).
I also chose HTTP because, at the end of the day, I still get almost 1000 payments per second after the change (on a laptop). VISA handles 200k payments/second at peak, so that's peak VISA levels on 200 MacBook Pros.
I see I might have misspoken when I said "front-end". It's really the front-end (logic part) of the backend server, which now comprises two parts: stateless logic ("front-end") and database (stateful) backend. So I haven't considered Websockets.
- Serve less data. The best speedup is when there's no more data to download and if the throughput for clients is maxed out, then decreasing page weight helps.
- Use async bootstrap JS code to load in other scripts once images are done loading or other page load events have fired.
- Load fewer images in parallel, use JS to load one row of images at a time.
- Use HTTP/2 push (which CloudFlare offers) to push some of the images/assets with any other response. Push images with the original HTML and you'll start getting the images to browser before it even parses the HTML and starts any (prioritized) requests.
Wouldn't the standard solution of lazy loading images (and prioritizing critical css) help? Since they are now trying to load everything on a big page, they should only be trying to load everything above the fold.
Yes, the basic approach is the same. There's limited bandwidth and they have too many image assets going through the pipe at the same time. They can easily control this by just loading a few at a time, whether that's the first page or row or whatever (probably based on testing to see what "feels" the fastest).
It's a tried and tested approach and much better than just sending everything in the HTML in a single blast. There are hundreds of image-based sites out there, they all do this as an optimization.
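The "few at a time" idea is just batching. On a real page this would live in browser-side JS; the sketch below is in Python purely to show the grouping logic, and the function name and row size are assumptions.

```python
def batches(items, size):
    """Split a list of image URLs into consecutive groups of at most `size`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]
```

A loader would then request one batch, wait for it to finish (or for the next row to approach the viewport), and only then start the next.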
We've recently moved to Google Cloud Storage from AWS because of http/2. We had a bottleneck of the browser waiting when serving multiple large files (8+ files * 10 MB+ each).
I'm wondering if 99designs looked at any sort of domain sharding to get around the timing issues. If I understand correctly, wouldn't this get around the priority queue issue? Your js,fonts, etc. coming from a different address than your larger images, would create completely separate connections.
I'm not completely sure this would get around the issues mentioned, but I'm curious if it was looked at as a solution.
The priority queue isn't the issue. In fact the priority queue is what kept our first paint times from tanking, because browsers prioritised render-blocking resources instead of images.
The issue was due to the variance of image size. An image that is significantly larger than the page average will be loaded slower since all images get an equal share of bandwidth (priority). Adding sharding wouldn't help since the client only has a fixed amount of bandwidth to share and all images would still get the same share of it. Sharding could help if the bandwidth bottleneck was at the CDN but that's rarely going to be the case.
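A toy model of the equal-share effect, assuming every image starts at the same time and all active downloads split a fixed client bandwidth evenly. The function name, sizes and bandwidth are illustrative; this is not how any particular browser or CDN schedules streams.

```python
def finish_times_equal_share(sizes, bandwidth):
    """Finish time of each download when active streams split bandwidth equally."""
    # Smaller downloads finish first, so process them in order of size.
    order = sorted(range(len(sizes)), key=lambda i: sizes[i])
    finish = [0.0] * len(sizes)
    now = 0.0
    received = 0.0  # amount every still-active stream has received so far
    remaining = len(sizes)
    for i in order:
        # Each active stream gets bandwidth / remaining until this one completes.
        now += (sizes[i] - received) * remaining / bandwidth
        finish[i] = now
        received = sizes[i]
        remaining -= 1
    return finish
```

With three 100 KB images and one 700 KB image on a 1000 KB/s link, the small images all finish at 0.4 s and the large one at 1.0 s. Loaded one after another, the small ones would have finished at 0.1, 0.2 and 0.3 s, so sharing the pipe equally delays everything that is queued alongside the outlier.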
Domain sharding is an anti-pattern for http2, the reason being that each additional domain needs its own expensive TLS handshake. With http2 on the same domain, it doesn't. We've done tests and even moved away from having a static domain.
IMHO HTTP/2 solves HOL blocking partly. It will allow other streams to proceed if one stream is blocked due to flow control (receiver doesn't read from the stream). E.g. if you have multiple parallel downloads over a single HTTP/2 connection one blocked/paused stream won't block the others.
However it doesn't have abilities that will allow individual streams to proceed if some packets are lost that only hold information for a single stream.
Thanks for posting your findings - very useful data. It would be interesting to see the Webpagetest waterfalls in greater detail if you're able to share that.
You planning to use your resource hints to enable server push at CDN edge?
Server push at the edge is a problem atm. Current push semantics require the HTML document to say which resources to push. That's an issue if you're serving assets off a CDN domain.
Asset domains make less sense with h2 from a performance perspective but there are still security concerns that need to be addressed.
Good point if you're using push for page content that varies, like images in the 99designs portfolio and gallery. That gets into dynamic caching territory.
As a first step, I'm focused on using push to cut latency between TTFB and processing of render-blocking static assets. Serving those from the same domain as the base page, it should be easy for the origin to supply the edge with the list of resources to push, either in the HTML or with the `link` response header. It also means my critical assets are no longer behind a separate DNS lookup.
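For the `link` response header route, a small sketch of how an origin might build the header value using `rel=preload` entries. The asset paths are hypothetical, and whether a given CDN edge turns these hints into actual pushes is an assumption to verify with your provider.

```python
def preload_link_header(resources):
    """Build a Link header value from (path, as_type) pairs."""
    # Each entry has the form: <path>; rel=preload; as=type
    return ", ".join(
        f"<{path}>; rel=preload; as={as_type}" for path, as_type in resources
    )
```

Calling `preload_link_header([("/css/app.css", "style"), ("/js/app.js", "script")])` returns `</css/app.css>; rel=preload; as=style, </js/app.js>; rel=preload; as=script`, which would be sent as the value of the `Link` header on the HTML response.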
In the design gallery, this type of push approach could help you regain control of loading priority and get your fonts loading before that wall of images.
The priority queue isn't the issue. In fact the priority queue is what kept our first paint times from tanking, because browsers prioritised render-blocking resources instead of images.
The issue was due to the variance of image size. An image that is significantly larger than the page average will be loaded slower since all images get an equal share of bandwidth (priority).
We could further improve first paint times by pushing render-blocking resources, but we'd need to be serving those resources off the 99designs domain (with current push implementations). This opens us up to a class of security issues we avoid by having an asset domain, i.e. types of reflected XSS and serving cookies on assets.
For now we'll wait for the webperf working group to address the limitations with server push semantics.
Interesting note on the impact of image size variation on queue, thanks for elaborating.
Serving those resources from the 99designs domain is worth a look. I considered the cookies and security trade-offs as well. I found H2 compressed cookies enough to perform better than a separate cookieless domain for static assets, due to the DNS savings. DNS times can be bad at high percentiles. Reflected XSS is addressed with a Content Security Policy. But I'm fortunate to have a user base that supports CSP well.