Blink: Intent to Remove: HTTP/2 and gQUIC server push (groups.google.com)
175 points by aciswhat on Nov 12, 2020 | 133 comments



Five years ago, we built a company around HTTP/2 server push. Here is what we learned:

- The interaction of HTTP/2 push with browser caches was left mostly unspecified, and browsers implemented different ad-hoc policies.

- Safari in particular was pretty bad.

- Since HTTP/2 Push worked at a different layer than the rest of a web application, our offering centered around reverse-engineering traffic patterns with the help of statistics and machine learning. We would find the resources that were most often not cached, and push those.

- HTTP/2 Push, when well implemented, offered reductions in time to DOMContentLoaded on the order of 5 to 30%. However, web traffic is noisy and visitors fall into many different buckets by network connection type and latency. Finding that 5% to 30% performance gain required looking at those buckets. And DOMContentLoaded doesn't include image loading, which dominated the overall page loading time.

- As the size of, say, Javascript increases, the gains from using HTTP/2 Push asymptotically tend to zero.

- The PUSH_PROMISE frames could indeed increase loading time, because they needed to be sent while the TCP connection was still cold. At that point, each byte costs more latency-wise.

- If a pushed resource was not matched or not needed, the load time increased again.

Being a tiny company, we eventually moved on and found other ways of decreasing loading times that were easier for us to implement and maintain and also easier to explain to our customers.


This and so much this. I also worked with HTTP/2 (when it was still a SPDY draft) and came to the same conclusions: TCP peculiarities (most notably congestion control and its dynamic window scaling) mostly eradicate the benefits of HTTP/2. HTTP/1.1 used 5-6 different TCP connections (which re-use OS-level window scaling caches), while with HTTP/2 you have to transfer all resources through a single TCP connection.

People often claim HTTP/2 to be superior due to its multiplexing capabilities, but they reason with a mental model of true stream-based flows and leave TCP completely out of the argument. But guess what, that model is just a model: in reality you have packet-based TCP flows that are abstracted away to mimic a continuous stream behind a socket - and those matter.


Would it be possible to use a patched TCP stack to start up quicker?


Sure, this is basically increasing the initcwnd (the initial number of TCP segments allowed to be sent out before receiving an ACK from the other end). Linux supports this through the `ip` command (look for initcwnd). This, however, requires root rights and affects the whole network device. Google submitted a patch to the Linux kernel that would have allowed tweaking the initcwnd from userspace for individual connections; however, it got rejected and never made it upstream (for good reasons).
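
For reference, a rough sketch of what that looks like with iproute2 (the gateway and device are placeholders; this affects every connection using that route and needs root):

    # inspect the current default route, then raise its initial congestion window
    ip route show default
    ip route change default via 192.168.1.1 dev eth0 initcwnd 20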

See, the problem is, if the initcwnd is set higher on OSes by default, you risk introducing more packet loss and at the same time reducing everybody's network throughput - if you send more data than the network path to your destination can handle (i.e. full router buffers along the way, more data than the client can receive, etc.). TCP's congestion control is pretty smart about avoiding congestion or increasing load in already congested networks. So yes, you can set a higher initcwnd, but if everybody were to increase it by a significant factor, chances are networks would become slower for everyone, without anyone benefiting.

Long story short, the limitations of TCP for HTTP are well known and were the major motivation for QUIC (over UDP), which allows browser vendors (Google) and server operators (Google, but also other large players like Facebook etc.) to just not give a shit about others and put as much pressure on the network as they want in order to bring latency down. Once Chrome rolls out with HTTP/3 support, Google couldn't care less what the Linux kernel devs or IETF network engineers have to say.


If I use wired internet, shouldn't it be the ISP's responsibility to throttle / apply backpressure? Like if I'm paying for a certain speed why shouldn't I try my best to make use of it from the start?

(I get that mobile internet requires everyone to cooperate to avoid everyone stepping on each others' toes)


Well, if a server sends more data to a requesting client than the path to that client can handle (considering the current state of congestion along that path), the result is packet loss. That is, data that got sent is lost, and the packets need to be retransmitted. So if in that situation you set your initcwnd too high in an effort to improve speed, you would actually achieve the opposite effect: the data that you send gets lost and needs to be retransmitted - resulting in even slower throughput than if you had started more gently with a smaller initcwnd.

If a big player like Google or Netflix decided to, say, double or triple its initcwnd from one day to the next, the result would probably be more congestion (more overloaded nodes in the whole network), higher packet loss, and increased traffic at peering points, and the overall throughput to their users would be less than before - at the same time putting more load on the network, so that even users not using Netflix or Google would be affected and suffer from lower throughput.

You can also view the whole situation through the lens of game theory: if in a movie theater at the end of the film everybody got up immediately and ran to the exit at the same time, the effect would be a clogged door and people would get out of the theater at a slower rate than in the current situation, where people look at the congestion and say "naah, too many people at the exit, I'll sit for a while longer". TCP has built-in backoff, and there is a wide range of different algorithms that solve this problem of "send as much as possible as quickly as possible, but if packet loss occurs, slow down in order not to put even more pressure on the network". Search for TCP Reno, Tahoe, Vegas, Cubic, BIC etc., or look at the Wikipedia page for what is available: https://en.wikipedia.org/wiki/TCP_congestion_control

EDIT: regarding ISPs applying backpressure: they do; in fact, every router (even the ones in your home) does this automatically. You have a buffer you put packets into, scheduled for delivery. If the buffer fill rate exceeds the buffer drain rate (= the input data rate is higher than the outgoing rate), the buffer runs full and packets get dropped. But this does not mitigate the problem: every dropped packet gets retransmitted and will come again a few ms later. So if the sending side didn't back off (= throttle down its transmission rate) and just retransmitted at the same rate as before, the buffer would permanently run full, packet drops would occur again, and your connection would basically fail.


Building a company around a very specific technology that isn't finalized seems very risky.


I think this is the right decision. I looked at HTTP/2 push in 2017 and the design is very confusing, and the implementations are pretty bad. https://jakearchibald.com/2017/h2-push-tougher-than-i-though....

Chrome's implementation was best, but the design of HTTP/2 push makes it really hard to do the right thing. Not just when it comes to pushing resources unnecessarily, but also delaying the delivery of higher priority resources.

<link rel="preload"> is much simpler to understand and use, and can be optimised by the browser.

Disclaimer: I work on the Chrome team, but I'm not on the networking team, and wasn't involved in this decision.


Do you happen to know whether there are similar adoption rates for HTTP/3?

As someone who implemented HTTP/1.1 almost feature-complete [1], I think that the real problem on the web is the lack of a test suite.

There needs to be a test suite for HTTP that both clients and servers can test against, and that is not tied to the internal code of a web browser.

I say this because even a simple part of the spec like 206 with multiple ranges is literally never supported by any web server. Even nginx, Apache, or Google's own DNS-over-HTTPS servers behave differently if the request headers expect multiple response body parts. Let alone the unpredictability of chunked encoding, which is another nightmare.
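
For illustration, a multi-range request and the multipart/byteranges 206 response it is supposed to produce look roughly like this (sizes, content types, and the boundary string are illustrative):

    GET /file.bin HTTP/1.1
    Host: example.com
    Range: bytes=0-49, 100-149

    HTTP/1.1 206 Partial Content
    Content-Type: multipart/byteranges; boundary=SEP

    --SEP
    Content-Type: application/octet-stream
    Content-Range: bytes 0-49/1000

    [...first 50 bytes...]
    --SEP
    Content-Type: application/octet-stream
    Content-Range: bytes 100-149/1000

    [...next 50 bytes...]
    --SEP--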

I really think that there should be an official test suite that is maintained to reflect the specifications, similar in intent to the (now outdated) Acid tests.

Adoption rates for new HTTP versions will always stay low if you spend months implementing exactly what the spec says, just to figure out that no real-world web server actually implements it in the same way.

[1] https://github.com/tholian-network/stealth/blob/X0/stealth/s...


+1. The complexity of HTTP is now so great that it's probably prohibitively risky to roll your own implementation.


Agree 100%. I never saw a use for server push. I'm glad to see unused features go; web browsers are far too bloated already.

The web would be fine if they stopped adding features today for the next 10 years. The massive complexity of the browser will eventually be a detriment to the platform.


When Server Push became a thing, I liked the idea quite a lot, and I find it somewhat sad to see it disappear again, but realistically speaking, it wasn't used that much, so it might be for the best to just let it go.


The big issue with it, IMO, is it simply doesn't work with load balancers. You HAVE to have sticky sessions which, unfortunately, really limits some of the best benefits of http2.


Thanks for that post. That and some other resources convinced me at the time that my time was better spent elsewhere.

In the post it says less than 0.1% of connections in Chrome receive a push. Some people will always try out the cutting edge, but the fact that it hasn't spread after several years is a pretty good indicator that it's not producing the expected results.

I don't know why things that are "nice to have, but not essential" and at the same time not really working need to be kept just because they're in a standard. If it were essential I'd view it differently, but in this case I hope it gets dropped.


The web was always about backwards compatibility. You should be able to contact an HTTP web server deployed in 1995 from a modern web browser.

Server push is different, because it's supposed to be an invisible optimization, so it could be dropped without anyone noticing. But most things are not invisible.


> You should be able to contact an HTTP web server deployed in 1995 from a modern web browser.

AFAIK, a web server deployed in 1995 would probably be using HTTP/0.9, and I think modern web browsers don't support any HTTP older than HTTP/1.0 anymore.


I tend to disagree. Server push was a cool way of implementing streaming, like in HLS, particularly when you have constrained devices that would otherwise suffer from request latencies.

However, IMHO the Internet has mostly degraded into a huge CDN. HTTP/2 is often not even handled by the final endpoint. Decentralized caching and proactive cache control have become a niche.

Having said that, I still dream of a world in which browsers just care about rendering, rather than de facto shaping the future architecture of the net on all layers (DoH, HTTPS cert policies, QUIC MTUs, ...).


Note that this is also available via the `Link:` HTTP header. This means that you can get the preload hint to the browser quite early so there shouldn't be too much delay before it makes the request.

Of course if your HTML is small it may still be slightly slower than push. However the advantage is that you don't push cached resources over and over again.


Unfortunately, as far as I'm aware, Link is only an RFC (5988, 8288), and nobody has actually implemented it.


This is pretty funny. After all the fuss, turns out that server push isn't really useful. I'm half impressed that they are able to actually admit they were wrong (implicitly of course) by removing it. I can't say I'm surprised, either. I could never think of a reasonable use case for it.

With the variety of streaming options available now, it really seems antiquated.


It's too bad they didn't realize it sooner, or it could have been removed from HTTP/3 before IETF finalizes it. Probably too late now?


It's currently in Last Call, but if there's consensus to modify HTTP/3 that can be done at any point, although obviously doing it after it's actually published would be in the form of errata or a -bis RFC that modifies the standard. It's not extraordinary for something which has consensus to get done really late, even in AUTH48 (a notional "48 hours" for the authors of the document to fix any last little things that often lasts rather longer than two Earth days) if that's what people decide is necessary.

But "Blink doesn't want to do it" isn't consensus on its own, this page suggests other clients implement this and offers no opinion about whether they intend to likewise deprecate the feature.


Even before HTTP/2 was standardised there were some strong arguments by Mike Belshe (co-creator of SPDY) and others that we should drop push.

But it never happened, and so push made it into the H2 standard.


Server push is quite useful for bidirectional streaming of data which is heavily used in gRPC to help reduce latency and shovel large amounts of data.


gRPC does not use PUSH_PROMISE. It uses HTTP/2 streams, but those streams all get initiated by the client.


This 100%. There is so much confusion between HTTP/2 Push and HTTP/2's bidirectional streaming capabilities that I personally had to go back to the spec to understand the difference, because a lot of material just mixes them together. They are not the same thing.


I think this is different from server push in the WebSockets sense; this is all about pushing down resources the server suspects you'll need.


WebSockets are different again from both server push and H2 streams. Server push was a feature added in HTTP/2 for populating the browser's cache. WebSockets are an HTTP/1.1 extension for TCP-like bidirectional messaging.

Server push, WebSockets, H2 streams, and WebTransport are all different things.


Don't forget Server Sent Events[1] as well, which is yet another different technology.

[1] https://en.wikipedia.org/wiki/Server-sent_events


Yeah, but unlike HTTP/2 push, server-sent events were actually useful! You could use them super easily: by holding open a connection and pushing down little JSON or XML payloads you could build all sorts of things easily in JavaScript. I even saw super basic chat functionality built using two iframes - a form in one, and the chat log pushed back from the server as a stream of valid but open-ended HTML, abusing how lax the browser was about missing closing tags at the time. SSE was and still is super easy to use and a very useful option if you can't or don't want to shift to WebSockets.
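
The wire format is about as simple as it gets, too. A minimal sketch of an SSE endpoint (Go here purely for illustration; the /events path and the payload are made up) - the client just does `new EventSource("/events")` and gets an onmessage callback per `data:` block:

    package main

    import (
        "fmt"
        "net/http"
        "time"
    )

    // events streams a counter as server-sent events: one "data: ...\n\n" block per message.
    func events(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Content-Type", "text/event-stream")
        w.Header().Set("Cache-Control", "no-cache")
        flusher, ok := w.(http.Flusher)
        if !ok {
            http.Error(w, "streaming unsupported", http.StatusInternalServerError)
            return
        }
        for i := 0; ; i++ {
            fmt.Fprintf(w, "data: {\"tick\": %d}\n\n", i)
            flusher.Flush() // send the event immediately instead of buffering
            select {
            case <-r.Context().Done():
                return
            case <-time.After(time.Second):
            }
        }
    }

    func main() {
        http.HandleFunc("/events", events)
        http.ListenAndServe(":8080", nil)
    }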


Most applications that use WebSockets should be using SSE instead, but WS has better mindshare, and SSE doesn't work in IE11.


SSE can be polyfilled to work in IE11. It can be implemented on top of XHR from IE7 or so onwards.

Some caching proxies struggle with it though, because they try to buffer the entire response before forwarding it. And it's one-way: for something like collaborative editing, crafting a POST request for every keystroke feels wasteful, though it works great in practice.


I'd second that!

I use the heck out of SSE to add update notifications to various of my apps because it's just so easy to do. I have it glued to a Redis stream (or pub/sub channel) with https://github.com/bonkabonka/sseredis (though that shim looks like it needs a bit of love and a better README).


And the problem there is shoehorning RPC requirements into a hypermedia transport.

I feel we've made a big mistake following Google's microservice requirements for what should be a ubiquitous hypermedia backbone.

It's bewildering that so many smart people could end up conflating the two in our standard protocols.


I am very sceptical of all "real-world usage" statistics claims from Google.


They’re one of very few companies in the world that are in the position to actually measure this.


Sadly, HTTP 103 Early Hints, which provide a much saner way to preload content, are still unimplemented in all major browsers.


For people like me who are unfamiliar with the proposal, 103 Early Hints [1] would work like this.

Client request:

  GET / HTTP/1.1
  Host: example.com

Server response:

  HTTP/1.1 103 Early Hints
  Link: </style.css>; rel=preload; as=style
  Link: </script.js>; rel=preload; as=script
  
  HTTP/1.1 200 OK
  Date: Fri, 26 May 2017 10:02:11 GMT
  Content-Length: 1234
  Content-Type: text/html; charset=utf-8
  Link: </style.css>; rel=preload; as=style
  Link: </script.js>; rel=preload; as=script
  
  <!doctype html>
  [... rest of the response body is omitted from the example ...]
[1]: https://tools.ietf.org/html/rfc8297


Do you mean that the same HTTP/1.1 GET request can receive two different sets of headers with two different status codes?

What's the reported status code of this response in typical libraries? Usually the status code is a single value and not a list.


1xx status codes are already used for multi-step request/response exchanges, right? 100 Continue and the like. This is usually handled transparently by your client library.


I'm going to assume many HTTP libraries and ad-hoc implementations expect one HTTP response for one HTTP request (but still support pipelining) and will break if this underlying assumption changes.

In fact, this may even be baked into the interfaces they provide: think synchronous "give me the contents of this URL" functions.


That's the most significant criticism of RFC 8297 (the 103 Early Hints response) from my cursory reading of the RFC. It should only be allowed when the client indicates that it can process 1xx responses. All other 1xx responses are like that AFAIK, e.g. 100 Continue is only sent when the client sets the "Expect: 100-continue" header. So a client that doesn't do that doesn't need to care about 100 Continue responses.


The client should send Expect: 103-early-access in order for the server to send it. :(


Yep, two (or more, the spec allows for multiple 103 responses before the "real" response) responses returned from one request.

The one library I threw a test server at didn't respond well. It treated the 103 as the response, and the actual 200 as the body. It was an older library, and the spec suggests using client sniffing to pick which clients to send 103 to. That's kinda when I stopped trying to figure it out, I'm not surprised to learn no one really implements it.


Yes, that's how 100-Continue works and it is widely implemented (but mostly this is hidden by application libraries):

https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/100


So at best this saves a couple of bytes until you get to the HTML head, which can link to these resources with preload attributes anyway?


The intent is to allow even earlier preloading than what `Link` headers in the response or `<link>` in the HTML would typically allow. The insight here is that most application servers/webservers get a request and only send the response headers all at once, with the body, once everything is ready. But many times you already know some static resources ahead of time, before you know what response to send back.

For example, if you make a request for `GET /users/id/12345`, you know the user probably needs `style.css` like most do. You put that into a `Link` header in the response. But the browser won't fetch that file until it is told about it. It will only be told about it once it has seen the entire response, because again, the server (probably) sends it all at once, body and headers together, anyway.

But, producing that page in the first place may require running database queries for user info, your template engine to get run for the page body, an external HTTP call to happen in the background for their Gravatar, etc. So all of that has to happen on the server side before style.css can be fetched by the client browser. Even though you (the developer) knew they (the client) probably needed it. Let's say that `GET` request needs 500ms of computation (DB query + API call) before a reply is sent. That means the client has to wait at least 500ms for the server to produce any Link headers -- 500ms before it can fetch style.css. More importantly, it's literally 500ms where the client browser does nothing while it waits, while it could be concurrently downloading other assets, for example.

With Early Hints, when someone issues a `GET /users/id/12345`, the server can immediately reply using an Early Hint and tell the client about style.css first-thing. Then the client can immediately go fetch style.css, without waiting 500ms for the server to produce the body, or the headers, or even anything at all.

One interesting advantage of this is that, like Link headers but unlike Server Push, early hints can be statically computed at "application build time." You just need a mapping of URIs to URIs, like preload. So you can imagine, for example, your Django middleware or Rails app or NodeJS thing-doer having a step where it calculates asset dependencies for your frontend, plus a middleware layer that automatically issues 103 Early Hints for statically known page assets. And unlike Push, Early Hints are client-controlled. A big problem with Push is that it has no knowledge of the client-side cache, so it can push assets unnecessarily and cause slowdown. Early Hints have no such drawback; the client has complete knowledge of its cache state and can make the correct decision.
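
As a rough sketch of the server side, assuming a version of Go's net/http that can write 1xx informational responses via WriteHeader (the paths and the 500ms sleep are stand-ins, not anyone's real setup):

    package main

    import (
        "io"
        "net/http"
        "time"
    )

    func user(w http.ResponseWriter, r *http.Request) {
        // Tell the client about statically known assets before doing any slow work.
        w.Header().Set("Link", "</style.css>; rel=preload; as=style")
        w.Header().Add("Link", "</script.js>; rel=preload; as=script")
        w.WriteHeader(103) // 103 Early Hints: sent immediately as an informational response

        time.Sleep(500 * time.Millisecond) // stand-in for DB queries, templates, API calls

        w.Header().Set("Content-Type", "text/html; charset=utf-8")
        w.WriteHeader(http.StatusOK)
        io.WriteString(w, "<!doctype html><link rel=\"stylesheet\" href=\"/style.css\">...")
    }

    func main() {
        http.HandleFunc("/users/", user)
        http.ListenAndServe(":8080", nil)
    }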


I'm not sure I see much benefit over just having a header on the 200 response. I suppose it helps if it takes a bunch of time to generate the response body, but most of the time where this sort of thing would make a difference you probably already have the response body for the main request cached on the server side.


I guess the advantage is that you can send this before you know the status code of the response? For example, if you know which CSS+JS files the client will need but still have to generate the page, which may 500, 404, or 302.


And I can't understand why!


Server push never made any sense anyway for most subresources. The server doesn't know about the state of the browser's cache, so using server push for a subresource could cause it to needlessly push a subresource that was already cached by the browser.

Maybe it is useful outside of the browser context, e.g. in gRPC.


gRPC does not use PUSH_PROMISE frames.


I guess you could see if the client has a cookie, and infer cache state from that.


HTTP/2 server push is odd. When you can get the most benefit out of it, you usually don't care, because your resources are small and load quickly anyway. And when you do care (because your resources are big and don't load quickly enough), HTTP push actually hurts you, because you are pushing the same big resource on every page load.

I tried to use it once, and the hassle of distinguishing between first-time visits and repeat visits is simply not worth it. Even the hassle of using <link rel="preload"> is usually not worth it in large apps - if you have time for that, it is better spent on reducing the size of assets.


I think the benefits are where some automated system writes your "preload" entries for you.

For example, a webpack stage could render inside a sandbox each page of your site, detect which resources get loaded, and add all of those as preload/server push entries. The server itself can keep records of which resources have been pushed to a specific client, and not push them again.

Writing preload lists by hand is never going to scale with today's web apps with hundreds or thousands of requests for a typical page.


> your resources are small and load quickly

Size is only semi-related to latency. For small resources, latency costs dominate. That's what push addresses.


HTTP/2 Push allows you to _really_ optimize the first page load, giving a wow effect. (Especially when you can use a single server without GeoDNS to serve the whole world with really low latency!)

I use it on my pet project website, and it allows for a remarkable first page load time.

And I don't have to resort to all these old-school tricks, like inlining CSS & JS.

HTTP/2 Push makes website development so much more pleasant. You can have hundreds of images on a page, and normally you'd be latency-bound trying to load them in a reasonable amount of time. The old-school way to solve this is to merge them all into one big sprite image and use CSS to show parts of that image instead of separate image URLs. That's an ugly solution to a latency problem. Push is so much better!

The fact that 99% of people are too lazy to learn a new trick shouldn't really hamstring people into using 30-year old tricks to get to a decent latency!


> The fact that 99% of people are too lazy to learn a new trick shouldn't really hamstring people into using 30-year old tricks to get to a decent latency!

What amazes me is that this <1% doesn't even include Google, which implemented push in its own protocol. Any insights on that?


Looking at their highly optimized code on google.com, my guess is that they already use all the tricks that work in _every_ browser. And it might not make much sense for them to have two builds: an elegant one, with multiple files per page, for newer browsers, and an ugly optimized blob for older ones.


Would inlining CSS and JS work on websites (not apps)? I kinda feel inlined JS will bypass the bytecode cache, and the parsing costs will have to be paid on each page load.


Inlined scripts do get cached, but the cache is keyed to the document's URL. Go to https://v8.dev/blog/code-caching-for-devs and search for "inline".


Yep, inlining only optimizes first page load.


The whole point of this discussion is that push only optimizes the first load, and then it pessimizes all subsequent loads. That's why no one adopted it.


They quote a study by Akamai, a CDN company. Of course, if you run a CDN then you don't like products that help reduce latency when you don't have a CDN...

Server push is most useful in cases where latency is high, i.e. where server and client are at different ends of the globe. It helps reduce the round trips needed to load a website. Any good CDN has nodes at the most important locations, so the latency to the server will be low. Thus server push won't be as helpful.


CDNs loved the idea of HTTP/2 push. It's a complicated low level feature. To make it work, you'd need to figure out what to push, and the ideal way to prioritise and multiplex those streams to optimise for first render. CDNs are in the business of knowing this stuff better than anyone else, yet they still couldn't make it work.

Remember, most sites using CDNs still go to the origin server for HTML and other no-cache content. It's only the more optimised sites that figure out how to deliver those resources straight from the CDN without consulting the origin.


Why doesn't the client keep a Bloom filter of already-requested URLs for that website and send it along to the server on the first request? That way, you'd get the smaller-payload benefit of link rel=preload, but the latency benefit of push.

Also, this is very Google: "Well, few people have adopted it over five years, time to remove it." HTTPS is almost as old as HTTP and is only now starting to become universal. Google has no patience, seriously.


A Bloom filter is indeed a sufficiently obvious idea that it was already proposed. But it wasn't accepted, and the draft expired in 2019. See Cache Digests for HTTP/2.

https://tools.ietf.org/html/draft-ietf-httpbis-cache-digest-...


The bloom filter seems like the obvious solution!

I even spent the best part of a week back in 2017 trying to build a Bloom filter into Chrome's HTTP cache, so that each connection could send the server a tiny filter of the resources already cached, and the server could then send back a package of "everything needed to render the page you have requested". Turns out the HTTP cache is complex, so I gave up.

If fully implemented, it ought to be able to cut render times dramatically and eclipse the performance benefit of CDNs (where the main benefit is reducing latency for static assets).

There are potential privacy concerns, but no more so than first-party cookies.
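
For anyone unfamiliar with the idea, here is a toy sketch of the client side (this is not the Cache Digests wire format, just an illustration): hash each cached URL into a small bit set, ship the bit set to the server, and let the server skip pushing anything whose bits are already set, accepting a small false-positive rate.

    package main

    import (
        "fmt"
        "hash/fnv"
    )

    // toy Bloom filter over cached URLs; real cache digests use a compact, agreed-upon encoding
    type bloom struct{ bits [1024]bool }

    func (b *bloom) positions(url string) [3]uint32 {
        var out [3]uint32
        for i := range out {
            h := fnv.New32a()
            fmt.Fprintf(h, "%d:%s", i, url) // derive k hash positions by salting with the index
            out[i] = h.Sum32() % uint32(len(b.bits))
        }
        return out
    }

    func (b *bloom) add(url string) {
        for _, p := range b.positions(url) {
            b.bits[p] = true
        }
    }

    // probablyCached can return false positives but never false negatives
    func (b *bloom) probablyCached(url string) bool {
        for _, p := range b.positions(url) {
            if !b.bits[p] {
                return false
            }
        }
        return true
    }

    func main() {
        var cached bloom
        cached.add("/style.css")
        fmt.Println(cached.probablyCached("/style.css")) // true
        fmt.Println(cached.probablyCached("/app.js"))    // almost certainly false
    }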


Server Push has a use case for web APIs. I just published a benchmark showing that, under certain conditions, APIs using Server Push (such as APIs implementing the https://vulcain.rocks specification) can be 4x faster than APIs generating compound documents (GraphQL-like): https://github.com/dunglas/api-parallelism-benchmark

The key point for performance is to send relations in parallel in separate HTTP streams. Even without Server Push, Vulcain-like APIs are still faster than APIs relying on compound documents, thanks to Preload links and HTTP/2 / HTTP/3 multiplexing.

Using Preload links also fixes the over-pushing problem (pushing a relation already in a server-side or client-side cache) and some limitations regarding authorization (by default most servers don't propagate the Authorization HTTP header or cookies in the push request), and is easier to implement.

(By the way Preload links were supported from day 1 by the Vulcain Gateway Server.)

However, using Preload links introduces a bit more latency than using Server Push. Is the theoretical performance gain worth the added complexity? To be honest, I don't know. I guess it isn't.

Using Preload links combined with Early Hints (the 103 status code - RFC 8297) may totally remove the need for Server Push. And Early Hints are way easier than Server Push to implement (it's even possible in PHP!).

Unfortunately browsers don't support Early Hints yet.

- Chrome bug: https://bugs.chromium.org/p/chromium/issues/detail?id=671310

- Firefox bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1407355

For the API use case, it would be nice if Blink added support for Early Hints before killing Server Push!


I'm sorry to disappoint you, but your benchmark methodology is flawed. You did not consider TCP congestion control/window scaling. TCP connections between two peers are "cold" (= slow) after the 3-way handshake, and it takes several round trips to "warm" them up (allow data to be sent at a level that saturates your bandwidth). The mistake you (and most other people performing HTTP load benchmarks) made is that the kernel (Linux, but also all other major OS kernels) caches the state of the "warm" connection based on the IP address. So basically, when you run this kind of benchmark with 1000 subsequent runs, only your first run uses a "cold" TCP connection. The other 999 runs will re-use the cached TCP congestion control send window and start with a "hot" connection.

The bad news: for website requests <2MB, you spend most of your time waiting for the round trips to complete, that is: you spend most of the time warming up the TCP connection. So it's very likely that if you redo your benchmarks while clearing the window cache between runs (google tcp_no_metrics_save), you will get completely different results.
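
A rough sketch of what that looks like on Linux (needs root; the sysctl stops the kernel from saving TCP metrics when connections close, and the flush drops what is already cached):

    sysctl -w net.ipv4.tcp_no_metrics_save=1
    ip tcp_metrics flush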

Here is an analogy: if you want to compare the acceleration of two cars, you would race them from point A to point B, starting at a velocity of 0 mph at point A, and measure the time it takes to reach point B. In your benchmark, you basically allowed the cars to start 100 meters before point A and measured the time between passing point A and point B. Granted, for cars, acceleration decreases with increasing velocity; for TCP it's the other way around: the amount of data allowed to be sent per round trip gets larger with every round trip (usually somewhat exponentially).


Hi, and thanks for the feedback.

I'm aware of this "issue" (I must mention it in the repo, and I will). However, I don't think it matters much for a web API: in most cases, inside web browsers, the TCP connection will already be "warmed" when the browser sends the first (and subsequent) requests to the API, because the browser will have loaded the HTML page, the JS code, etc., usually from the same origin. And even if that isn't the case (mobile apps, API served from a third-party origin...), only the first requests will have to "warm" the connection (it doesn't matter if you use compound or atomic documents then); all subsequent requests, during the lifetime of the TCP connection, will use a "warmed" connection.

Or am I missing something?

Anyway, a PR to improve this benchmark (which aims at measuring the difference - if any - between serving atomic documents and serving compound documents in real-life use cases) and show all cases would be very welcome!


What you say would be true if images/JS/CSS were truly served from the same IP address (not hostname!). In reality, people use CDNs to deliver static assets like images/JS/CSS, and only the API calls warm up the TCP connection to the actual data backend. Also, things like DNS load balancing would break the warm-up, because the congestion control caches operate on IPs, not hostnames.

Additionally, it's really hard to benchmark and claim something is "faster". You always measure using the same networking conditions (latency, packet loss rate, bandwidth). So if a benchmark between two machines yields faster results using technology A, the same benchmark may return completely different results for different link parameters. Point being: optimizing for a single set of link parameters is infeasible; you'd have to vary the networking link conditions and find some kind of metric to determine what "fast" means. An average across all parameters? Or rather weighted averages depending on the 98th percentile of your user base, etc.?

Regarding improving the benchmarks: it is really hard, since (a) Docker images cannot really modify TCP stack settings on the Docker host, and (b) client and server would have to flush their TCP congestion control caches at the same time, and only after both have flushed can the next run be conducted.

EDIT: Regarding serving static assets to warm up the connection: in that case, you'd have to include the time to download those assets (including time to parse and execute JS) in your measurement and in the overall time comparison. Switching the API protocol from REST to something else will probably not have that big of an impact on the total load time then. Put differently: if you spend 80% of your time downloading index.html, CSS, some JavaScript etc., and querying your API accounts for only 20% of the time, you will only be able to optimize that 20%. Even if you cut load times for the API calls in half, the overall speedup for the whole page load would be 10%.


I'm surprised about the timing.

The serverless/edge technologies becoming available at CDNs are making it easy to imagine "automatic push" could come soon.

Any chance there are folks from Vercel or Netlify here who can shed light on why push hasn't been implemented in their platforms (or whether it has)? At first glance, it seems like Next.js in particular (server rendering) is ripe for automatic push.


You can push now using headers or custom VCL with Fastly:

https://docs.fastly.com/en/guides/http2-server-push


To me, the jury is very much still out on push. It's entirely indeterminate how useful it is, because only a handful of people have stepped up to try. There is a lot of trickery in getting the server to determine which resources to push that it knows the client needs, but basics like "let's look at the main page's ETag to figure it out" got very little experimentation, and certainly few documented attempts.

> Chrome currently supports handling push streams over HTTP/2 and gQUIC, and this intent is about removing support over both protocols. Chrome does not support push over HTTP/3 and adding support is not on the roadmap.

I am shocked & terrified that Google would consider not supporting a sizable chunk of HTTP in their user agent. I understand that uptake has been slow. That this is not popular. But I do not see support as optional. This practice of picking & choosing what to implement of our core standards, of deciding to drop core features that were agreed upon by consensus - because 5 years have passed & we're not sure yet how to use it well - is something I bow my head to & just hope, hope we can keep on through.


> But I do not see support as optional.

That's funny, given the HTTP/2 RFC does see the support as optional.


Damn james, killing me.

Philosophically, I am highly opposed to the browser opting not to support this.

Webdevs have been waiting for half a decade for some way to use PUSH in a reactive manner, as I linked further in this thread,

And instead we get this absolute unit of a response.

This is just absolutely hogwash james. I can not. Truly epic tragedy that they would do this to the web, to promising technology, after so so so little time to try to work things out, after so little support from the browser to try to make this useful.

You're not wrong but I am super disappointed to see such a lukewarm take from someone I respect & expect a far more decent viewpoint from.


I think you're mistaking that person for someone else


Yeah, I was mistaking him for James Snell, author of the HTTP/2 (and PUSH) implementation in Node.js. :)


Is there an example of HTTP/2 push being used usefully? I haven't seen one, but would be happy to take a look.

If it's not useful, why keep support for it?

As I recall, Google added push in SPDY, so it makes sense for them to be the ones to push for it to be removed.


Think of a framework like Laravel; you can make the Laravel app write "Link" headers for each of the JS or CSS assets declared in the views being rendered. Then a server, say Caddy, which supports push out of the box (https://caddyserver.com/docs/caddyfile/directives/push), will read the Link headers from PHP's response and push those assets on its behalf. Super simple to implement.

Adoption has been low because there hasn't been enough time for people to get comfortable with and switch to more modern web servers that support this type of thing, and most frameworks haven't figured out that having these sorts of features on by default could have major benefits. But it could happen.
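
A rough sketch of what that looks like in a Caddyfile (the site address, root, and FastCGI upstream are made up; with no arguments, push acts on whatever the upstream declares via Link headers):

    example.com {
        root * /srv/public
        php_fastcgi localhost:9000
        push
    }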


It's good enough for the Web Push Protocol[1], which underpins all the notification systems on the web. This is literally how every notification message you ever accept into your browser gets delivered.

This also highlights the core missing WTF, which is that the server can PUSH a resource, but there is no way for the client page to tell. 5 years later, we've talked & talked & talked about it[2], but no one has done a damned fucking thing. Useless fucking awful biased development. PUSH gets to be used by the high & mighty. But it's useless to regular web operators, because no one cares about making the feature usable.

Now that the fuckwads can declare it's not useful, they're deleting it. From the regular web. But not from Web Push Notifications. Which will keep using PUSH. In a way that websites have never been afforded.

You'd think that if a resource got pushed, we could react to it. These !@#$@#$ have not allowed that to happen. They have denial of serviced the web, made this feature inflexible. Sad. So Sad.

Even without the reacting to push, it seems obvious that there's just more work to do. That we haven't made progress in 3 years just seems like infantile terrible perception of time, of what the adoption curve looks like. A decade for critical serious change to start to take root is fine. The expectation for progress & adoption is way out of whack.

So so sad. Everything is so wrong. You can't just unship the web like this. You need to support the HTTP features we agreed we wanted to create. None of this is happening. I hate what has happened to PUSH so bad. This is so critically, terribly mismanaged; the browser standards groups have fucked this up so so so bad, done so little, had such atrocious mismanagement of what the IETF wanted to do. It's embarrassing. We are terrible, we have screwed the pooch so bad on this so many times, & pulling the plug is a colossal fuck up of unbelievable proportions, fucking up a very basic promise that we ought to have made for the web, the ability to push things, that never got delivered into any useful ability on the web to do anything about it. Fuck up, 1000000x fuck up, just fucking shit balls terrible work everyone.

This one issue causes severe doubt in humanity for me. This is a fucked up terrible thing for all humanity, for the web. I can't believe we were so terrible at this.

[1] https://developers.google.com/web/fundamentals/push-notifica...

[2] https://github.com/whatwg/fetch/issues/65


Who is "we" here? Have you worked on a browser?


This is odd, since I thought Akamai was working on LL-HLS, and the spec recommends the use of HTTP/2 server push for periodic playlist updates.


LL-HLS removed HTTP/2 Server Push from the spec back in January: https://mux.com/blog/low-latency-hls-part-2/

This was driven by poor server + client support at large, and the complexity it introduced. LL-HLS instead uses byte ranges and open-ended ranges over HTTP/2 - almost Comet-style - to handle CMAF chunks.


The post does say:

> It is interesting to note that server push has been used in ways other than originally intended. One prominent example is to stream data from the server to the client, which will be better served by the upcoming WebTransport protocol.


Well that sucks. Caddy has supported HTTP/2 push for a very long time (except for a few months after v2 was released since it was a total rewrite). https://caddyserver.com/docs/caddyfile/directives/push


I just finished spending 3 weeks implementing server push for my web app, which cuts typical load times in half :-(.

I guess I'll have to go back to putting all the images base64-encoded into the HTML :-(


Depending on how big your pages are and how fast they are generated, preload is probably a better alternative.

Instead of pushing the images just set headers for the images above the fold:

    Link: </img/1.jpg>; rel=preload; as=image
The browser will then request the images it doesn't have cached already.

The advantage to this method is that the browser can look up the images in its cache first and avoid transferring unnecessary data.

The downside is that it will take at least one round trip for the browser to request them. So if your HTML is short and quick to generate, the connection might go idle before you receive these requests.


My render target is 200 milliseconds for the first page load at the 80th percentile, so an extra round trip kills that. External resources via preload seem to easily add an extra 200ms, seemingly due to threading delays in Chrome more than network latency.


Actually you should use

    Link: </img/1.jpg>; rel=preload; as=image; nopush
Otherwise it's likely that the web server will transform this hint into a Server Push (that's supported out of the box by Apache, NGINX, Cloudflare...).


"should" is a strong word. If you have a very short and fast HTML page you might want to start pushing the assets just in case the user doesn't have them cached. But yet, the choice is yours.

I wonder if the proxies would consider some sort of hybrid mode where they proactively fetch the resource in the Link header and cache it (if it isn't already cached) but don't push it to the client. I can't find any indication that this has been implemented or considered anywhere. Of course, if your edge is much closer to the user than to the origin, it wouldn't have much benefit. But it would still be nice.


Does WebTransport obsolete push? https://w3c.github.io/webtransport/

I once read a technical comment saying that HTTP/2 push was superior to WebSockets, but I can't remember why. Also, what's the difference between push and server-sent events?


WebTransport isn't used by the browser for transferring HTTP requests/responses. It's essentially an expansion of WebSockets to include multiple streams and optional UDP-like delivery, while still being encapsulated in HTTP/3 and suitable to be called from JS. Server-sent events similarly only work if you have JS on the client receiving them; the browser doesn't inherently know how to render the events.


WebTransport is the kind of stuff that makes me lose faith in the capacity for humanity to consciously evolve.

It's like your great-great-great-grandparents built a house out of brick. Each new generation there's more people. Everyone wants to live in the same house, but they can't all fit. They try to build the house larger, but it will only work so high with brick. So they start shoving tin and iron and steel into each new floor to reinforce it. Eventually you have a skyscraper where the top floors are built out of titanium, and the bottom floor is several-hundred-years-old brick. But hey, we have a taller building!

You could say this is a perfect example of evolution, like big bright red baboon butts. But if the evolution were conscious, we'd want things to improve over time, not just make the same crap scale larger.


Or maybe people want to do real-time networking in the browser without having to delve into the mess that is the all-singing, all-dancing WebRTC? Or they want battle-tested protocol encryption libraries and flow control in their native app that can do unreliable and out-of-order delivery?

It’s great fun to write these metaphors about HTTP/N and friends, but the last one improved network performance for mobile devices very substantially while half of HN was jeering at it. There are impractical, camel-like, and downright bizarre IETF standards but these are bad examples.


> There are impractical, camel-like, and downright bizarre IETF standards but these are bad examples

I'd love to know which RFCs you'd call "camel-like"


I was referring to "a camel is a horse designed by committee" :P


I was asking which IETF standards you would describe by that phrase.


JWT is probably the one that stands out the most in my mind (I’ll also file it under “bizarre”).


WebRTC does not seem intrinsically more difficult to work with than WebTransport (as someone who works with WebRTC daily at super low levels and also has read about WebTransport and even watched talks about it at the standards committee and is now just annoyed as they didn't even have the decency to replace all of the WebRTC data channel use cases :/). WebRTC is actually a ridiculously easy stack: it is seriously just SCTP over DTLS (with a small bit of ICE, to maintain the connection, that isn't really required anyway). To the extent to which the code sucks--and that's frankly the core problem literally everyone has with it--it can be fixed pretty easily rather than tacking more BS on top, but somehow everyone wants to fix it with new specifications rather than improving the engineering (which I think is related to the problem the other comment brings up); even just some better build engineering would make it easier to compile (which is the #1 reason people seem to hate on WebRTC, not that it is actually *that difficult to compile... but it certainly isn't an invalid complaint).


> Do web-transport obscolete push?

No, it solves a completely different problem.

HTTP/2 Server Push is a mechanism where the server can send HTTP responses for resources it believes the client will need soon. (For example, if a client has just requested a web page with a bunch of images on it, the server could use Push to send those images without having to wait for the client to encounter the image tags and request the associated image resources.)


What if on link hover, some javascript code notifies the server and the server pushes the page? When the user clicks the link the page will have already been downloaded. Would that not be possible and useful?


“Notifying the server” is already done by browsers by just making a regular request. Prefetching links has been a thing for a while now. You don’t need HTTP/2 to do it.


This prefetching implementation is extremely simple and fails gracefully. No complex browser APIs are used to imitate a normal browsing experience. It all just works, since it goes through the caching system. I'd say this is nothing like existing prefetching implementations, which don't seem to be used much - and no wonder.


This is a really weird system you have devised.

You are making a custom request to ask the server to push something to the browser.

  Browser: POST /preload?url=/next-page
  Server: PUSH /next-page

Just cut out the middleman and make a regular request.

  Browser: GET /next-page

Even better use browser preloading mechanisms so that the browser knows how to best prioritize them. In fact if you do it this way the browser can even start downloading subresources and prerendering the page.


From what I understand, the push cache gets deleted the moment the HTTP connection is closed, which to me makes it sound not the most suitable for this.

Maybe just adding a rel=preload link tag dynamically would be better (do link tags work dynamically? I have no idea). Or just fetch with normal AJAX and use a service worker.


There are libraries [1] that achieve what you describe.

[1]: http://instantclick.io/ "InstantClick"


Yes, TurboLinks is another library that does this: https://github.com/turbolinks/turbolinks


Yes, and I'd describe them as unpopular hacks. HTTP/2 push is not a hack.


Actually... I've implemented both client and server side HTTP/2 push and I'd say... it's a hack and we should deprecate it and remove it from the spec.


So instead of pursuing the idea of server pushing your CSS so the top of your page renders fast, the best option is to inline CSS directly into the page but lose caching benefits between pages?

Crafting a fast website is going to be messy and difficult for a good while still.


A "fast website" is super-easy to create if you don't add dozens of megabytes of useless crap to each page.

Two decades ago, hardware was slower, bandwidth was far more constrained, and browsers didn't have so many features or use so many resources - and yet page loading was often much faster than it is today!

Indeed, most of the "problems" web developers complain about are self-inflicted.


Things weren't just slower, they were orders of magnitude slower. It's ridiculous how we've managed to make computers ludicrously fast and the developer response has been to keep adding garbage until they are slow again.


Same with disk sizes and game file sizes; it's common to see 60GB+ installs for AAA titles.


Games specifically primarily take up space due to art assets. A high fidelity modern AAA game full of thousands of 4k+ textures, super-high-poly models, detailed animations, and hours of high quality audio is going to take up a lot of space. There are certainly optimizations to be had, but there's no way a 60GB+ game today would ever fit on, say, a DVD, at the same level of fidelity while maintaining the same player experience.

Higher compression means longer load times, so there's incentive to not compress more than absolutely necessary, and the biggest games tend to have huge worlds, which means you can't hold the whole world in memory at once. So you have to stream it in.

It's a constant balancing act between your nominal hardware target, the space the game will take up, up-front load times, and the amount of stuff that can be in a scene before the hardware can't keep up and you get model/texture streaming pop-in, stuttering, or an otherwise degraded player experience.

Perhaps in the next few years we'll see games leveraging super-resolution AI to quickly produce usably high-res textures from lower-res installed ones in storage faster than a directly compressed equivalent could be.... or games will leverage the same to take what they already ship and make it even higher detail...


> but there's no way a 60GB+ game today would ever fit on, say, a DVD, at the same level of fidelity while maintaining the same player experience.

I mean, you could make a modern open-world game where all the textures are procedurally generated from set formulae, ala https://en.wikipedia.org/wiki/.kkrieger .

It might even load quickly, if the texture generation algorithms were themselves parallelized as compute shaders.

But it'd be a 100% different approach to an art pipeline, one where you can't just throw industry artists/designers at the problem of "adding content" to your world, but instead have to essentially hire mathematicians (sort of like Pixar does when they're building something new.) Costly!


I don't know if you're being ironic or not. We would obviously not have as many great games if only math nerds were allowed to design them.

In fact, games are one of the few areas where all those compute/storage resources in private PCs are mostly justified.


I’ve shipped almost pure HTML landing pages (Single kBs of total JS, none of it render blocking) in the last few years, and inline CSS and preload headers help a lot, especially for older devices. Expectations for image/video resolutions in particular have gone up since “the good old days”. You can really see the effect with much older devices, which become the majority once you look outside the US and EU.

IME most people trying to optimize their way out of tag manager, monolithic SPA hell don’t generally bother with these kind of features outside of turning on all the cloudflare rewriting and praying. If performance was important to them and they knew what they were doing, they’d fix those first.


This is like when people complain about legacy code the first time they see it, without understanding the context the code was written in. Or saying C coding would be safe if everyone were just more careful and more skilled.

It's just not true that it's super easy to write fast pages. There's a huge amount of background you need to understand to optimize fonts, images, CSS, your server, your CMS, caching, scripts, etc. There are multiple approaches for everything, with different tradeoffs.

Even if you have the skills to do this solo, you might not have the budget or the time or the approval of management. Big websites also require you to collaborate with different developers with different skillsets with different goals, and you need to keep people in sales, graphic design, SEO, analytics, security etc. roles happy too.


Typical page loads were much slower two decades ago than they are today by my recollection, unless you were lucky enough to have something better than 56k dialup.


Server push arguably doesn't bring caching benefits if you have to push the same CSS file as part of every request for any page. With browsers already able to eagerly parse <link rel=preload> out of the head, and HTTP/2 multiplexing concurrent streams, you're looking at saving a single round-trip time, once, while adding bandwidth overhead to every other request for a page.


Is there no way to fix this problem by changing how push works? A single round trip is significant on slow mobile connections.


No, it's not possible. It will take a round trip for the client to tell the server if an object is cached.

So you have two options:

1. Pessimistically start pushing the response, and stop if the client tells you it is already cached

2. Optimistically don't push the response until the client tells you that it doesn't have it cached.

The first option is what push does.

The second option is basically equivalent to sending a `Link: <..> rel=preload` header.

So I guess one way to look at it is that we had both choices available. But it turns out that Push wasn't used much and probably isn't the best option.


> It will take a round trip for the client to tell the server if an object is cached.

Why wouldn't the client be able to give the server some information about what it has cached already when it makes the initial GET request for the HTML page?


"It is not possible" is an overly strong claim. But the options I can think of are not great.

Uploading an entire list of cached resources probably isn't feasible (and is probably a privacy concern anyway).

You could maybe do a bloom filter but probably still have privacy concerns.

There is a solution that mostly works. You can use cookies! A simple approach might be to set a cookie for the app version when you respond. On the next request, you can avoid pushing resources that haven't changed between the client's last version and the current one. You could even try to set a cookie per asset; however, I'm not sure that is a good strategy.
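
A minimal sketch of that cookie heuristic (using Go's http.Pusher; the cookie name, asset path, and version string are made up):

    package main

    import "net/http"

    func page(w http.ResponseWriter, r *http.Request) {
        const assetVersion = "v42" // hypothetical build identifier, bumped on each deploy

        // Only push when the cookie is missing or points at an older build.
        if c, err := r.Cookie("assets"); err != nil || c.Value != assetVersion {
            if pusher, ok := w.(http.Pusher); ok {
                pusher.Push("/static/app.css", nil) // best effort; an error just means no push
            }
            http.SetCookie(w, &http.Cookie{Name: "assets", Value: assetVersion, Path: "/"})
        }

        w.Write([]byte("<!doctype html>...")) // render the page as usual
    }

    func main() {
        http.HandleFunc("/", page)
        // Push only works over HTTP/2, which Go's server enables with TLS.
        http.ListenAndServeTLS(":443", "cert.pem", "key.pem", nil)
    }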

But this is still not perfect. The presence of the cookie doesn't mean the resource was ever downloaded (maybe the transfer was aborted partway through?), and things can get purged from the cache without the cookie having any way of knowing.

You can also try to track this out of band: for example, storing in a database which resources the client has downloaded before, plus a bunch of heuristics to determine whether they are likely still in the cache. But at the end of the day, only the client knows that.

So the TL;DR is that I'm not sure this is possible. You are looking for `required resources - cached resources`, and since `cached resources` is sensitive and probably much larger, it is way easier to do this computation on the client.

Or rather: it might be possible, but it is very tricky, and the current state of the art isn't anywhere near a perfect solution.


I would think the best option would still be to have the CSS in a separate file, unless your CSS is very small. It is of course a tradeoff between whether the extra RTT hurts you more than the extra bytes in your page (how much that matters depends on how big your page is and on TCP window sizes), and how often your users view your website completely uncached.

Keep in mind that even without server push, HTTP/2 still fixes head-of-line blocking, which was a major reason the folk wisdom of inlining things and using sprites popped up in the first place.


Well, the best thing is to inline only the CSS needed for the content actually in the server render, if it's much smaller than the total site CSS. If you're code-splitting your CSS in the first place, this is usually handled pretty well by the same mechanism. Then the CSS is still cacheable across page loads for later navigation (particularly if you use a ServiceWorker in lieu of server rendering once enough core assets are cached).


Yes, but inlining only the critical CSS for every page isn't trivial and usually involves complicating your build chain.

I'm not saying there's a perfect solution, it's just interesting they're giving up on push.


Or use <link rel="preload"> I guess (bonus: local caching still works).


You then have to wait for the preload content to arrive before your page starts to display though.


What? Multiplexing HTTP and other traffic was the entire argument justifying HTTP/2 and 3 complexity with multistatus etc. That server push was never going to work was clear from even a cursory look at the protocol spec and a minimal use case involving two subsequent page loads with shared resources. Was it really necessary for Google to foobar HTTP?


I'm going to sound like a broken record, but HTTP/1.1 comet-streaming works fine: just use the "Transfer-Encoding: chunked" header in your response, write the chunk length in hex followed by \r\n, then write the chunk data followed by \r\n, rinse and repeat (a zero-length chunk ends the response).

It's simple, debuggable, inherently avoids cache misses, and scales (if you use non-blocking IO and a concurrency-capable language with OS threads).

It also avoids HTTP/TCP head-of-line blocking, because you're using a separate socket for your pushes.
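
For reference, such a response looks roughly like this on the wire (the \r\n sequences are written out to show where the CRLF bytes go; chunk sizes are in hex):

    HTTP/1.1 200 OK
    Content-Type: text/plain
    Transfer-Encoding: chunked

    5\r\n
    hello\r\n
    6\r\n
     world\r\n
    0\r\n
    \r\n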



