Http2 explained (haxx.se)
180 points by antouank on Feb 12, 2015 | 65 comments



At the risk of not being quite critical enough for HN - that was an excellent read.

Clear explanations, a very pleasant layout, and useful visual metaphors for the trickier parts of the spec. I found it to be both enjoyable and informative.

A really nice example of documentation done well.


As the author of the document - thanks!


> 8.4.6. “It has layering violations”

> Seriously, that's your argument? Layers are not holy untouchable pillars of a global religion

If layers aren't 'untouchable pillars', then why have we not fixed the ones we have? IPSec, IP, TCP and TLS are all a jumbled rotten mess. Poor layering has resulted in a lot of warts like inefficient or underleveraged handshakes and the lack of things like mobility, multi-homing, authentication, reliable datagrams and stream multiplexing. What is really being said here is yes, the layers we have (TCP, NAT) really are untouchable pillars.

Cramming workarounds into higher, application-specific layers doesn't benefit the wider Internet.


Good luck getting everybody to upgrade their kernel to support your new transport protocol. Realistically, UDP and TCP are what we have. We may wish they were more suited to modern use cases, but in practice we must build on those foundations. If that means violating "layering" for performance, so be it.


Maybe instead of the "good luck" attitude we should start pushing an "upgrade or suffer" attitude.

Seems far more reasonable than letting things stagnate for years. It's what Chrome is doing with SHA-1 certs.

Give a timeframe, if you don't get your upgrade in then too bad.


Google can push that because there's a huge user base with Chrome and you don't want your shiny ecommerce site to be marked as not safe by Chrome, do you?

On the other hand, the packets that carried this comment to HN, and the ones that will carry it back to your computer, pass through devices managed by dozens of different people and a handful of different companies with different agendas, budgets, needs and even skills. Cisco may have manufactured and sold most of the devices out there, but it certainly doesn't manage them or decide when they get upgraded. And it gets worse, because Cisco isn't the only vendor out there.

I'm not saying it can't be done, but just look at the slow adoption of IPv6: despite all the big players (Google, Cisco, Juniper, Microsoft, Linux, ...!) supporting it in a timely manner, it's still not there.


Also remember that everything Google has ever pushed through has been layered above UDP/TCP, because it's far easier to push changes through on top of those than below them. There's a reason why QUIC is being developed on top of UDP instead of as a transport protocol in its own right.


> Maybe instead of the "good luck" attitude we should start pushing an "upgrade or suffer" attitude.

"Upgrade or suffer" rarely works in practice. Rendering (or rather the lack of rendering) malformed XHTML is a great example of this


I think he/she is not talking about the transport protocols (TCP/UDP), but about the higher layers (5-7) cramming in functionality from the lower layers. I might be reading it the wrong way, though.


They only try to cram functionality from lower levels into higher ones because establishing new lower ones is nearly impossible. SCTP (or variants of it) would provide a lot of what is now built at higher levels, but there is a reason it's only used in private networks.


No kidding, take a look at the presentation[0] for multipath TCP - and that was a relatively 'simple' transport layer modification. Look at slide #30 for a glimpse into the madness caused by crappy middleboxes.

[0] http://multipath-tcp.org/data/MultipathTCP-netsys.pdf


The alternative is to layer something over HTTP. That may be even worse. There are no simple ways to upgrade global infrastructure.


OTOH, HTTP on SCTP on IPv6 with IPsec hasn't really moved the Internet forward.


"One of the drawbacks with HTTP 1.1 is that when a HTTP message has sent off with a Content-Length of a certain size, you can't easily just stop it. Sure you can often (but not always – I'll skip the lengthy reasoning of exactly why here) disconnect the TCP connection"

Can someone give me more of a hint of that reasoning so that I can at least search for it? I'm intrigued, but the search terms I'm trying all come back with explanations of why you might need Content-Length, a different issue.


Sure. The only way to stop a message before the full Content-Length has been transmitted is to kill your TCP connection. This is inefficient: you need to recreate it, paying the TCP set-up cost all over again, and then deal with the small initial congestion window on your new connection.

HTTP/2 allows you to avoid that by saying "I'm done with this stream now, sorry!"
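To make that concrete, here's a rough sketch (illustrative only, not a full client) of what that "I'm done" looks like on the wire in the h2 framing: a 9-byte frame header followed by a 4-byte error code (CANCEL), leaving the TCP connection and every other stream untouched. The function name is mine, the byte layout is from the spec.

    import struct

    RST_STREAM = 0x3   # frame type
    CANCEL = 0x8       # error code: "I don't want this any more"

    def rst_stream(stream_id):
        # 9-byte frame header: 24-bit length, 8-bit type, 8-bit flags,
        # 1 reserved bit + 31-bit stream identifier; then the error code.
        payload = struct.pack("!I", CANCEL)
        header = struct.pack("!BHBBI", 0, len(payload), RST_STREAM, 0,
                             stream_id & 0x7FFFFFFF)
        return header + payload

    print(rst_stream(5).hex())
    # 000004 03 00 00000005 00000008  (length, type, flags, stream id, error code)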


How do web browsers handle cancelled requests? Subjectively speaking, it feels like browsers can take a while to recover from cancelling the loading of a large page/page with a large number of assets. Could this be part of that?


You're right. Cancels in H1 are very painful because all the in-progress transactions have to be torn down completely. New transactions have to set them all up again.

H2 lets you just send the server a short message that says "stop sending that stream", and you can go ahead and pipeline a new request right along with that cancel.

This happens a lot more than you think as you browse through a collection of things and are just scanning them and clicking the next button - that's a really common use case h2 will handle much better.


Ah, sorry, I misread your question! I don't actually know the lengthy reasoning for that edge case. =(


No problem, I see I was not clear. Yes, I am specifically curious about the edge case when you "can't" just disconnect the TCP. Scare quoted because I'm sure you can, it just involves something undesirable happening. I have vague ideas and guesses, but I'm curious about what the intent was.


Basically, in HTTP 1.x there are only very specific times at which the server can respond throughout a request. In particular, there's no way to tell a client "please stop" when it is transmitting the request body.

So how most web applications handle request bodies that are too big is to either kill the connection (which to the client looks like a network issue) or patiently pipe the entire message to /dev/null before sending a response that indicates the message has not been processed (e.g. HTTP 413).

Unless I'm mistaken, this can be avoided if the client sends an honest Content-Length header, but this only works if the size is known in advance and the client is honest (you could submit an arbitrary number of bytes with a Content-Length header indicating something much smaller).

Because there is no clear way for a client to distinguish a server aborting a request mid-body because of its size from the connection simply dropping on its own, a client might misinterpret that and attempt to re-submit the same request later.
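To illustrate the "drain, then 413" option described above, here's a minimal sketch using Python's standard http.server; the size cap and handler name are made up for the example. The server patiently reads the declared body before answering, so the client gets a real status instead of what looks like a network failure.

    from http.server import BaseHTTPRequestHandler, HTTPServer

    MAX_BODY = 1 << 20  # arbitrary 1 MiB cap, illustrative only

    class UploadHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            if length > MAX_BODY:
                # Pipe the whole declared body to nowhere...
                remaining = length
                while remaining > 0:
                    chunk = self.rfile.read(min(remaining, 65536))
                    if not chunk:
                        break
                    remaining -= len(chunk)
                # ...then tell the client it was all for nothing.
                self.send_response(413)
                self.end_headers()
                return
            self.rfile.read(length)  # small enough: handle it normally
            self.send_response(200)
            self.end_headers()

    HTTPServer(("", 8080), UploadHandler).serve_forever()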


Most requests are GET, how would Content-Length help there?


"Some of the bigger players in the HTTP field have been missing from the working group discussions and meetings. I don't want to mention any particular company or product names here, but clearly some actors on the Internet today seem to be confident that IETF will do good without these companies being involved..."

I haven't been following this much, who is he referring to?


> "clearly some actors on the Internet today seem to be confident that IETF will do good without these companies being involved"

That's impossible to believe. Now I wonder: what's the _real_ reason?


Probably Apple, there is another mention of them later on in the document.


Could be. Mobile Safari uses pipelining, so for Apple there's not a lot of benefit from HTTP2. It isn't a big enough deal for them to push a new protocol, as it is for, say, Google, whose browser doesn't do pipelining.


Sorry, but that doesn't make a lot of sense. Pipelining is far from a replacement for HTTP/2 as the document explains a bit. Besides, Safari already supports SPDY because of this.


Apple has already committed to (and rolled out support for) SPDY. If you have iOS 8, your Mobile Safari can use SPDY. Presumably they will shift to HTTP/2 as well.

http://zoompf.com/blog/2014/06/spdy-coming-to-safari-future-...


I'm still not entirely sure why the problems inherent in single-stream connections (request/response) couldn't be solved simply by removing the artificial RFC-recommended limitation on parallel connections to a server. As the author says, providers have been escaping this limitation for years by simply adding hostname aliases, yet he has nothing negative to say about that.

Modern HTTP servers are highly concurrent; allowing 100 parallel connections doesn't seem like a problem nowadays. And doing so would solve 99% of the browser performance problem without introducing a significantly more complicated multiplexing protocol.


Because running multiple TCP connections in parallel plays havoc with TCP congestion control and also plays poorly with the TCP slow-start logic. Every new TCP connection has to grow its window from scratch, so it starts small, and fetching many moderately-sized or large resources (think images) will cost you round trips you didn't need to spend.
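As a back-of-the-envelope illustration (idealized slow start, no losses; the constants are illustrative and real stacks vary):

    # How many round trips to deliver `size` bytes on a connection whose
    # window starts at `iw` segments and roughly doubles each RTT.
    def rtts_to_deliver(size, mss=1460, iw=10):
        cwnd, sent, rtts = iw, 0, 0
        while sent < size:
            sent += cwnd * mss
            cwnd *= 2
            rtts += 1
        return rtts

    print(rtts_to_deliver(300000))         # a 300 KB image, cold connection: ~5 RTTs
    print(rtts_to_deliver(300000, iw=80))  # same image, already-open window: ~2 RTTs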


Additionally, a TCP connection is essentially an operating system resource; you need to set aside a port and space for a send and receive buffer. It might seem fine for a client to open hundreds of connections, but imagine being a server with thousands of clients all opening hundreds of connections to you. You very quickly run out of resources and either have to close connections or reject new ones.


That hasn't been a practical problem in many years. Most servers have gigabytes of RAM and a 64-bit kernel nowadays.


Linux starts to act weird around 200,000 concurrent connections in my experience, even with aggressive sysctl tuning. You end up with weird edge cases like netstat literally taking 15 minutes of CPU time (in kernel) before it dumps the list of connections to stdout.

Not sure about FreeBSD or any other OSes.


Not saying netstat isn't slow, but:

    # time sh -c 'netstat -tn | wc -l'
    486206

    real    0m13.538s
    user    0m1.698s
    sys     0m10.380s

It still works with a whole lot of connections. (In fairness, only about 130k were connected.)


The problem is that it doesn't scale linearly. There's some O(n^4) algorithm being used in netstat or some kernel syscalls or something. Once you go over 250k, things get _really_ weird.


Try ss -nt instead; it uses netlink sockets instead of /proc and generally scales much better.


Thanks for the tip!


If you start out making 6 TCP connections, then on a perfect network their windows all expand in parallel: 6 times faster than a single HTTP2 connection. You're likely to see several HTTP2 connections made as well. There'll at least be a second one in case the first SYN is lost, just like with HTTP 1.

The key TCP benefit is keeping a connection open. That can be done with keep-alive as well.


Moreover, it's bad from a QoS standpoint. Opening many TCP streams in parallel is not fair to other users sharing the routers between you and the server. You'd get more than your fair share of bandwidth.


Only if you assume that everyone else won't also use the same number of streams in parallel. If everyone else's utilization is increased by the same factor, the utilization balance should remain the same.


The goodput decreases for everyone then, since each flow requires its own three-way handshake and a 1-2 RTT tear-down. This is particularly bad if you're starting up a lot of small flows, where the control-plane information becomes a non-negligible fraction of the total data sent.

I think ideally, we'd create a TCP variant where localhost maintains a per-destination receiving window for all flows to that destination, so flows running in parallel or flows started in rapid succession won't have to start their windows at 0 and slowly increase them. Moreover, this way congestion control applies to all packet flows for a (source, destination) pair, instead of to individual flows.

HTTP/2 and HTTP pipelining take a crack at this by running multiple application-level flows (i.e. HTTP requests) through the same receiving window (i.e. the same TCP socket), but they're not the only application-level protocols that could stand to benefit.


Doesn't that depend entirely on the queuing and session mapping algorithm? If you profile for bandwidth-per-source-ip, shouldn't that apply exactly the same restriction regardless of how many connections are started? (with multiple-connections people losing a bit because connection start/stop takes bandwidth they could use for data instead)


Yes and no. Yes in theory, using bandwidth-per-source-IP caps like you suggest could be used to solve this. No in practice, because the prevalence of NATs puts many users behind the same IP address, meaning that unfair users can still hog the upstream bandwidth from everyone else in the same NAT.

What we really want is something more like bandwidth-per-end-host caps.


I don't buy the congestion-control argument. Transit routers have millions of connections going through them at any given point. What's an extra 10x in concurrency to them?


It's a fair question, but the h2 approach is doing the right thing on balance. The issue isn't really computational or memory requirements.

The startup phase of a TCP stream is essentially not governed by congestion control feedback, because there hasn't been enough (or maybe any) feedback yet. It is initially controlled by a constant (IW) and then slowly feels its way before dynamically finding the right rate. IW can generally range from 2 to 10 segments. Whether this is too much or too little for any individual circumstance is somewhat immaterial: it's generally going to be wrong, just because that's the essence of the probing phase. You start with a guess and go from there.

Each stream has a beginning, a middle and an end. One large stream has one beginning, one (large) middle and one end, but N small streams have N beginnings, N (smaller) middles and N ends. The amount of data in the beginning is not a function of the stream size (other than being capped by it); it is governed by the latency and bandwidth of the network. So more streams means more data gets carried in the startup phase. If the beginning is known to be a poorly performing stage (and for TCP it is), then creating more of them and having them cover more of the data is a bad strategy.

In practice, IW is too small for the median stream, but there is a wide distribution of "right sizes" so it's ridiculously hard to get right. Maybe IW=10 and the right size is 30 segments; that's one stream 3x too small, not 20x or 50x too small, so when you open 50 parallel TCP connections you are effectively sending at IW * 50. And that does indeed cause congestion and packet loss. It's not the kind of "I dropped 1 packet from a run of 25, please use fast-retransmit or SACK to fix it for me" packet loss we like to see; it's more of the "that was a train wreck, I need slow timers on the order of hundreds of milliseconds to try again" packet loss that brings tears to my eyes. One of the reasons for this goes back to the N-beginnings problem: if you lose a SYN or your last data packet, the recovery process is inherently much slower, and N streams have N times more SYNs and "last" packets than 1 stream does. Oh, and 50 isn't an exaggeration. HTTP routinely wants to generate a burst of 100 simultaneous requests these days (which is why header compression when doing multiplexing is critical, but that's another post).
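Rough numbers for that effect (purely illustrative constants):

    MSS = 1460   # bytes per segment, typical
    IW = 10      # initial window, in segments

    for conns in (1, 6, 50):
        burst = conns * IW * MSS
        print("%3d connections -> ~%d KB injected before any congestion feedback"
              % (conns, burst // 1000))
    #   1 connections -> ~14 KB
    #   6 connections -> ~87 KB
    #  50 connections -> ~730 KB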

So the busier larger flow both induces less loss and is more responsive when it experiences loss. That's a win.

And after all that you still have the priority problem. 50 uncoordinated streams all entering the network at slightly different times with slightly different amounts of data will be extremely chaotic with respect to which data gets transferred first. And "first" matters a lot to the web: things like js/css/fonts all block you from using the page, but images might not, and even within those images some are more important than others (some might not even turn out to be on the screen at first - oh wait, you just scrolled, now I need to change those priorities). Providing a coordination mechanism for that is one of the real pieces of untapped potential hiding in h2's approach of muxing everything together.

There is a downside. If you have a single non-induced loss (i.e. due to some other data source) and it impacts the early part of the 1 single tcp connection then it impacts all the other virtual streams because of tcp's in-order delivery properties. If they were split into N tcp connections then only one of them would be impacted. This is a much discussed property in the networking community, and I've seen it in the wild - but nobody has demonstrated that it is a significant operational problem.

The h2 arrangement is the right thing to do within a straightforward TLS/HTTP1-compatible-semantics/TCP stack. Making further improvements will involve breaking out of that traditional TCP box a bit (QUIC is an example; Minion is also related, even Mosh is related), and that is appropriately separated out as next-stage work that wasn't part of h2. It's considerably more experimental.


The first answer is that parallelism without priority (which is what parallel h1 is) can lead to some really horrible outcomes. Critical pieces of the page get totally shoved out of the way while bulky, but less important, parts get the bandwidth they need. That's why H2 and SPDY are both mux'd and prioritized.

Also, you can definitely over-shard with h1. As said downthread, that can cause congestion problems and indeed even packet loss. For a little while Pinterest had gigantic packet loss problems that were due to over-sharding of images.

The really annoying thing is that the "right amount of sharding" depends on the available bandwidth, the size of the resources being sent, and the latency between client and server. Those things aren't really knowable on a generic per-origin basis when setting your links up, so the SPDY/h2 approach works better in practice.

If I had a criticism here, it's that implementing priority right is a lot trickier than just having a bunch of independent connections. We will probably see some bad implementations in the early days until folks internalize how important it is.
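As a hint of what implementers have to get right, here's an illustrative encoding of the h2 PRIORITY frame (the stream numbers and weight below are made up): a stream declares which other stream it depends on and a relative weight, and the sender is expected to schedule bytes accordingly. The byte layout follows the h2 framing; the function is just a sketch.

    import struct

    PRIORITY = 0x2  # frame type

    def priority_frame(stream_id, depends_on, weight, exclusive=False):
        # Payload: 1 exclusive bit + 31-bit dependency, then weight - 1 (weights 1..256).
        dep = (0x80000000 if exclusive else 0) | (depends_on & 0x7FFFFFFF)
        payload = struct.pack("!IB", dep, weight - 1)
        header = struct.pack("!BHBBI", 0, len(payload), PRIORITY, 0,
                             stream_id & 0x7FFFFFFF)
        return header + payload

    # "Stream 7 (an image) depends on stream 3 (the CSS), with a modest weight."
    print(priority_frame(7, depends_on=3, weight=16).hex())
    # 000005 02 00 00000007 00000003 0f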


Will Chan of Chromium wrote a good post explaining the congestion issues with opening many TCP connections - https://insouciant.org/tech/network-congestion-and-web-brows...


His test showed what happens on a slow network: too much parallelism leads to retransmits and congestion. But it doesn't show ill effects on a fast network with plenty of bandwidth.

The question is how to get it right, and the problem is that you can't get it right without knowing the amount of bandwidth available to you in advance. Limiting concurrency limits congestion on slow networks, but it caps you unnecessarily on fast ones. The same is true for SPDY/http2; using a single stream will never give you the same concurrency as multiple streams.


Retransmits and congestion are not bad by nature - they're a symptom that the network is being heavily used.

Edit: never mind, I misread the test; his tools clearly show goodput (good, desirable throughput) going down under congestion. Lowering initcwnd (as Chrome 29 did) eliminates this on slower connections, improving user experience. I would like to see page load time, though, as a proxy for time to screen. It's intriguing that the 6s total page load time did not seem to change.


I wanted to support the big picture idea of your comment about packet loss not being inherently bad. Too many strategies are based on the principle that every packet is precious rather than total system goodput. Indeed TCP congestion control is really premised on loss happening - it keeps inching up the sending rate until a loss is induced and then it backs off a bit.

OTOH, TCP really performs poorly in the face of significant levels of loss. So high levels of loss specifically in HTTP really are a bad sign, at least as currently constructed.

Also worth being concerned with: losses that occur late in the path waste a lot of resources getting to that point that could instead be used by other streams sharing only part of the path. (E.g., if a stream from NYC to LAX experiences losses in SFO, it is wasting bandwidth that could be used on someone else's PHI-to-Denver stream.) A packet switched network has to be sensitive to total system goodput, not just that of one stream.


Yes, you're right, and that's why parallelism is restricted: so slow connections can also be part of the Internet.


Does anyone know how WebSockets fit into the http2 world? Will we just end up using http2 server push and the rest of the protocol as a substitute for WebSockets?


WebSockets were always intended for only one specific thing: allowing web browsers and web servers to speak connection-oriented, stateful wire protocols (like IRC or IMAP) at one another over an HTTP tunnel.

Any other usage than this has been merely a polyfill for lack of efficiently-multiplexed or easily-server-initiated messaging.

Given an efficiently-multiplexed, bidirectional-async messaging channel in the form of HTTP2, WebSockets can go back to being used for what they're for, and we can relegate their polyfill usage to the same place Comet "async forever iframes" have gone.


If anyone is looking for a practical guide to using HTTP/2, and to what changes it will bring compared to HTTP/1.1 in terms of architecting websites and web applications, I've written one: http://ma.ttias.be/architecting-websites-http2-era/


Is there a "one page" HTML version of this? I tend to read stuff on my phone via apps like Instapaper etc, PDFS don't really suit mobile devices :(


Wow, I was expecting something scary, super technical and over my head, but this is a very easy overview. Not that I'm not technical (I do server support for a huge web hosting firm), but sometimes HTTP docs get way too spec-heavy. This was very clear to understand.


I find it amusing that the document that explains HTTP2 is a PDF I have to download. There's no HTML version.

(Yes, I'm aware that HTTP != HTML.)


Still not even a mention of SRV records in the “critiques” section. I’m more saddened than surprised, really.


Beautifully written, thanks for this.


> 8.4.4. “Not being ASCII is a deal-breaker”

> Yes, we like being able to see protocols in the clear since it makes debugging and tracing easier. But text based protocols are also more error prone and open up for much more parsing and parsing problems.

> If you really can't take a binary protocol, then you couldn't handle TLS and compression in HTTP 1.x either and its been there and used for a very long time.

First, you can have the best of both worlds, fixed-size frames and human readability, by making sure each HTTP keyword has a finite, short length. ASCII abbreviations are an acceptable means to this end. This would also eliminate a lot of the implementation difficulties and performance penalties of writing and using a parser.

Second, TLS and compression are not integrated into HTTP/1.1, meaning that people who want to be able to read an HTTP stream on the wire can do so by disabling these features. It's disingenuous to claim that people don't care about human readability just because these extensions exist.


Why not just have the wire sniffer decode the frames before presenting them? If you're interested in what's going on at the HTTP level, you aren't reading packetwise IP packet dumps, because it's hard to make sense of anything and everything is all mixed together; you're looking at abstracted, higher-level flows, where it's just taken for granted that you have a set of linear TCP streams.

HTTP2 is a(n SCTPish) transport-layer protocol squished in underneath an application-layer protocol. Use tools that abstract away the transport-layer protocol.

Or, just, y'know, disable HTTP2? It's an "optional feature" as much as TLS and compression are. Everything that speaks HTTP2 also speaks HTTP1.1, just like everything that speaks compressed/encrypted HTTP also speaks uncompressed/unencrypted HTTP.


It is never a good idea to assume that a can opener[1] will always be available. While your decoder, an extra tool, may be available in "many" cases, there will always be a sizable minority of cases where it isn't. And the performance benefits of fixed-width fields are orthogonal to whether the fields are ASCII or binary.

> SCTP / "transport-layer protocol"

So we move yet another step down the path of obsoleting TCP port numbers by adding another layer of indirection[2]. Re-implementing ports by tunneling everything over HTTP{,2} was a bad idea when it started over a decade ago, and it's still the wrong way to solve the problem.

[1] http://en.wikipedia.org/wiki/Assume_a_can_opener

[2] See RFC 1925, Section 2, rule 11a. ( https://tools.ietf.org/html/rfc1925 )


The real problem is that flow control needs to happen at the pair-of-machines level. If we had that, opening 1000 "stream" connections to a server would be exactly the same as opening a single 1000-channel SCTP connection.


> Why not just have the wire sniffer decode the frames before presenting them.

Certainly possible, but why make our lives harder by implementing HTTP/2 such that it requires a decoder to read in the first place? If the number of bytes sent remains the same, why not make the fields as self-documenting as possible?

> Everything that speaks HTTP2 also speaks HTTP1.1

No, they have fundamentally different wire formats. This statement isn't even true for HTTP/1.1 and HTTP/1.0, which both have the same wire formats and share many fields but have different interpretations for some of them.


>Certainly possible, but why make our lives harder

I cannot imagine anyone that has written a compliant HTTP parser, or attempted to make a fast HTTP implementation, thinking the new framing is harder.

As the article mentions, yeah, it would be nice to be able to look through raw captures. But overall, keeping the protocol textual is simply too big a downside. It wastes space and burns CPU for nearly zero benefit.

Text protocols make developers start treating them like text rather than protocols, so you end up with a nightmare of things that look OK to humans but introduce compatibility or security issues when parsed. Even getting line endings right is a pain.
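For a rough sense of how little parsing the binary framing needs (field layout per the h2 framing; the sample bytes below are made up):

    import struct

    # The fixed 9-byte h2 frame header: 24-bit payload length, 8-bit type,
    # 8-bit flags, 1 reserved bit + 31-bit stream identifier.
    def parse_frame_header(buf):
        hi, lo, ftype, flags, stream = struct.unpack("!BHBBI", buf[:9])
        return (hi << 16) | lo, ftype, flags, stream & 0x7FFFFFFF

    # A HEADERS frame (type 0x1) with END_HEADERS (0x4), 13-byte payload, stream 1:
    sample = bytes([0x00, 0x00, 0x0D, 0x01, 0x04, 0x00, 0x00, 0x00, 0x01])
    print(parse_frame_header(sample))  # (13, 1, 4, 1)
    # Compare with splitting, trimming and case-folding "Header-Name: value\r\n" lines.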


I don't mean that everything that speaks HTTP2 speaks HTTP1.1 definitionally; I mean that, literally, every web browser and web server currently coded to speak HTTP2 is also coded, by duplication of effort, to speak HTTP1.1, and nothing offers an "HTTP2-only" mode of operation so far (nor, I suspect, will anything any time soon).





