I don't buy the congestion-control argument. Transit routers have millions of connections going through them at any given point. What's an extra 10x in concurrency to them?
It's a fair question, but the h2 approach is doing the right thing on balance. The issue isn't really computational or memory requirements.
The startup phase of a TCP stream is essentially not governed by congestion control feedback, because there hasn't been enough (or maybe any) feedback yet. It is initially controlled by a constant (the initial window, IW) and then slowly feels its way before dynamically finding the right rate. IW generally ranges from 2 to 10 segments. Whether this is too much or too little for any individual circumstance is somewhat immaterial - it's generally going to be wrong, just because that's the essence of the probing phase - you start with a guess and go from there.
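As a toy illustration of why that guess matters, here's a sketch (my own, not a real TCP implementation) of classic slow start, with the window starting at IW and doubling each RTT, assuming no loss:

```python
def slow_start_rtts(total_segments, iw=10):
    """RTTs needed to deliver total_segments, assuming the congestion
    window starts at iw and doubles each RTT (classic slow start,
    no loss). A deliberately simplified model."""
    cwnd, sent, rtts = iw, 0, 0
    while sent < total_segments:
        sent += min(cwnd, total_segments - sent)
        cwnd *= 2
        rtts += 1
    return rtts

# One 300-segment transfer vs a 6-segment one (a small resource):
print(slow_start_rtts(300))  # -> 5
print(slow_start_rtts(6))    # -> 1
```

The tiny stream spends its entire life inside the initial guess; the big one amortizes the guess over a long, feedback-driven middle.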
Each stream has a beginning, a middle, and an end. One large stream has 1 beginning, 1 (large) middle, and 1 end, but N small streams have N beginnings, N (smaller) middles, and N ends. The amount of data carried in the beginning is not a function of the stream size (other than the stream size being its ceiling); it is governed by the latency and bandwidth of the network. So more streams means more data gets carried in the startup phase. If the beginning is known to be a poor-performing stage (and for TCP it is), then creating more of them and having them cover more of the data is a bad strategy.
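A rough way to see "more streams means more data in the startup phase" is to count segments sent before the window first reaches some steady rate. The threshold of 64 segments here is an invented number for illustration, again assuming slow-start doubling and no loss:

```python
def startup_segments(total_segments, iw=10, steady=64):
    """Segments sent before cwnd first reaches `steady` - i.e. data
    carried during the poorly-performing beginning of the stream."""
    cwnd, sent = iw, 0
    while cwnd < steady and sent < total_segments:
        sent += min(cwnd, total_segments - sent)
        cwnd *= 2
    return sent

one_big = startup_segments(1000)         # 70 of 1000 segments (7%) in startup
fifty_small = 50 * startup_segments(20)  # 1000 of 1000 segments (100%) in startup
print(one_big, fifty_small)
```

Same total data either way, but split 50 ways every single byte travels during the beginning phase.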
In practice, IW is too small for the median stream - but there is a wide distribution of "right sizes", so it's ridiculously hard to get right. Maybe IW=10 and the right size is 30 segments; that's one stream 3x too small - it isn't 20x or 50x too small. So when you open 50 parallel TCP connections you are effectively sending at IW * 50, and that does indeed cause congestion and packet loss. And it's not the kind of "I dropped 1 packet from a run of 25, please use fast-retransmit or SACK to fix it for me" packet loss we like to see; it's more of the "that was a train wreck, I need slow timers on the order of hundreds of milliseconds to try again for me" packet loss that brings tears to my eyes. One of the reasons for this goes back to the N-beginnings problem - if you lose a SYN or your last data packet, the recovery process is inherently much slower, and N streams have N times more SYNs and "last" packets than 1 stream does. Oh, and 50 isn't an exaggeration. HTTP routinely wants to generate a burst of 100 simultaneous requests these days (which is why header compression is critical when multiplexing - but that's another post).
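Back-of-envelope arithmetic for that burst (the 1460-byte segment size is my assumption of a typical Ethernet-path MSS, not from the discussion above):

```python
IW = 10        # initial window, segments per connection
MSS = 1460     # bytes per segment, typical for an Ethernet path
CONNS = 50     # parallel TCP connections

burst = IW * MSS * CONNS
print(burst)   # -> 730000 bytes, ~730 KB injected with zero feedback
```

That's roughly 730 KB fired blind into the path before a single ACK has told any of those connections what the network can actually absorb.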
So the busier, larger flow both induces less loss and is more responsive when it does experience loss. That's a win.
And after all that you still have the priority problem. 50 uncoordinated streams, all entering the network at slightly different times with slightly different amounts of data, will be extremely chaotic with respect to which data gets transferred first. And "first" matters a lot to the web - things like js/css/fonts all block you from using the page, while images might not. Even within those images some are more important than others (some might not even turn out to be on the screen at first - oh wait, you just scrolled, now I need to change those priorities). Providing a coordination mechanism for all of that is one of the real pieces of untapped potential hiding in h2's approach of muxing everything together.
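As a sketch of what that coordination buys, here's a hypothetical frame scheduler for one multiplexed connection. The stream names, priority numbers, and round-robin policy are my own invention - h2's actual priority scheme (stream dependencies and weights) is richer - but the point is the same: a single sender gets to decide what goes on the wire first:

```python
from itertools import zip_longest

def schedule(streams):
    """streams: list of (priority, name, frames). Lower number = more
    urgent. Emit all frames of more-urgent streams first, round-robin
    among streams that share a priority level."""
    levels = {}
    for prio, name, frames in streams:
        levels.setdefault(prio, []).append([(name, f) for f in frames])
    order = []
    for prio in sorted(levels):
        for batch in zip_longest(*levels[prio]):
            order.extend(x for x in batch if x is not None)
    return order

# Blocking resources (css/js) outrank images on the one connection:
order = schedule([
    (0, "app.js",    ["j1", "j2"]),
    (0, "style.css", ["c1"]),
    (1, "hero.png",  ["p1", "p2"]),
])
print([name for name, _ in order])
# -> ['app.js', 'style.css', 'app.js', 'hero.png', 'hero.png']
```

With 50 independent TCP connections there is no equivalent place to make that decision - the network decides for you, chaotically.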
There is a downside. If you have a single non-induced loss (i.e. one due to some other data source) and it impacts the early part of the 1 single TCP connection, then it impacts all the other virtual streams because of TCP's in-order delivery properties. If they were split into N TCP connections, only one of them would be impacted. This is a much-discussed property in the networking community, and I've seen it in the wild - but nobody has demonstrated that it is a significant operational problem.
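A toy model of that head-of-line blocking (my own illustration): tag each segment with the virtual stream it carries, and TCP's in-order delivery means nothing after the first hole reaches any stream until the retransmit fills it:

```python
def deliverable(segments, lost):
    """segments: in-order list of (seq, stream); lost: set of lost seqs.
    Returns the set of streams that receive any data before the first
    hole - everything after it stalls in the reassembly buffer."""
    delivered = set()
    for seq, stream in segments:
        if seq in lost:
            break  # hole: in-order delivery stops here for everyone
        delivered.add(stream)
    return delivered

# One connection carrying virtual streams A..E; segment 1 is lost:
mux = [(0, "A"), (1, "B"), (2, "C"), (3, "D"), (4, "E")]
print(deliverable(mux, {1}))    # -> {'A'}: B..E all blocked behind the hole
print(deliverable(mux, set()))  # no loss: all five streams progress
```

Split across 5 separate connections, that same loss would have stalled only stream B.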
The h2 arrangement is the right thing to do within a straightforward TLS/HTTP1-COMPATIBLE-SEMANTIC/TCP stack. Making further improvements will involve breaking out of that traditional TCP box a bit (quic is an example; minion is also related, even mosh is related), and that is appropriately separated as next-stage work that wasn't part of h2. It's considerably more experimental.