You can still deconstruct the message, and it is simpler and easier to debug, even if less exact.

In binary, if there is one flaw the whole block is bunk: an off-by-one error, a wrong offset, bad munging/encoding, and so on. As an example, if you have a game profile stored as binary, it can be ruined by a single corrupted byte on a bad save or transmission.
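
To make that concrete, here is a tiny Go sketch. Everything in it is hypothetical (a made-up [2-byte big-endian length][payload] record format and toy data), but it shows how corrupting one length byte loses every record after it, not just the damaged field:

    package main

    import (
        "bytes"
        "encoding/binary"
        "fmt"
        "io"
    )

    // readRecords parses [2-byte big-endian length][payload] records.
    func readRecords(r io.Reader) ([][]byte, error) {
        var recs [][]byte
        for {
            var n uint16
            if err := binary.Read(r, binary.BigEndian, &n); err == io.EOF {
                return recs, nil
            } else if err != nil {
                return recs, err
            }
            payload := make([]byte, n)
            if _, err := io.ReadFull(r, payload); err != nil {
                // One bad length field poisons everything after it.
                return recs, err
            }
            recs = append(recs, payload)
        }
    }

    func main() {
        data := []byte{0, 2, 'h', 'i', 0, 3, 'f', 'o', 'o'}
        recs, err := readRecords(bytes.NewReader(data))
        fmt.Println(len(recs), err) // 2 <nil>

        data[1] = 200 // corrupt a single length byte on "save"
        recs, err = readRecords(bytes.NewReader(data))
        fmt.Println(len(recs), err) // 0 unexpected EOF -- the whole block is bunk
    }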

Binary is all or nothing; maybe that is what is needed for better standards, but it is a major change.

What is easier: a JSON or YAML file, or a binary block you have to parse? What worked better, HTML5 or XHTML (exactness over interoperability)?

Granted, most of us won't be writing HTTP/2 servers all day, but it does expect more of implementations, for better or worse.

The classic rule in network interoperability is to be conservative in what you send (exactness) and liberal in what you accept (expect others not to be as exact).




The "classic rule", aka Postel's Law, has proven to be disastrous. The idea of rescuing a corrupted message is a totally flawed concept. At best, it introduces compatibility issues. This is essentially the history of HTML and browsers, each one needing to implement the same bugs as the other popular versions.

SIP is another IETF gem, which takes its syntax from HTTP. And guess what? It's impossible to parse unambiguously in the wild! Why? The whole "liberal in what you accept" bad idea. So A interprets a bare \n as a line ending, even though the spec says \r\n. B is liberal in another way, and assumes you didn't mean to transmit two newlines in a row, so it keeps reading headers. End result: you can abuse this liberal-ness to bypass policy restrictions, getting A to approve a message that B will interpret another way. Yikes. And since the software for both is so widely deployed, there is little hope of solving the problem. In fact, the IETF essentially requires you to implement AI, since you're supposed to guess at the "intent" of a message.
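
A hedged sketch of that hazard, in Go. The two parsers, the header names, and the data here are all hypothetical; the point is just that the same bytes yield different header lists depending on how liberally line endings are read:

    package main

    import (
        "fmt"
        "strings"
    )

    // parseA is liberal: it accepts a bare \n as a line ending.
    func parseA(raw string) []string {
        raw = strings.ReplaceAll(raw, "\r\n", "\n")
        return strings.Split(strings.TrimRight(raw, "\n"), "\n")
    }

    // parseB is liberal in a different way: it splits only on \r\n,
    // so a bare \n stays glued to the previous header line.
    func parseB(raw string) []string {
        return strings.Split(strings.TrimRight(raw, "\r\n"), "\r\n")
    }

    func main() {
        // "Route: evil" is terminated by a bare \n, not \r\n.
        raw := "From: alice\nRoute: evil\r\nTo: bob\r\n"
        a, b := parseA(raw), parseB(raw)
        fmt.Printf("A sees %d headers: %q\n", len(a), a) // 3 headers; Route is visible
        fmt.Printf("B sees %d headers: %q\n", len(b), b) // 2 headers; Route hides inside From
    }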

So you're sorta proving my point: people think "oh, it's just text" and then write shitty, sloppy code, and they're giddy because it sorta worked, even from a two-line shell script. And then later generations have to deal with this mess, because these folks just can't bear to get line endings right or whatnot.


Keep in mind you are still going to have many of the same problems you mention inside the binary frames and header blocks. The specific annoyances of HTTP/1.1 will be gone, but new ones will appear.

Going binary does not suddenly make things easier; it just slices them up and adds a layer of obfuscation.

It is easier to know what the hell is going on across the wire with the current formats, and to debug them. Utopian interop does not exist, so Postel's Law has gotten us this far. Being text no doubt makes it easier to debug and interoperate; otherwise we'd be sending binary blobs instead of JSON. Unless you control both endpoints, Postel's Law comes into play and simplicity wins.

We are moving in a new direction, for better or worse, and going live. I feel like it is slightly off the right path, but sometimes you need to take a wrong step, like SOAP did, to get back to simple. We'll see how it goes.


A binary protocol's parsing is usually something like: read 2 bytes from the wire, decode N = uint16(b[0])<<8 | uint16(b[1]), then read N bytes from the wire. A text-based protocol's parsing almost always involves a streaming parser, which is tricky to get correct and always less efficient.
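
As a rough sketch of that read loop in Go, assuming a hypothetical [2-byte big-endian length][payload] framing (not any real HTTP/2 frame layout):

    package main

    import (
        "encoding/binary"
        "fmt"
        "io"
        "strings"
    )

    // readFrame reads one [2-byte big-endian length][payload] frame.
    // No delimiter scanning, no guessing where the frame ends.
    func readFrame(r io.Reader) ([]byte, error) {
        var hdr [2]byte
        if _, err := io.ReadFull(r, hdr[:]); err != nil {
            return nil, err
        }
        n := binary.BigEndian.Uint16(hdr[:]) // N = uint16(hdr[0])<<8 | uint16(hdr[1])
        payload := make([]byte, n)
        if _, err := io.ReadFull(r, payload); err != nil {
            return nil, err
        }
        return payload, nil
    }

    func main() {
        p, err := readFrame(strings.NewReader("\x00\x05hello"))
        fmt.Printf("%s %v\n", p, err) // hello <nil>
    }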

Besides, I think this is a moot point, because chances are that fewer than 100 people's HTTP/2 implementations will serve 99.9999% of traffic. It's not like you or I spend much of our time deep in nginx's code debugging some HTTP parsing; I think it's just as unlikely we'll be doing that for HTTP/2 parsing.

Also, HTTP/2 will (pretty much) always be wrapped in TLS, so it's not like you're going to be looking at a plain-text dump of it. You'll be using a tool, and that tool's author will implement a way to convert the binary framing to human-readable text.

Another way to put it is that the vast majority of HTTP streams are never examined by humans, only by computers. Choosing a text-based protocol just seems like a way to adversely impact the performance of every single user's web browsing.

Yet another way to put it is that there is a reason Thrift, Protocol Buffers, and other common RPC mechanisms do not use a text-based protocol. Nor do IP, TCP, or UDP, for that matter. And there's a reason Memcached was updated with a binary protocol even though it had a perfectly serviceable text-based one.


Agreed on all points. Binary protocols are no doubt better: faster, more efficient, and more precise. I use reliable UDP all the time in game servers/clients. Multiplayer games have to be efficient; even TCP is too slow for real-time gaming.

Binary protocols work wonderfully... when you control both endpoints, the client and the server.

When you don't control both endpoints is where interoperability breaks down. Efficiency and exactness can be enemies of interoperability at times; we currently use very forgiving systems rather than ones that assert and crash-dump on a communication error. Network data is a form of communication.

Maybe you are right: since it is binary, only a few hundred implementations might be made, and those will be written by better engineers since it is more complex. Maybe HTTP is really a lower-level protocol like TCP/UDP now. Maybe, since Google controls Chrome, holds the browser lead, and has enough engineers to ensure all the leading implementations, OSes, server libraries, and webservers are correct, it may work out.

As engineers we want things to be exact, but there are always bugs not found in testing and hidden new problems that we aren't weighing against the known current ones. Getting something new is nice because all the old problems are gone, but there will be new problems!

It will be an all-new experiment, moving away from text/MIME-based formats to something lower-level, more complex, and exact, rather than simple and interoperability-focused. Let's see if the customers find any bugs in the release.


>Binary protocols work wonderfully... when you control both endpoints, the client and the server.

IP is all binary and I don't think it's a case of one party controlling all endpoints.


Binary protocols are usually far easier to implement, for both sending and receiving. There is far less ambiguity.

In fact, that newline problem I mentioned? It was not easier to diagnose as text, and was only caught by tools that checked it as a binary structure.

Postel was just flat wrong, and history shows us this is so. JSON is popular because it was automatically available in JavaScript, and people dislike the bit of extra verbosity XML introduces. JSON is also a much tighter format than the text parsing the IETF usually specifies.

Postel's law also goes against the idea of failing fast. Instead, you end up thinking you're compliant, because implementations just happen to interpret your mistake in the right way. Then one day something changes and bam, it all comes crashing down. Ask any experienced web developer about the weird edge cases they have to deal with, again thanks to Postel's law.

And anyway, you know what everyone uses when debugging the wire? Wireshark or something similar. Problem solved. Same for things like JSON: over the past few months I've been dealing with that a lot, and every time I have an issue, I pull out a tool to figure it out.

Do you know the real reason for the text looseness? It's a holdover from the '60s. The Header: Value format was just a slight codification of what people were already doing. And why? Because they wanted a format that was primarily written and read by humans, with a bit of structure thrown in. Loose syntax is great in such cases. Modern protocols are not written by humans and are rarely read by them. So it's just a waste of electricity and developer time.



