HTTP 2.0

Aloisius · on July 9, 2013

Oh wonderful, it appears to use SPDY-like header compression. Why don't we just create a new compression algorithm and predefine common HTML tags and words to improve compression of the response body too! But we shouldn't base it on any known compression scheme and only use it for the case where the content length is less than 220,498 bytes but more than 8,494 bytes to optimize the behavior of today's more common MTU settings for PPPoE in Scandinavian countries minus of course the most common size of compressed headers today. It will be particularly optimized for the kind of responses Google sends back to minimize their own server load and common adwords will of course be included in the predefined list of compressed tokens.

/sarcasm

What happened to simple protocols? Seriously, per-hop flow control (which works best with which of the dozen version of TCPv4 or TCPv6 flow control?)? TCPv4-like framing with weird limits (16383 bytes)? Keepalives/Ping? Truly ridiculous specialized compression for headers which ignores the role of HTTP proxies? QoS?

Why not just implement the whole protocol over a raw IP connection and stop pretending like we're operating in layer 4+? I get that multiplexing is difficult without flow control, but good lord does this thing look overdesigned for what few benefits it offers over HTTP/1.1.

angersock · on July 9, 2013

Well, we had a nice thing for a while, and now it's time to fuck that all up. The wheel turns.

Look on the bright side--if everyone decides the public Internet is too insecure, maybe we can convince them to keep using the old standards on darknets?

tptacek · on July 9, 2013

Recapitulating a comment from downthread, but, look at DNS for an example of how the IETF botches compression in its "simple" protocols. Not that compression isn't fraught (look at TLS), but I see its use as a sign of maturity.

osth · on July 10, 2013

"What happened to simple protocols?"

Answer: The Internet is still running on them, 30 years later.

Whenever I read something like "simplicity is hard", it makes me cringe. I hear that a lot, and I see evidence of gratuitous complexity everywhere I look these days. I'd hazard a guess the engineers behind SDPY would find simplicity (and reliability) boring.

Debugging binary protocols is either great job security for over eager engineers like the SPDY team or a great waste of our collective time. I'll let you all decide which.

nly · on July 9, 2013

I can't help thinking some of the pain could be resolved if we had a reliable datagram protocol between UDP and TCP. Delimiting a TCP stream to create a messaging protocol is already suboptimal and error prone, and it's the root cause of the head-of-line blocking problem experienced by HTTP ('fixed' in SPDY).

ldng · on July 9, 2013

You mean like SCTP ? It's struggling to get traction.

nly · on July 9, 2013

Yes, because MS haven't implemented it.

lttlrck · on July 10, 2013

It wouldn't get through any home firewalls either. Adoption would take decades...

jahaja · on July 9, 2013

Something like QUIC? I reckon a few of the authors of this draft is also involved in QUIC.

http://blog.chromium.org/2013/06/experimenting-with-quic.htm...

Camillo · on July 9, 2013

There can be no new layer 4 protocols due to NAT. You have to use either TCP or UDP.

nly · on July 10, 2013

So we can move from IPv4 to IPv6 but not introduce a new transport because of NAT? Hmm...

wmf · on July 10, 2013

The IPv6 transition is not going that much better than SCTP. So many people have v4-only home routers that they have no plans to ever upgrade.

mavhc · on July 10, 2013

Most people have a free router from their ISP, the ISP will just send out new ones

nly · on July 10, 2013

My ISPs modem has a built in NAT that can't be disabled, but it still passes all IPv4 packets through in DMZ mode, so there's hope.

neuroscr · on July 10, 2013

So instead of doing things properly, we'll hack around it... Isn't that what we're all complaining about?

Let's do it right and then fix NAT

angersock · on July 10, 2013

Erm, reliable UDP?

You can get a startling amount of the way there with sequence numbers and a few other things--it's a fun exercise.

nly · on July 10, 2013

SCTP is more modern and still under active development. RUDP only has a draft RFC and hasn't been updated since 1999.

angersock · on July 10, 2013

Oh, no no no, you misunderstand--spend an afternoon to implement a reliable communication layer over UDP yourself. I don't suggest another standard (or a defunct one): roll your own.

samwillis · on July 9, 2013

I understand the motivation to going to a binary protocol for a SPDY inspired HTTP 2.0 but I would have liked to see an ascii based protocol similar to what jgrahamc proposed last year [0]. I thought it was a much cleaner protocol to read and understand, and much more in tune with what the web is supposed to be. Why not keep the clever binary stuff separate in SPDY, endorse it though the IETF and keep HTTP ascii?

[0] http://blog.jgc.org/2012/12/speeding-up-http-with-minimal-pr...

steveklabnik · on July 9, 2013

There were so many people just calling for rubber-stamping SPDY as HTTP 2.0 without any changes that frankly, I feel lucky that we're getting revisions at all. The editor of the draft is a good guy, and I trust him to make good changes.

I think it's pretty ludicrous that so many people are willing to let an advertising company basically re-write the way that every computer on the web talks to each other with very little questioning, but this is the place that we're at nowadays. For better or worse, the big vendors guide the standards process. I'd like to see more involvement from the little guys, but that has its own set of challenges.

tptacek · on July 9, 2013

What does Google's involvement in advertising have to do with the design of the SPDY protocol? Can you make a substantive criticism of SPDY based on Google's advertising incentives, or is this just innuendo?

steveklabnik · on July 9, 2013

Soley on their advertising incentives: no. It's a rhetorical flourish.

That said, I _do_ think it's extra important to pay attention to what Google does, for two reasons:

1. They're the largest entity on the Internet. This means their incentives are different than smaller players. 2. They _are_ an advertising company. Advertising companies make money by showing ads. They make more money by showing targeted ads. You target ads by collecting data on people.

I think people often forget Google's purpose in the world, and are simply dazzled by 'whoah cool stuff.' I appreciate some of Google's more interesting and ambitious initiatives, but get very scared when people start accepting any entity's actions without question. Specifically when that entity has large financial incentive to collect data about people.

There are, of course, many technical criticisms of SPDY, but none that rely specifically on the advertising angle.

cromwellian · on July 9, 2013

Personally, I find this line of argumentation nothing more than pure ad hominem. This is an open specification, open to review by everyone. If there were technological changes made to somehow support better data collection, people would be able to see that.

The IETF and the W3C has always had large companies involved in specs, often with their own agendas, I see no reason to attack Google in this way.

steveklabnik · on July 9, 2013

An ad hominem would mean that I am saying they're wrong. My argument is not

    Google is an advertising company, therefore SPDY is bad.

My argument is

    Google is an advertising company, and the largest
    single entity on the Internet, and therefore, their
    actions deserve a healthy dose of skepticism. I'm 
    not sure that we've been giving them enough skepticism.

SPDY does have good points, and bad points. I just saw a lot of chatter from people who want to ignore the bad points simply because of where the spec came from.

tyre · on July 9, 2013

Google also handles more traffic then almost any company on the planet, so their involvement in a discussion about transporting bits is, well, not really that shocking.

The issue is that you're throwing around them being an advertising company as a negative without any discernible proof that it has negatively affected the outcome.

What, specifically (and please spare us the 'rhetorical flourishes') has been proposed that is unfairly biased towards advertising? Which parts should we be skeptical about?

diydsp · on July 9, 2013

It's not advertising that I'm skeptical of w/r/t Google. It's their amount of capital.

E.g., relating to the ASCII/binary discussion above:

A binary web would require advanced retooling and therefore investment. Smaller business entities are not in such a strong position to deal with such a large shift in their workflow. Therefore, switching to a binary protocol would disadvantage entities smaller than google.

steveklabnik · on July 9, 2013

Sure, but bigger and smaller scale operations have different needs. The internet isn't supposed to be about what's best for the big guys, it's supposed to be about what's best for humanity.

I've mentioned specifics elsewhere in this thread, for example, https://news.ycombinator.com/item?id=6013468 and https://news.ycombinator.com/item?id=6012906

stavros · on July 9, 2013

The "specifics" in the other threads are more ad hominem. You're saying "we should be wary of what Google does" without actually mentioning what's there in the spec to be wary about. You're saying "we shouldn't trust Google to pass specs unchecked", people are saying "but we aren't: we've read the spec, and it's good", and you're saying "yeah, but we shouldn't trust Google".

steveklabnik · on July 9, 2013

> without actually mentioning what's there in the spec to be wary about.

https://news.ycombinator.com/item?id=6013088

And, as I mention elsewhere, it's not just the spec, it's the general trend of Google dominating the web. https://news.ycombinator.com/item?id=6013468

stavros · on July 9, 2013

Sure, the general trend is a good point, just not entirely relevant to this specific thread. Your comment above does make good points about the actual spec, I agree.

brown9-2 · on July 9, 2013

But why is an advertising company deserve more skepticism than a company that sells operating systems or database servers?

steveklabnik · on July 9, 2013

As I mentioned elsewhere in the thread, an advertising company has incentives to collect and process as much data about individuals as possible. This is Google's core competency.

Oh, and in Google's case specifically, they now control the largest web site, one of the largest web browsers, (with this) the protocol it talks to, and they're attempting to supersede JavaScript too... and one of the largest client-side frameworks. The list goes on and on. The largest mobile phone OS. Email provider. (?) Working on social network...

smutticus · on July 9, 2013

Unless you can point to a specific point in this IETF draft which furthers Google's interests while diminishing others' I'm going to ignore you.

steveklabnik · on July 9, 2013

There is more to the world than a technical draft, you can't just abstract away the rest of everything else.

For technical critiques, I commented earlier: https://news.ycombinator.com/item?id=6013088

chc · on July 9, 2013

> There is more to the world than a technical draft, you can't just abstract away the rest of everything else.

Then maybe you should actually bring these things up. You seem to be playing coy and throwing around innuendo. If you have real, substantive concerns about something, I think the conversation would be greatly improved by actually bringing those up rather than just casting aspersions about Google.

steveklabnik · on July 9, 2013

I have brought them up, this isn't my only comment in this thread.

gnosis · on July 9, 2013

There are a large number of Google employees and stock holders on HN, and it would not be at all surprising if they leapt to Google's defense for non-technical reasons.

It would be enlightening to see the defenders of Google disclosing whether they have any financial or other interest in Google.

To complement that disclosure, any attackers of Google should likewise disclose whether they have any financial or other interests in Google's competitors.

cromwellian · on July 9, 2013

It would help if people just criticize on the spec based on the technical merits.

steveklabnik · on July 9, 2013

I agree that technical critiques are good, but technology is social. You can't abstract away the rest of the real world. We need both kinds.

cromwellian · on July 9, 2013

Sure, but I don't find the social critique very useful either.

There are things that Google does, as a corporation, that you can find fault with, technological decisions that may or may not have been influenced by business model. For those, fire away. But the attack on SPDY/HTTP/2.0 because "Google is an advertising company" (which if you actually worked at Google, and knew how people made decisions here, you'd know is ridiculous from an intent or motivation point of view) is just pure mudslinging.

Examples of stuff that I, as a Google employee, would criticize Google for: Real Names, building "siloed" services and moving away federated/decentralized approaches (see my essay here: http://timepedia.blogspot.com/2008/05/decentralizing-web.htm...), most of what Yegge said about APIs, Google Hangouts going "silo" and away from XMPP model, etc.

People who work on ads and take their marching orders from ads are a small portion of employees at Google. The guys working on Chromium/Blink/SPDY do not report to ads, do not take orders from ads, and in general, work on technology without reference to monetization strategy. Their day to day job is to improve technology, with the hope that if you raise the tide, all boats will be lifted, and they'll be some ROI from that.

But that the idea that engineers are taking marching orders from shareholders to maximize profits based on ads by tweaking web standards is hilariously wrong for people working on Chrome.

steveklabnik · on July 9, 2013

I'm not talking about "Larry And Sergey have decreed that Evil Shall Happen!" I'm talking about broad economic incentives. Since I don't work for Google, I have to treat them as a black box; I see what goes in, I see what comes out. I know nothing of the internals, I only have one friend who actually works there. If I implied there was some kind of conspiracy, that is my fault. You're right that that would be ridiculous.

I would also criticize Google for your reasons, and they may be even more important. But this isn't a thread about those things.

tptacek · on July 9, 2013

A reasonable argument would say that we don't need the social and political stuff in standards discussions, which should be based instead on engineering.

steveklabnik · on July 9, 2013

Absolutely. This is why I wouldn't make these comments on the IETF mailing list. I do think that HN is an appropriate venue, this is very much a social place.

"Should we be doing this?" and "How should we do this?" are two very different questions.

tptacek · on July 9, 2013

That's a fair point for why your argument is germane to HN, but for what it's worth I still don't agree with the argument.

tptacek · on July 9, 2013

https://news.ycombinator.com/item?id=5992358

vanderZwan · on July 9, 2013

While I agree with keeping a healthy skepticism of Google, I have a bit of difficulty with this wording:

> I think people often forget Google's purpose in the world

The fact that Google makes money through showing ads doesn't mean it's their purpose.

tptacek · on July 9, 2013

I think it's fair to argue that a company's purpose is best illustrated by its revenue streams. Reasonable people can disagree depending on the circumstances.

tyre · on July 9, 2013

Not necessarily the case.

Github's revenue stream is through private repositories (both hosted on github.com and self-hosted enterprise), but I don't think you could reasonably assert that Github's purpose is to make a profit off of keeping code private. Their actions, in fact, suggest precisely the opposite.

alexsb92 · on July 9, 2013

In some cases, a company could transcend its initial purpose, but still keep it around as a/the revenue stream as a means to the new end. Not many/any new and further out there Google initiatives have made it to wide scale public adoption, so it's yet unclear whether Google would be such a company, but it could very well turn out to be one.

sageikosa · on July 9, 2013

If it's a publicly traded company, it has a fiduciary responsibility to make money for its investors; so I'd have to agree with you. It's purpose is to make money. It might spend money to buy goodwill to earn loyalty, but at the end of the business day, its a business.

dspeyer · on July 9, 2013

Google's corporate charter was specifically written to avoid that. And shareholders have no meaningful voting rights, so they can't override it there either.

pbhjpbhj · on July 9, 2013

>If it's a publicly traded company, it has a fiduciary responsibility to make money for its investors //

Where is this codified?

sageikosa · on July 10, 2013

It doesn't have to be "codified" to be fiduciary. The trust relationship between any investor and the investment enterprise is that the enterprise will be able to generate a return on the investment. If it doesn't assume this, it generally will be deemed a non-profit.

pbhjpbhj · on July 11, 2013

If it's not codified then it's more likely an expectation than a responsibility. Of course investors expect a return, that's what the term "investor" entails.

Non-pecuniary returns can satisfy the responsibilities of an enterprise.

It appeared that a legal obligation was being suggested. What sort of obligation was being suggested and how is that obligation derived and enforced?

regularfry · on July 9, 2013

To play devil's advocate, what is Apple's "purpose"? How often has it changed?

epo · on July 9, 2013

Their shareholders would beg to differ.

tptacek · on July 9, 2013

Could you please share some of the technical criticisms of SPDY that you find most compelling?

fusiongyro · on July 9, 2013

I can't speak for Steve, but I thought Poul-Henning Kamp's critique was pretty compelling.

http://lists.w3.org/Archives/Public/ietf-http-wg/2012JulSep/...

In brief, he makes three points. The first is that SPDY/HTTP 2.0 doesn't do anything about the widely lamented lack of session handling. The second is that it doesn't contain any simplifications of HTTP, despite there being several examples of things that could be simplified (header parsing, for instance, is hairier than it could be). The third is that it is going to pose problems for proxies.

I don't know how many of these points continue to apply with this HTTP 2.0 draft, nor do I have any skin in this game, but I respect PHK quite a bit so his outrage creates in me a sense of mild reservation. :)

tptacek · on July 9, 2013

I too have unreserved respect for PHK as an implementor. I'm not sure I find his critique compelling. It seems to me that it distills to a couple simple points:

* SPDY depends on Deflate compression, and will require middleboxes to implement deflate to route requests. I think the "IETF school of design" has an irrational fear of good compression and I think it's harmed other protocols, most notably DNS. I may be poisoned into this viewpoint by Bernstein.

* There are protocol constants that PHK doesn't know the background of, which strikes me as the kind of documentation bug that something like an HTTP 2.0 would address.

* SPDY might have required another WKP, which isn't really a SPDY problem.

* There's DoS potential in SPDY --- but of course, there's DoS potential in HTTP too; look at chunked encoding, for instance. For that matter, modern HTTP 1.1 also accommodates compression; when it comes to attack surface, in for a penny, in for a pound.

* A similar argument addresses PHK's concerns about the (theoretic) security of the push model, which is also something that modern HTTP accommodates.

Later:

Oh oh also: PHK sees HTTP 2.0 as an opportunity to correct the session management problem, which has led to the "bass ackwards" design of heavyweight signed cookies in web applications. I sympathize with him on this point, but it's not HTTP's fault that this happens. HTTP 1.1 cookies also used to be simple opaque session IDs; heavyweight signed cookies are a consequence of server app architecture, not the underlying protocol.

Even if HTTP 2.0 had built-in robust session management, Rails apps would still be shoving several kbytes of encrypted state out to web browsers.

X-Istence · on July 9, 2013

I hadn't read about DNS compression from DJB, but having looked at implementing it, I wish I had never started reading the specs for it.

Such a complete and utter mess of a scheme!

JoshTriplett · on July 9, 2013

The first two criticisms of SPDY sound like "doesn't solve every known problem with HTTP at once", which was never a design goal; that doesn't make SPDY bad, it just means that further room for improvement still exists.

The third criticism, that SPDY makes life more difficult for routers, makes me wonder: would this get easier if SPDY just said "forget the Host header, SPDY requires SNI"? Seems like that would help.

fusiongyro · on July 9, 2013

My main objection is that the name you call something does matter. SPDY is a very different protocol from HTTP, which addresses a very particular set of concerns. It diverges quite a bit from the "intent" of HTTP. This is all fine and good until you change the name from SPDY to HTTP 2.0. One expects 2.0 of something to continue the same philosophy and motivation that produced 1.0. When that doesn't happen (R6RS is another good example) you can expect some pushback. In this particular case, the "label swap" nature of the process is generating animosity from those who feel that the process has been co-opted by people trying to pull a fast one. I don't think SPDY is intrinsically wrong, I just don't think it looks like a natural successor to HTTP. I wouldn't expect HTTP 2.0 to address every known problem with HTTP at once, but I don't think it's unreasonable to expect at least a few aesthetic improvements.

chc · on July 9, 2013

I don't see how this follows from your earlier objections. "It doesn't add session handling and it doesn't simplify header parsing, therefore it diverges from the intent of HTTP" seems like a non sequitur.

fusiongyro · on July 9, 2013

Don't confuse my objections with PHK's objections. There may be good technical answers to his objections; Thomas replied to them above quite cogently, but in any event, PHK's opinion carries a lot more weight than mine. I'm just a spectator.

My objection (observation, really) is that one expects protocol 2.0 to do more than address performance optimization. Simplifying the protocol is a good thing to do with a major revision; they didn't do that. Making the protocol more friendly for upper layer users is another good thing to do with a major revision; they didn't do that either. Instead they took an obviously different protocol designed to address a handful of extremely technical performance matters and rubber-stamped it as HTTP 2.0. Whether you like SPDY or not, it should be clear that this kind of "process" is going to leave people feeling disenfranchised. The spirit of HTTP, inasmuch as such a thing exists, is one of simplicity. SPDY just doesn't "smell" like the successor.

I think the comparison to R6RS is very appropriate to my point. R6RS was designed to address well-known shortcomings of Scheme. The process it took to get approved circumvented a lot of the community. A large segment of the community responded to this by essentially whining about it and ignoring it. We already see the whining about HTTP 2.0. I predict it will be followed by ignoring it, and some years in the future, an HTTP 2.1 or 3.0 that more closely resembles HTTP 1.1.

steveklabnik · on July 9, 2013

My sibling has already pointed out one of the better critiques I've seen. There is also http://www.guypo.com/technical/not-as-spdy-as-you-thought/ , which I believe has been discussed on HN before, but I'm on a pomodoro break, so I'm trying to keep this short.

One critique that I don't remember if is contained in either of these two is header compression. Header compression seems to make sense, as compression is good. The problem is that intermediaries make routing decisions based on the headers, and so it's quite possible that the CPU time needed to decompress, possibly modify, and recompress the headers outweighs any gains that the compression brought in the first place.

I've also seen some vague commentary about 'mixing application concerns into the transport layer' which I find compelling, but I don't have enough experience with the low-level networking to properly judge on my own.

Break over! Gotta run.

lazyloop · on July 9, 2013

Worst of all is that compression is stateful, you need to capture the whole HTTP/2.0 session to be able to reconstruct any information with mandatory HTTP/2.0 debug tools.

zzzcpan · on July 9, 2013

> their incentives are different than smaller players

Yes. They are not representing smaller players, i.e. majority. And I think for smaller players speed is not as important as convenience. So this can even hurt smaller players in a long run.

samelawrence · on July 9, 2013

Google's self-declared, initial purpose was to organize, curate, and present all the world's information. We didn't know that meant people too.

chc · on July 9, 2013

You don't think Google receives enough skepticism? Every time they brew a pot of coffee, somebody out there declares that Google has violated their "Don't be evil" motto and is out to destroy us all with their dark caffeinated schemes. I can think of very few companies that are treated with more skepticism than Google.

opsmgmt · on July 9, 2013

* flow control issues, iirc QUIC is intended to help address this.

* TLS not mandatory for HTTP/2.0

tptacek · on July 9, 2013

Google is the industry's most active and effective corporate advocate for TLS. They're one of the key drivers for certificate pinning and one of the earliest mainstream deployers of forward secrecy. So I think that argument is a little bogus.

I don't understand the first point, though. Could you clarify?

pritambaral · on July 9, 2013

QUIC is a very new, experimental protocol that runs on UDP. Their (relevant) basis is that TCP's algorithms are completely controlled by the OSes and the routers and all. Using UDP, QUIC can quickly deploy new algorithms without requiring a major part of the world's infrastructure changed.

tptacek · on July 9, 2013

Sure, but SPDY is a TCP protocol, and so inherits TCP-friendliness from that; I'm just not seeing what that has to do with HTTP.

lazyloop · on July 9, 2013

Google is the industry's most active and effective corporate advocate for TLS simply because it makes tracking users and selling targeted advertising a whole lot easier. Their involvement in the whole PRISM affair has undoubtedly demonstrated that privacy is none of their concern.

tptacek · on July 9, 2013

I wish I could somehow CAPTCHA comments like these so I could tell if they were people or Markov generators.

jfb · on July 10, 2013

Years ago, in days of old, when magic filled the air, I wrote a Slashdot troll post generator. It eventually produced some pretty hilarious posts, but I never closed the loop by allowing it to post. It would make a fun project for learning a new language; perhaps I'll install Dart and give it a shot.

drivebuy · on July 10, 2013

With SPDY as implemented all requests for google analytics reuse the same TCP connection. This connection acts as an implicit tracking cookie uniquely identifying your browsing session.

aroch · on July 9, 2013

>I think it's pretty ludicrous that so many people are willing to let an advertising company basically re-write the way that every computer on the web talks to each other with very little questioning

I don't think there's very little questioning...in anycase, Google's goal is to deliver ads to you in the most efficient and fastest way possible. The more you browse the web the more Google makes; it's in their interest to develop SPDY/HTTP2.0. So what is wrong with them doing it? IEFT spec drafts are public and they're audited (as SPDY has been).

Also, most of the specs that we love and rely on today came from "big vendors", its nice and all to say you want the little guy to be a part (and they should be) but it takes quite a bit of man power to develop, draft and finally get ratified a spec.

_delirium · on July 9, 2013

Also, most of the specs that we love and rely on today came from "big vendors"

The core internet protocols we rely on, though, mostly didn't. If you look at the authors of RFCs specifying the widely used standards, nearly all of them were at research institutions: Steve Crocker was at UCLA then ARPA; Vint Cerf was at UCLA, then Stanford, then ARPA; Bob Kahn was at ARPA; Jon Postel was at UCLA then USC; Paul Mockapetris was at UC Irvine; Abhay Bhushan was at MIT; Tim Berners-Lee was at CERN then MIT.

Not sure if that's good or bad, but it seems to have been uncommon until recently for internet protocols to come from vendors.

edit: I did think of one important one, IPv6. Steve Deering was at Stanford, then Xerox PARC, then Cisco, and IPv6 came out during his Xerox/Cisco period. Bob Hinden was at Ipsilon Networks, then Nokia.

brokenparser · on July 10, 2013

it seems to have been uncommon until recently for internet protocols to come from vendors

Sorry, that's been going on for quite a while now. Cases in point: http://tools.ietf.org/html/rfc3768 http://tools.ietf.org/html/rfc5077 http://tools.ietf.org/html/rfc2637 http://tools.ietf.org/html/rfc2281 (these are just examples, there are many more citing Cisco, Microsoft, Nokia, Google, etc.)

cpr · on July 9, 2013

Don't forget Dave Clark, who at MIT, and is the chief architect of TCP itself.

steveklabnik · on July 9, 2013

> IEFT [sic] spec drafts are public and they're audited (as SPDY has been).

Absolutely. But there's more than one kind of control. I don't think enough programmers understand the effects of social control. If the standards are all public and audited, but only employees of Apple, Google, and Microsoft have the time and energy to keep up with discussions, well...

And, of course, I'm not imply that _only_ that is true, I just fear that big organizations are dominating the discussion. I have more free time than the vast majority of programmers, are subscribed to the HTTP 2.0 mailing list, and find it hard to keep up.

stormbrew · on July 10, 2013

> IEFT spec drafts are public and they're audited (as SPDY has been). > Also, most of the specs that we love and rely on today came from "big vendors", its nice and all to say you want the little guy to be a part (and they should be) but it takes quite a bit of man power to develop, draft and finally get ratified a spec.

It's especially hard when the call for proposals period of the draft is about 4 months and there happens to be a ready made proposal from a big player at the ready to be agreed on almost immediately. It's nice to say the little guy should be a part, but in this case the little guy mostly heard about it long after it happened.

9h1d9j809s · on July 9, 2013

> I think it's pretty ludicrous that so many people are willing to let an advertising company basically re-write the way that every computer on the web

First, to call Google an advertising company makes no sense. It's a tech company. You don't call a newspaper an advertising company either.

Second, there were people involved in designing this, not just an anonymous corporation. You can actually see their names in the proposals. It's a good design, that's why it has been adopted.

steveklabnik · on July 9, 2013

> First

How does Google make money again? Newspaper companies are advertising companies, especially given the quality of the news lately. ;)

> Second

Absolutely, and I don't mean to denigrate their technical efforts. I'm glad people want to move the web forward. I'm just recommending caution.

> It's a good design, that's why it has been adopted.

Many poor designs have garnered wide adoption in the past, this is not inherently true.

lazyloop · on July 9, 2013

Wide adoption of SPDY also seems like quite an overstatement, a few big players have been trying it, but the only number we have so far is 339 SSL certificates used with SPDY-enabled servers in may 2012. http://news.netcraft.com/archives/2012/05/02/may-2012-web-se...

md224 · on July 9, 2013

A Google recruiter actually told me that Google is "basically an advertising company." Then again, he was recruiting for an AdWords-related position, so his viewpoint may have been skewed.

youngtaff · on July 9, 2013

Oh come on don't let paranoia encourage you to throw the baby out with the bath water…

The way Mark Nottingham ran the original CFP for HTTP 2 and the eventual adoption of SPDY as a starting point was very fair - it's all there in the IETF archives for anyone to see. From memory there were only two other proposals (from Microsoft and someone else)

The reason Google were able to get a new protocol up and running is because they have both heavily used web properties and a browser. They're also willing to carry out experiments in public.

As it stands HTTP 2.0 will be good for the little guys too, based on the testing I've done little guys will see an improvement in performance without needing to do all the merging that destroys cache lifetimes.

3rd party-content is the fly in the ointment to the performance improvements so we'll need to be much more careful about the performance of the 3rd party sites we include.

N.B. Apart from using their products I have no affiliation with Google

smutticus · on July 9, 2013

> For better or worse, the big vendors guide the standards process.

It costs money to have people on staff who write IETF drafts and haggle them up to RFCs. Hopefully the standard isn't too degraded by the needs of Google in this instance and everyone benefits.

steveklabnik · on July 9, 2013

Absolutely. I'm not saying that it doesn't. I share your hope, but I'm not super optimistic in the general case.

0xbadcafebee · on July 9, 2013

This idea is bad for a couple reasons.

First, ASCII is inefficient. People don't interpret HTTP, computers do. Web servers and browsers. People only look at HTTP when they want to troubleshoot without any tools. With real tools, you can find out what's broken much quicker. And there's plenty of things you can miss without a real HTTP interpreter. Most hackers prefer to think of themselves as wizards that can spy 0's and 1's and tell you what the weather is. It doesn't make for a better protocol, though.

Second, we can already break HTTP responses up in multiple parts, using a novel idea called "multipart". It sucks and nobody has used it since HTML/JS found new ways of providing content. http://www.w3.org/Protocols/rfc1341/7_2_Multipart.html

Third, it's a hack. If you want to improve the protocol, improve the protocol, don't just hack onto it to make it do what you want. I could make a horse and buggy go 60mph, but would it be a good idea? How about just designing a better buggy that is intended to go 60mph?

Fourth, fixed-length records are the wave of the future! It solves crazy problems like header injection and request integrity checking. Moreover, it makes for simpler, more efficient parsing of requests.

Fifth, redundancies introduced from the beginning of time need to go away, like terminating every record with "\r\n", or passing the same headers on every single damn request when once should be just fine for a stream of requests. Little inefficiencies like this don't go away if you just hack onto the same old protocol forever.

Sixth, the flow control improvements can make different applications more efficient by both not having to hold state of where and when traffic is coming and improving flow across disparate network hops.

Seventh, as to a general question "why use binary when we can compress stuff", what is smaller: a string of bits, or a compressed string of digits? Add to this that every header could have a 32-bit identifier (4 bytes) and you've got more efficient compression than gzip. Of course, most people would argue that you don't save enough time or bandwidth or cpu cycles to make a difference, but it would make for easier-to-write parsers, which would make working with the protocol in general more attractive. But then you have your binary-detractor-wizard-hackers and the whole conversation becomes an infinite loop.

bajsejohannes · on July 9, 2013

> First, ASCII is inefficient. People don't interpret HTTP, computers do.

I can't tell you how many times I've manually read HTTP. To be sure, it's insignificant compared to how many HTTP headers have passed through my computer unseen by me.

ASCII may be inefficient, but computers are really fast, people are not. I don't have any measurements, but in making a browser, HTTP header parsing/writing was never near a performance issue. Bandwidth wise it also tiny compared to that image file you'd inevitably download for every page visit.

And sometimes you don't have tools. Sometimes you don't want tools. Sometimes you want to use tools that work with text to analyse your problem.

(You other arguments might still stand, though :) )

FlyingAvatar · on July 9, 2013

> HTTP header parsing/writing was never near a performance issue.

Since header lengths are not limited, and a single TCP packet's payload is quite limited, long headers can cause very measurable latency difference. Additionally, while I agree the generation / parsing overhead is probably quite small, saving it for every HTTP request is still a boon.

I'm also curious where you are reading raw HTTP from?

For me it's primarily in two situations, reading a packet capture from WireShark, or in the browser's debugger. In both cases, the tool will end up translating the request for me.

alayne · on July 9, 2013

I guess a lot of people here never had to work with the X protocols like X.400 and X.500. When you need specialized tools for every protocol and encoding format, development is a real drag.

Just because I use Charles or Wireshark doesn't mean that I only want to use those specialized tools. I have definitely been in situations where I'm doing something like running nc as a proxy and looking at raw HTTP. I wouldn't choose to throw that away and revert to the bad old days without a big win.

0xbadcafebee · on July 9, 2013

The nc example is the last bastion of this argument. But what about SSL? For a long time you just couldn't test it, or maybe use OpenSSL's server feature piped to nc.

But now nc supports SSL natively, making it super easy. Just as it will support binary HTTP natively, making it super easy. And everyone will finally stop caring about ASCII.

BHSPitMonkey · on July 9, 2013

Should your willingness or unwillingness to use a tool for these (rare) scenarios influence the design of the protocol in any substantial way?

bajsejohannes · on July 9, 2013

> saving it for every HTTP request is still a boon

I'm questioning the measurability of this, though. Smells like premature optimization. You could be right, but I'd like to at least measure it before we go about changing one of the fundamental protocols on the internet.

Last time I read raw HTTP was when writing a script to automate some stuff on a web page. I specifically did not want the browser's headers and behaviors. I had a bug which only happened from my script, and raw HTTP helped me track it down. I could have used wireshark, but I am much faster in vim for a simple task like that.

0xbadcafebee · on July 9, 2013

HTTP has existed for over 20 years. We've had some time to look at it. It has been measured.

As a comparison to your scripting story, you would use Wget or Curl or LWP::UserAgent or a thousand other things to automate HTTP requests. One function call to do what you did manually. To find bugs you would use an HTTP fuzzer like Skipfish to automate the process. If you think somehow your manual process was faster, I say to you, teach a man to fish...

(I automate things in web pages for a living, and I only use tools like Firebug and LWP)

guelo · on July 9, 2013

Plain-text formats have always been slower for things that are not plain text. But even 30 years ago, when computers were even slower, Unix designers decided plain text was still the way to go, because it was easier to debug and easier for humans to work with. No specialized tools required, no poring over hex dumps. HTML won over other document formats. JSON and XML won over other binary formats. Any coder can look at JSON and see what is being transferred, without the aid of anything but a text editor. Plain-text marshalling formats for binary data (e.g., base64) are still useful for pasting data into an email or adding ssh keys to authorized_keys with "cat >>". Tool support is not going to make SPDY any nicer.

0xbadcafebee · on July 9, 2013

Things have changed in 30 years. Unix designers didn't have the time or resources to write elaborate tools, nor the need for complicated software. Back then you would use telnet to browse Gopher or send your mail. Things are different now. I dare you to read a 3KB JSON file without a parser. Base64 was a hack for text-based protocols. Tool support will make it a lot nicer than no support.

nickik · on July 9, 2013

Simple protocol, makes it simpler to write tools, simpler tools are easier to change, upgread and its simpler to add features, faster development, faster improvment. Better live for programmer.

Also sometimes you need something spcific and then you have to option to code it up yourself, or change a open source library quickly and efficently.

sergiosgc · on July 10, 2013

The main "thing" that changed in 30 years is computational power, which is now several orders of magnitude greater. If 30 years ago computers sporting the power of timex watches spared the cycles for text protocol overhead, I fail to see the need to squeeze, in today's hardware, that last drop of performance.

The advantages of text based protocol remain the same. The disadvantage is lessened by faster CPUs.

0xbadcafebee · on July 10, 2013

The only advantage to a text-based protocol is you can read and understand it in raw form. Unfortunately this is not an advantage over binary protocols.

If anything, the more complex the protocol, the more redundant the text becomes, because we have to write tools to parse the text and output it so we can understand it better or identify flaws in it, and work around bugs introduced by the human element of the protocol. The ability to view and interpret the protocol in a text editor is equivalent to the ability to view and interpret the protocol as output from a debugging tool or log file - except the tool can give you much more detail than the text file in a variety of ways. Text files are inferior, but they can be quicker/simpler, depending on what you're doing.

You still need a library or tool to write the protocol out, as it's complicated and needs to be structured for the machine, not a person.

Second, saying "it's ok that it's slow, we'll just buy a faster CPU" is not a good argument for anything ever. It's part of the reason it's taken so long to adopt encrypted services everywhere. Someone (Google) had to finally prove it wasn't slow so people would adopt it.

Third, the state of modern computers is that there is no difference in speed between interpreting most text protocols and binary protocols. But that has nothing to do with efficiency, or what the machine is naturally suited to doing. You have to translate from English into machine code for a computer to know what the hell another machine is talking about. Machines don't care about line-by-line, or capitalization, or indentation, spaces, or any vestige of our natural language. Strip all those things away and machines purr along happily with less bullshit to deal with, which means simpler, more efficient code. Note that I didn't say faster.

Fourth, your performance and history observation is flawed. We need a lot more performance today than we did before, as we're scaling existing technology to many, many orders of magnitude higher than anything that existed when it was invented. Yes, we have faster CPUs. We also have more users and more data, and we don't have time to sit around reading packet dumps in text editors.

sergiosgc · on July 10, 2013

> The ability to view and interpret the protocol in a text editor is equivalent to the ability to view and interpret the protocol as output from a debugging tool or log file - except the tool can give you much more detail than the text file in a variety of ways. Text files are inferior, but they can be quicker/simpler, depending on what you're doing.

This is entirely false, as anyone who ever had to debug a malfunctioning http proxy or a misbehaving IMAP can tell you. Nothing beats netcat for a quick bug isolation test. As for the need for formal parsing, again it is true for production code, entirely false for sysops transient tasks.

Compare debugging a corba server with debugging http for a whiff of the he difference.

Tools aren't omnipresent. My miryad busybox embedded devices won't ever likely have a protocol analyzer. If I'm in need of one there, I'm done with.

ulope · on July 10, 2013

No I don't particularly want to read a 3kB JSON file without tools - but the point is: in a pinch I _CAN_.

With a binary protocol you're entirely dependent on tools (except you want to trawl through it with a hex editor)

0xbadcafebee · on July 10, 2013

You're already dependent on tools - your eyes and language processing parts of your brain - to use text formats. With a binary protocol you'd be equally dependent on tools. They're just not embedded in your skull.

Seeing as we use binary protocols every day of our lives, and the tools to work with them have existed for years, and nobody has any problem with using them, let's let this argument rest.

Dylan16807 · on July 11, 2013

"Here is a ridiculous stretch of semantics because I can't admit anyone else has a point. Also let's stop arguing at the end of this sentence to avoid rebuttal."

d-(^.^)z

zobzu · on July 10, 2013

And reading such comments all I think is:

"The history is doomed to repeat itself".

romaniv · on July 9, 2013

The same arguments can be applied to HTML, CSS, JSON, RSS and so on. I fail to see the crucial difference between those and HTTP. Or would you say the web as a whole should be binary?

Fifth, redundancies introduced from the beginning of time need to go away I wholeheartedly agree with this, but it doesn't automatically warrant binary encoding.

Human readability is a huge bonus in any protocol or format. Not because normally people read those protocols, but because people read ASCII and therefore they have good tools to work with ASCII.

0xbadcafebee · on July 9, 2013

...? Seriously? You don't see the difference?

HTTP is a layer 7 communication protocol. HTML/CSS are markup languages for designing an interface. JSON is a data interchange format. RSS is a content syndication format.

They are all wildly, vastly different. The only thing they have in common is they're all ASCII. If anything, you're making my argument for me: a communications protocol is not a format for displaying documents, it is a language for communicating machine instructions to network applications. Historically they have always been binary because it works better that way.

Your argument that "people can read ASCII, so ASCII is good" leaves out a couple points. Like, human beings do not read an HTTP statement, go into a file folder, bring out a document and present it to their computer. It's the other way around.

Really this just reflects a strange phobia people seem to have. Like your brain is tricking you into thinking you'll lose something by not looking directly "at the wire".

When you look at HTTP headers, 90% of the time you're actually looking at a pre-parsed, normalized set of fields. If you look at a raw packet dump, the whole message may not show up in one packet; you may have to reassemble it, which means parsing. If you have multiple requests in one connection, you have to find the end of the last request, which means seeking through the stream; seeing requests broken down individually means a tool already parsed them. Firebug and wireshark and other tools all take care of the automated, machine-operated work for you.

And what's left? What do you have to do with HTTP, really? Apache rules? They'd stay human-readable. Application testing? We use proxies that handle it, and APIs for client/server programming. Firewalling? Handled by tools and appliances.

Stop giving me the blanket "ASCII is great for everything" excuse and tell me one thing, one single thing, that only humans are able to do with HTTP using their eyes without a tool. But you don't have to, because that's impossible: HTTP is not for humans.

romaniv · on July 9, 2013

Apache rules? They'd stay human-readable.

I look forward to servers having different text representation of the same binary headers in their config files.

tell me one thing, one single thing, that only humans are able to do with HTTP using their eyes

You're missing the point.

No one writes HTML manually anymore either. People generate it using tools (string processing tools in a language or templates) and read it using browsers. Heck, even Notepad++ is a tool, but a generic one.

If you want, you can generate all your HTML using DOM. But almost no one does that, because DOM tools are clumsy, while text-based tools are easy to use.

angersock · on July 9, 2013

"No one writes HTML manually anymore either."

I was up until four last night doing just this. It's commonly done for templating purposes all the time, or quick hacks and placeholders.

Have you lost your mind?

0xbadcafebee · on July 9, 2013

You're actually still arguing for my point instead of against it.

If no one writes HTML manually anymore, then we have no need for it to look like English when the computer interprets it! We can compile the HTML down to bytecode and have it be interpreted much quicker by the computer, which won't have to do the job of lexing, compiling, assembling, etc. Here, two steps would be eliminated immediately, resulting in increased speed and more efficient storage and transmission: http://www.html5rocks.com/en/tutorials/internals/howbrowsers...

For that matter, if it's generated by tools, and we use programs designed to interpret and decipher and color-code it, all of that can happen without it being in English!

On top of that, you missed when I said HTTP is a communications protocol. Ever seen the movie The Matrix? Know how the sentinels would sometimes look at each other and make scuttling noises, then shovel off somewhere? They weren't speaking English ASCII. They were speaking a binary communications protocol. Know how I know? BECAUSE MACHINES AREN'T HUMANS! It would be absolutely moronic for them to speak English to each other. It would be like dogs saying the English word "bark" instead of just barking. Completely unnecessary and crazy. But that's what an ASCII communications protocol for machines is.

On top of that, there is no benefit, not one at all, to humans being able to read it when tools already exist to interpret and display it even more human-readable than its natural state. We squish and compress and strip HTML and JS already just to make it more efficient, and then undo the whole process just to read it. It's insane.

ricardobeat · on July 9, 2013

So you really think we'd be here today if instead of HTML, CSS , Javascript, JSON, XML we had a web based on bytecode formats?

The web is made by people, not computers. Open an ubiquitous text-editor and you can start working on something right away. If you have to download a dozen different compilers and IDEs to do that, it's definitely not the same.

0xbadcafebee · on July 9, 2013

"The web" is actually just a collection of hyperlinks, applications that parse markup and document storage and retrieval services. You don't see code. You see pictures of cats. And you never, ever need a text editor to use it.

Face it. Your love affair with ASCII is just that: an emotion.

(As to your original question: humans haven't needed to program in binary or assembly for decades. That's what so great about computers: they do the hard work for us, so we don't need to type everything manually into a text editor. Is that such a hard pill to swallow?)

ricardobeat · on July 9, 2013

You're completely ignoring the fact that the web began as (and still is, in part) a collaborative tool and publishing platform. Text-based formats played an immense part in that, geocities, the rise of personal publishing, blogs, these would not have happened without them.

Yes, binary is more efficient, but then tell me why is JSON the most popular data interchange format on the web today?

0xbadcafebee · on July 10, 2013

Because XML, the preeminent human-editable data interchange format, sucked balls. It's only superseded YAML because it can be stripped of whitespace and it has the word "Javascript" in it.

walterstucco · on July 10, 2013

binary formats sucked so much, that they had to invent XML and it was a much better way to start the interaction era, were services talk to each other without having to read a 30 page spec just to understand how to write the right payload for the interchange format used. Let alone the byte order...

bitwize · on July 9, 2013

For that matter, if it's generated by tools, and we use programs designed to interpret and decipher and color-code it, all of that can happen without it being in English!

Yes, let's base HTML 6 on Word .doc.

Also, the machines in The Matrix were hostile to humans. We'd like machines in the real world to be... not so.

walterstucco · on July 10, 2013

Are you arguing that we should have embraced Java Applets and ActiveX controls, because they are binary formats, hence more efficient? HTTP is NOT a communication protocol, it is an APPLICATION protocol. HTTP is an application on top of a transport layer, HTTP, just like SMTP, IRC, FTP, IMAP etc etc is just a protocol that describes applications. It is not TCP or UDP and SHOULD NOT BE!

amenod · on July 9, 2013

> and tell me one thing, one single thing, that only humans are able to do with HTTP using their eyes without a tool.

debugging.

sigzero · on July 9, 2013

How are you going to see the header? Useless to debug if you can't see it.

mbreese · on July 9, 2013

Haven't you ever done an HTTP request through nc or even just telnet to see what responses came back? This is the best way to troubleshoot strange reverse proxies or rewrite rules.

0xbadcafebee · on July 9, 2013

Boy, wouldn't it be crazy if applications included debugging modes that told you exactly what they were doing?

mindcrime · on July 9, 2013

Do all applications include something like that? If not, what makes you think that they will in your hypothetical "binary is king" future?

0xbadcafebee · on July 10, 2013

Have you ever heard of the tcp/ip protocol suite? I hear there's some things you can use to debug it. Might even support HTTP in the future.

mindcrime · on July 10, 2013

Yes, but that has nothing whatsoever to do with the question I asked.

lambda · on July 10, 2013

Here's the problem; you keep on assuming that good tools will magically appear, that help with debugging. But good tools take a lot of time and work to perfect. In reality, you usually wind up with just barely good enough tools.

With a text based protocol, you can inspect it visually with no special tools, and munge it with general purpose tools that you already know how to use (shell script, sed, awk, perl, python, ruby, what have you) with no special support libraries or anything of the sort. Support libraries can help you with the more complex aspects of the protocols, but for basic debugging purposes, you can do it all with general purpose tools.

With a binary protocol, you need those libraries to even have a chance of being able to work with it. Now you can't use a general purpose shell pipeline to munge it; no more nc | grep or what have you. You have to have a wireshark dissector; and good luck figuring out how to grep through the results of what a wireshark dissector generates.

The main point is that the overhead of the ASCII encoding isn't the main problem with HTTP. Reading ASCII encoded CRLF delimited headers is a solved problem (and heck, you could probably switch that to just LF delimiters, since I'm sure that most processors already handle that case just fine).

The problems are things like having to repeat headers over and over again for each request in a session, enormous cookies that need to get sent with every request, and the like. But you can solve those without throwing away the easily debuggable ASCII-encoded headers; and compression really does solve most of the problem with the inefficiency of ASCII encoding (and you're going to want to use anyhow, since the HTML, CSS, and JavaScript that you're delivering is all a fairly inefficient ASCII representation too).

sergiosgc · on July 10, 2013

Glad you came by, you can surely help me. I'm in dire need of a debugging tool that allows me to start corba requests and works with all large corba vendors.

There isn't one, don't bother looking. Corba is the poster child for the problems with binary protocols: fragmentation, buggy implementations, incompatible extensions.

I'd rather not see HTTP follow the Corba path.

awj · on July 9, 2013

> The same arguments can be applied to HTML, CSS, JSON, RSS and so on.

It can be, but that doesn't really make sense. The vast majority of web development is done without manually editing HTTP headers. It could change to a binary format tomorrow and - so long as our tools preserved their interfaces - we wouldn't even notice. The same cannot be said for any of the other technologies you listed.

romaniv · on July 9, 2013

First, I'm not sure what majority you refer to. Neither do I know what you mean by "manually" editing. I used this text-based function on more than one occasion:

http://php.net/manual/en/function.header.php

This is a good example where adding an object-oriented representation to every header out there would require a lot of work. Not sure if it would justify the gains.

It could change to a binary format tomorrow and - so long as our tools preserved their interfaces - we wouldn't even notice.

Until you try to use grep or something of that sort for some non-trivial analysis operation. Everything 'speaks' ASCII. Custom tools for binary format would take years to evolve to be as powerful as generic text tools.

nitrogen · on July 9, 2013

Custom tools for binary format would take years to evolve to be as powerful as generic text tools.

All you need is one parsing tool that produces a textual representation of the binary protocol, and you can once again use grep and friends.

smsm42 · on July 9, 2013

All you need is one parsing tool ported to 1000 platforms in existence now. We already have ASCII tools on all of those platforms, and we are pretty much guaranteed once new platform is created it would have basic ASCII tools. However it is not at all guaranteed it would have decoder tool for every binary protocol out there. That's why ASCII protocols are easier to handle than binary ones. And for 99.999% of protocol users, savings from converting to binary would not be even measurable. Sure, for likes of Google and Amazon economy of scale would be substantial. But 99.999% of web users aren't humongous-scale projects, they are relatively low-tech projects for which simplicity is much more important than squeezing out every last bit of performance.

nitrogen · on July 9, 2013

So long as nobody that needs to speak ASCII deploys a server that doesn't speak HTTP/1.1, I think switching to a more compact binary protocol for HTTP/2.0 is a good thing. Embedded devices will be able to handle more sessions with less CPU power, for example.

smsm42 · on July 9, 2013

I'm not convinced for average embedded device parsing HTTP headers represents significant amount of energy spent. Are there any data that suggest that for average device - I don't mean Google's specialized routers or any other hardware specifically designed to parse HTTP - this change would produce measurable improvement? In other words, how much longer the battery on my iPad would last? I don't think I'd gain even a single second, but I'd be very interested to see data that suggest otherwise.

nitrogen · on July 9, 2013

I'm talking about things like Philips hue and hardware with a 70MHz CPU, or Arduino even.

awj · on July 9, 2013

> This is a good example where adding an object-oriented representation to every header out there would require a lot of work.

Most of the tools I've used represent the header as a hash/dictionary. I fail to see how that approach "requires a lot of work".

> Until you try to use grep or something of that sort for some non-trivial analysis operation.

You're arguing from the assumption that a binary protocol would be implemented by idiots. Custom binary tools can always emit a textual representation, at which point you can grep through it to your heart's content. This is the exact same problem that we've been solving with compilers for generations. It isn't nearly as insurmountable as you seem to believe.

avolcano · on July 9, 2013

Most people aren't writing HTTP requests by hand in a text editor.

jdp · on July 9, 2013

No, but a lot of people are writing them in telnet prompts and through netcat pipes.

nitrogen · on July 9, 2013

Curl and wget are nearly as ubiquitous as netcat.

jdp · on July 9, 2013

The older, HTTP/1.X compatible versions maybe. What happens after the web upgrades to 2.0? How long until compatible tools make it to default installs? Even now, OSX doesn't ship with wget.

lazyloop · on July 9, 2013

But they watch, capture and post them to mailing lists and stack overflow.

skriticos2 · on July 9, 2013

People are actually writing HTML/CSS/JS, so no. They have a whole other set of issues like being XML based (for HTML), but they do the job and are not likely to experience fundamental change in the next 5 years in broad adoption.

dragonwriter · on July 9, 2013

> They have a whole other set of issues like being XML based (for HTML)

HTML isn't XML based; there is an XML-based relative of HTML (XHTML) which was originally (before HTML5) viewed as a potential successor to HTML, and with HTML5 there is an available XML-based serialization of the HTML's semantics, but HTML is its own thing (prior to HTML 5, HTML was SGML-based; XML was inspired by HTML rather than serving as the basis for it.)

smsm42 · on July 9, 2013

ASCII is inefficient, but nobody cares. For vast majority of Web users, which aren't working on scales of Google and Amazon, difference in performance couldn't even be measured. And for them easiness of use - low barrier of entry, basically all you need is a basic ASCII tool and you're ready to go - is vastly more important than completely immeasurable performance gains from using opaque protocols.

Of course, you can say TCP/IP is still binary, and it is true. But TCP/IP tools are built in every OS in existence now, so they do not form a real entrance barrier. Would HTTP tools be in the same position? I'm not sure - most HTTP tools right now are not standard and do not cover even HTTP/1.1 completely, what reason is there to expect they'd cover the whole 2.0 protocol properly and be widely standard and available on the level tcp/ip tools are? Which means much higher barrier of entry.

zanny · on July 9, 2013

I'd figure the inefficiency cost of ascii vs binary http headers over yottabytes of packets every year would add up. It hurts your bandwidth, it wastes electricity on the wire, and it wastes processing power. An insignificant smidgen on average, but add it up and it would probably be substantial.

That is always my stance on things - if one computer is going to run something, write it in python, make it bleed memory, just make it work. If it is going to run on a million, you have to consider the raw power waste of inefficient programming. If it is going to run on trillions of devices for decades, your choices are few in my mind.

smsm42 · on July 9, 2013

Yottabytes only come into play if everybody switches. But complexity of binary protocols would work against that. So probably only very large sites would implement it - and even for them, is parsing HTTP costs that much?

>>> If it is going to run on trillions of devices for decades, your choices are few in my mind.

The history suggests otherwise - majority of mass-produced software is not written with performance as an ultimate concern. You would find a lot of software written in languages like Python or Java, even though using C or assembly would probably produce better performance. But using C or assembly that software would probably never be produced because its complexity would be harder to manage.

Of course, performance does matter - even writing in Python, you have to worry about performance. But here we effectively see an argument saying "since we have a lot of software in Python, if we switch it to C we'll have massive performance gains". I think it is a wrong line of argument, if we switch to C a lot of this software wouldn't be written. (Note it's not against Python or C - I use both and they both are great in their areas :)

So I guess optimized protocol does have its uses for high-volume websites - but I am concerned its advantages would be offset by its complexity. The designation of it as HTTP/2.0 implies it is the next version of HTTP - but it's rather a rather different thing with different use case. I'd rather have it as a separate protocol for high-traffic websites.

neuroscr · on July 10, 2013

Amen! Efficiency to who's standards or values? It's a financial issue for big Internet but what's the real values cost the rest of the Internet? It's obviously a minimal financial costs, as everyone I know that wants to publish can. Why sacrifice durability, readability, and the original core values of the Internet for saving "big dollars" from "big providers"?

rfugger · on July 9, 2013

The Art of Unix Programming by Eric Raymond makes an excellent argument for textual protocols in Chapter 5:

http://www.catb.org/~esr/writings/taoup/html/textualitychapt...

zobzu · on July 10, 2013

I strongly disagree with "protocols are observed with tools we don't need ASCII".

That's pretty annoying to see this kind of thinking. The reason why everyone codes in JS and uses HTML, CSS is because its ASCII. Its easy to understand, hack, etc. Same reason python is so popular. Even Go, is pretty simple like that. Sure its languages vs protocols, but the reasoning is exactly the same.

And in fact, the comparison works with protocols as well: SMTP, IMAP, HTTP, IRC are EXTREMELY easy to understand and code for. Binary protocols are a huge PITA to code for. The argument that you're going to use a lib or whatever tool just doesn't hold any water. You want to understand what exactly happens.

Thats how everyone learns, etc. I could write my own SMTP, IRC clients when i was 10. I could understand it. It works. No way I could understand fully the documented binary protocols. I tried, and it was just too painful and not fun at all (hey, I was 10).

I'm not certain the added performance of using a binary format and some of the other advantages are really good enough to make the world unable to understand whats going on anymore by just looking at it.

Sure purely technically speaking, it sounds like "binary is the way to go" for pure performance.

But if you think about it, making hacking around that stuff a niche thing is perhaps a much greater loss. Even the reliability of a binary protocol is VERY arguable.

In fact I'll put a last comparison. Shell pipes and ASCII. Many tried to replace them with smart binary protocols, objects, etc. Its cool. Its more powerful. More efficient. At the end of the day tho, a quick hack with regular pipes transferring ascii is just easier to understand and we all use those - not the fancy binary objects.

antitrust · on July 9, 2013

> First, ASCII is inefficient.

Why aren't we using binary file formats anymore?

XML is more easily debugged when the machine tools don't work.

It's also more accessible, more easily created and modified, and thus, more available to a wider range of people than just web design professionals.

brokenparser · on July 9, 2013

A binary protocol is bad for a couple of reasons, too.

First, OSI layer 6 called and it wants its old job back. It sat around connecting layers 5 and 7 peacefully since the dawn of the ARPANET, all while the HTTPbis guys were passing around messages back and forth trying to obsolete it.

Second, you can't type a binary protocol. Yet, you have to somehow make the server work without a client or vice versa (for the initial implementation). That's going to be a lot more difficult. With HTTP, you could literally hook up a teletype to the Internet, let it print incoming requests and type the response back to the user agent. I've done this occasionally on a terminal emulator for debugging purposes.

Third, the Internet is big-endian while most common processors in use today are little-endian. This is going to haunt peoples lives forever because you have to continuously convert between the two and although the conversions are orthogonal, the methods aren't idempotent (as opposed to converting a string to ASCII, or a text buffer to DOS style line endings).

You mention 32-bit identifiers as opposed to a string of digits. This is more error-prone than you think, two's complement isn't the only integer representation out there. Implementations written in C would have to deal with their underlying architecture as the standard allows for 3 different representations (so the compiler wouldn't help you out). Then there's signed and unsigned, either of which might not be available in the implementers programming language. You end up unpacking the identifier by hand, which may end up being slower than just looping through a string. ASCII is hardly an inefficient serialisation format.

Fourth, any fixed-length records are going to be useless at some point in the future. Several versions later the fixed length records are going to either point to an extra set of records tailing the 2.0 records or will simply have a (designated) backwards compatible value for consumption by older peers. With HTTP, we can add a header anywhere in the request or response except for the very top. We can even shuffle them at will without adverse effects on peers.

Fifth, it doesn't make sense to optimise a tiny fraction of the entire HTTP session. Any benefits are too small to be worthwhile and would therefore result in a net-negative to most implementers.

Sixth, you can still make improvements to HTTP without moving to a binary protocol. Not sending the same headers on every single request isn't one of them. HTTP is essentially a stateless protocol and every request could be handled by a different server. You can architect clusters of servers routing incoming requests however you please and satisfy every one of them correctly and efficiently. For starters, you can replace any underlying protocol in the stack with a more cluster-friendly protocol in transit.

Seventh, just because no one is using a given content type in HTTP (I think you were referring to multipart/related) doesn't mean the protocol used to transfer that content is bad. Heck, it's not even part of any HTTP standard.

0xbadcafebee · on July 9, 2013

1. This is not an argument.

2. Amazingly, people have made binary protocols work before, in spite of no preexisting implementation, so it's not impossible. I'm sure we will be able to meet the challenge.

3. Do not try to sell me the endianess issue. I have written multi-arch tcp/ip stacks and i'm not a CS major. Trust me, it will be okay.

4. Yes, and IPv6 address space will someday expire. But not soon. And as many fixed-length frame protocols have done in the past, you leave an "extra frame options" bit to stack more fields on. It's fine.

5. It's really not about optimization at all. It's about common sense. The computer works better when you talk to it in computer-speak, and we gain absolutely nothing by talking to it in English human-speak. The benefits are a net-positive because parsing is easier, because a computer is parsing it, not a human. There is no sane argument that can validly claim that parsing human-readable English is easier for a computer than fixed-length bitstrings. CPUs don't grok ASCII, they grok BINARY.

6. Modern designs for clusters of web applications route by session, not by individual request. You are session-oriented instead of connection-oriented, though in practice it's almost the same thing. And see previous comment on why adding onto HTTP willy-nilly is just a hack.

7. No, just the jgc's re-implementation of multipart is bad, for previously stated reasons.

zmmmmm · on July 9, 2013

If you think binary protocols are so great, you have to then explain why text protocols are winning all over the place. People tried to do binary for a long time before HTTP won. We had all kinds of RPC mechanism - CORBA, DCOM, etc. Even the winning data serialization formats are mostly text (JSON, XML) despite the fact we know it's less efficient. Even where people make binary versions the ones that succeed are direct one-to-one translations of the text (eg: BSON).

In the end, it is formats that people can understand that win the day. You can't just write that off as if it has no value. It plays out in technical ways: all the CORBA implementations ended up having very poor interporability partly because they were hard to debug. Nobody could actually look at a CORBA exchange and see what was wrong with it.

threeseed · on July 9, 2013

It's because developers need to read JSON/XML regularly. They validate the data the are sending to the client, they create test cases, the testers often read them as well, it is sometimes stored them in databases. It's because the format changes so frequently that reading it is important.

HTTP is not comparable because it never really changes. It's a fixed format. And frankly the majority of developers never need to go down to that level anyway.

brokenparser · on July 9, 2013

HTTP is not fixed and is in fact very flexible. And even if it were fixed, that doesn't make it broken and certainly doesn't mean you get to replace it with a binary protocol. They don't compare.

sergiosgc · on July 10, 2013

I'd wager every site sitting behind CDNs or a varnish saw a developer go down to telnetting to port 80 to debug the cache behaviour. If you include frontend developers, sure your majority of developers assertion is true. Select sysops only, and you'll be e surprised.

homeomorphic · on July 9, 2013

Re 4: Only under incredibly optimistic models of the future survival and expansion of our species! The IPv6 address space has about 10^38 addresses. Earth's land surface is about 150*10^18 square meters. So in a future where the planet is so crowded that every person lives on a single square meter and owns 10^6 globally routable gadgets, we'd still need 10^12 Earth sized planets to exhaust IPv6.

(Caveat, I'm back-of-the-enveloping this on my phone about to go to sleep But still!)

brokenparser · on July 9, 2013

Allright, if you're so inclined...

1. It definitely is an argument and in fact my main argument. At least explain how this would be any less valid than your "it's a hack" and (bandwagon) "wave of the future" arguments.

2. My turn to invoke "not an argument". Just because one can simply copy a struct over a socket doesn't mean it's a good idea to do so. Especially in light of the flourishing culture of diversity we have on the Internet.

3. You conveniently choose to ignore one half of the argument, but miss it entirely. The point is not that we can't overcome endianness mismatch, it's that we shouldn't have to. At least not inside layer 7.

4. Except the old records will have to remain there forever. HTTP implementations dropped the Pragma header a long time ago and today we can simply pretend it was never there.

5. When it comes to common sense, ASCII is right there. That's because a protocol on the Internet needs to interoperable with many systems. Sure, all of those systems use binary one way or another. But human operators are still going to have to program those systems and ASCII is a useful representation which enables us to do just that. Furthermore, the draft proposes to encode binary headers in base64 in order to transfer them in an HTTP/1.1 upgrade request. Now you have 3 ways of transferring HTTP headers instead of just one and we'll have to support all of them in any case. This might seem trivial to you, but it's a problem to servers and quite a huge one at that for intermediaries (proxies).

6. Amending HTTP with a new header again is much less a hack than providing a way to switch to a binary protocol and resume communications from there. Your buggy argument doesn't stand, for HTTP is not the car. It's the pavement upon which old buggies can ride along just fine until it's no longer considered safe amongst the faster carriages.

7. Yes, I'm not convinced client-provided request identifiers are the way forward myself. Though I would consider the proposal a better starting point for discussions than the current HTTP/2.0 draft because it leverages existing mechanisms better.

0xbadcafebee · on July 10, 2013

1. I understood your point to be "Well why isn't HTTP layer 6?" or "Why isn't layer 6 used?" which makes no sense as TLS is layer 6, and HTTP (and the web service) is layer 7. They necessitate each other. Simply stating that X and Y are different parts of the OSI model are not arguments toward the format of a protocol in one layer.

2. My argument isn't "just because you can", it's "you can." You seemed to be saying it would be difficult if not impossible. I was saying, no it isn't.

3. Endianess will always be an issue, forever. The only time it will go away is when every architecture picks one format. It's a really simple operation and it's part of how computers expect us to behave due to their nature and design. Hacking around it doesn't make it disappear, nor does it help anything.

4. What old records? Pragma was deprecated in 1.1 yet included anyway for god knows why. There's no reason they should do so again, but if they do, it will exist both in text and non-text versions. This is a non-issue.

5.

  > a protocol on the Internet needs to interoperable with many systems

You mean like IP, ICMP, TCP and UDP?

  > But human operators are still going to have to program those systems

Yep.

  > and ASCII is a useful representation which enables us to do just that

Sure. My C code editor displays ASCII. It totally enables me to write IP, TCP and UDP code, using an ASCII display with code in ASCII. And it neatly compiles down to binary and runs a binary protocol. Amazing!!!! (seriously though, if your argument is that ASCII is just easier to "program" as a protocol, you're up shit creek; you have to write more code to handle converting ASCII to binary and back anyway. your high-level language abstractions hide this fact from you, and you think it's a convenience because you never have to learn what a constant is)

  > it's a problem to servers and quite a huge one at that for intermediaries

That's backwards compatibility for you. If the alternative is to simply mangle and bungle the existing format into a frankenstein into eternity, it's not going to be any better.

6. Are you comparing extending HTTP/1.1 for a single feature to the backwards-compatibility support of HTTP/2.0? Because that makes no sense. The vehicle analogy is just weird at this point.

7. See, this is where the vehicle analogy works again. "leveraging existing mechanisms". In other words, let's throw one more feature on top. It never ends, because all you have to do is keep adding more lines, and modify the browser, and modify either the server or your web app, and keep going to support god knows what. At some point they'll implement an incredibly complex binary protocol and embed it in base64-encoded ASCII HTTP/1.1 headers, because "leveraging existing technologies" is thought of as a neat thing to do. It will also be insane. At some point you need to just make a better <whatever> instead of hacking and hacking and hacking onto it to make it do what you want.

Like building the great pyramid of Giza out of tinker toys. Sure, it's easier for people to use tinker toys. It's easy to understand. You don't have to do any real work. And it's also not meant for that task. At some point you need to throw out the toys and use stone.

I can even go further. ASCII is too old to use. Really, it's been antiquated by UTF-8. It is telegraphic codes for teleprinters. And ASCII itself was micro-optimized to only be 7 bits, and the 8th bit was used as a parity bit because perforated tape had space for 8. ASCII is so antiquated (1960) that nobody should be using it anymore.

Clearly we need to implement HTTP/2.0 in UTF-8 wide characters, so connections to China, Japan and India will support their native language in the protocol. (After all, what's the point of a native-language protocol if only English speakers can read it?) Also, we should include the byte order mark at the beginning of all messages so we don't have to worry about how endianess works.

josteink · on July 10, 2013

> At some point they'll implement an incredibly complex binary protocol

No need to wait: http://tools.ietf.org/id/draft-ietf-httpbis-header-compressi...

Just look at that and weep. That whole document deals with how to represent HTTP headers. It doesn't define them, their behaviour and how they should interact. No. This multi-page document merely documents how these headers should be represented.

You know, things which up until now has been:

    Lines of text with key-value-pairs delimited by a colon-sign.

Noticed how that didn't take eighteen pages and pressumed anything about current generation consumer DSL MTUs? Yeah. That's a nice, simple and good spec.

Obviously this HTTP2 binary monstrosity is being done all in the holy Google-name of micro-optimizing performance.

This is terrible design and quite literally obfuscation more than anything else. I cannot believe the IETF is even considering this junk.

Edit: Link to an IETF discussion on the subject: http://www.w3.org/Search/Mail/Public/search?keywords=&hdr-1-...

brokenparser · on July 10, 2013

1. TLS has nothing to do with this, it's Transport Layer Security (even though you may think of it as layer 6) because it doesn't alter its payload. ASCII/UTF representation and the messages themselves are layered on top of it. By going binary, you may well end up forcing your encoding onto systems which are not native to that encoding. Whereas of right now you could link any two systems and exchange messages, a binary protocol would mean that some systems can exchange messages freely and other systems would see garbage. That's why the Internet was standardised on ASCII and \r\n, so we wouldn't ever have to deal with that again.

2. I don't disagree it's easy to come up with a binary protocol, taking a short cut is always easier. Just like it's easier not to write a test harness with full coverage for a software project, that's entirely up to you. When a regression causes havoc down the road before you realise what's going on, well, rather you than me.

3. You're defending a regression, as of right now it's a non-issue. And are you really calling ASCII a hack around endianness issues? On what planet?

4. Records that are going to be deprecated down the road, which I think is fair to consider inevitable. All I'm saying it's been a problem in binary protocols before, so let's not do that. You don't see this as a problem at all, so I'll digress.

5. You mean like IP, ICMP, TCP and UDP?

Yes, these are built into the operating system. Once you start using e.g. netcat (who hasn't piped tar into netcat for a quick backup?), all of that becomes transparent.

No, my argument isn't that ASCII is ipso facto easier to implement. It's that it's easier to test, debug and always see exactly what's going on over the wire.

If the alternative to a text-based frankenstein format is its binary-bastard child, I'll have the former thank you.

6. No, I'm telling you to think of HTTP as a conveyor rather then a payload. There's a difference, and that's why the vehicle analogy is weird.

7. So you're proposing that every N years we create an entirely new HTTP and upgrade to that? At what point will the streaming pile of upgrade requests yield a noticeable reduction in performance?

Also, UTF-8 characters are not "wide", they're variable length but not wide as in multibyte encodings. Then you go on to suggest we use a BOM at the start of every (UTF-8, mind you) message, I'll leave it up to yourself to let that sink in. You even spelled it out.

drawkbox · on July 9, 2013

I agree 100% on binary being a bad idea in HTTP spec, largely because of encoding and going backwards. But also binary and fixed lengths lead to harder to stream situations, chunked problems and being less approachable which leads to less innovation I believe. I'd argue HTTP clients/servers are better at dealing with buffer overruns because it isn't so set to a fixed length and more based on better defensive content messaging.

Many of the complaints on HTTP are really complaints about MIME messaging which the entire internet is really built on (standards anyways) and has ran pretty smoothly for a very long time. Approaching improving HTTP by addendum like SPDY is a better idea. Or possibly transporting it better over streamed protocols like SCTP: http://tools.ietf.org/html/rfc6525 no need to modify the packaging/messaging format.

MIME/HTTP/HTTPS are very flexible and if you want binary can be added in and has been in multipart, EDI/HTTP/AS2 and other RFCs use this. Multipart isn't used as much because it is more problematic (used heavily in email and custom protocols) so making the whole spec that way would be bad overall. The points on the OSI layers is very key, let's not revert to binary + base64 everything just to get data across the wire. You can put anything in there, basing it in text and human readable is always a good idea. That is really what this whole layer is about. Binary type of movement pushes us back to the days of non standard blobs, problems that http messaging then content as xml then json solved by standardizing readable exchange of data. When you are exchanging data in a standard way it should be very basic to minimize problems not collude. Throwing out all of MIME just to speed up HTTP when other protocols exist for any needs that are faster (real-time, attaching files, streaming etc) is a bad idea. Also changing support from HTTP < 1.0, to 1.0 to 1.1 had many problems, unless this adds considerable benefit, changing it adds more problems.

bitwize · on July 9, 2013

Second, you can't type a binary protocol. Yet, you have to somehow make the server work without a client or vice versa (for the initial implementation). That's going to be a lot more difficult. With HTTP, you could literally hook up a teletype to the Internet, let it print incoming requests and type the response back to the user agent. I've done this occasionally on a terminal emulator for debugging purposes.

That's a very 1970s way to develop a new protocol.

These days you specify the protocol in, say, an XML- or JSON-based file format, and then run a code generator to produce client and server libraries directly from the spec. This has the advantage that the implementation is derived directly from the specification, so there is little room for ambiguity.

Wayland is one example in the open-source world of where this is done, but I've seen the technique used in proprietary shops as well.

brokenparser · on July 9, 2013

That's useful for RPC type protocols, but HTTP isn't RPC based and it has lots of semantics written in English. I think it's better that way because it allows for a greater variety of use cases and implementations. You can still do RPC with websockets, if that's what you want.

bitwize · on July 9, 2013

I've seen this done with the on-the-wire protocol of scientific measuring equipment. It's hardly just for RPC (which HTTP increasingly resembles anyway).

The point is, the ability to "type" a protocol is irrelevant to how modern distributed software gets developed. Maybe it mattered in the days when comms were at 300 baud, machines had kilobytes or perhaps megabytes of core, and the Mark I eyeball was the best way to debug machine-to-machine comms, but these days we have tools that can decipher binary wire protocols for us. Performance and adaptability are far more important now than human readability. That war has been lost.

lttlrck · on July 10, 2013

By these days you mean the last 30 years? ASN.1 was defined in 1984, so actually its probably closer to 40...

neuroscr · on July 10, 2013

Real tools? I would argue that there are way more real tools for debugging ASCII-based protocols than there will ever be for binary-based one. ASCII-based protocol is highly compose-able by allows us to separate intent (contents of the message) from ever-evolving encodings (rebasing, encryption, compression, etc). In this world of fighting complexity, why are we not favoring the simple?

kragen · on July 9, 2013

> Fourth, fixed-length records are the wave of the future!

1960s-style arbitrary field size limitations: the wave of the future! No doubt any day now we'll reorganize the internet around shipping punched-card images around, too. We could call the project the "Because It's Time Network", or BITNET.

> Of course, most people would argue that you don't save enough time or bandwidth or cpu cycles to make a difference, but it would make for easier-to-write parsers.

Most people would argue that, yes; famously, Kernighan and Plauger did argue that in The Elements of Programming Style in 1974. By "easier-to-write parsers", you mean easier than dict(((key.lower(), value.lstrip()) for key, value in (line.split(':', 1) for line in header.split('\r\n'))))? Because I think that's going to be a pretty tough bar to fit under. (Yeah, I know you need another couple of lines of code if you're going to handle indented continuation lines, but you can get rid of those without returning your protocol design to the Summer of Love. You could also get rid of the .lower() and the .lstrip() while you're at it.)

> Add to this that every header could have a 32-bit identifier (4 bytes)

Padding out sub-byte-sized values to fill out fixed-width fields: the Intelligent Man's Approach to Saving Bandwidth! Or you could just use one-letter names in an ASCII protocol.

> [Fixed-length fields] solves crazy problems like header injection and request integrity checking.

Clearly we've never had parsing bugs in binary protocols full of fixed-width fields, now have we? Surely not bugs that produced security holes? Except maybe TCP, IP, and DNS. And X.400, and X.500, and X.509, and some of those were the fault of ASN.1 BER and DER, which are hardly fixed-width formats. And surely silently truncating a value to put it into a fixed-width field would never change its semantics, right?

> as to a general question "why use binary when we can compress stuff", what is smaller: a string of bits, or a compressed string of digits?

Well, let's see. How many files do I have here?

    $ find | wc
       5946   17074  330147

Each one has a ctime, an mtime, an atime, and an inode number. The ctime is generally going to be the same as the mtime in this case, so we'll leave it out. I think they're technically 64-bit values in the current inode structure, but let's count them as 32 bits instead, since none of my files are from after 2038. The inode number is also 32 bits. So if we take these three 32-bit values per file, we have 5946×12 bytes, or 71352 bytes. And if we print them out as digits and compress them?

    $ find -printf %.10A@%.10T@%i | gzip -9c | wc -c
    46278

But wait, that string of digits isn't parsable; it's just digits. So let's add delimiters.

    $ find -printf %.10A@\ %.10T@\ %i\\n | head -3
    1373396230 1373338685 2359306
    1372967497 1372967489 5246218
    1365458166 1365458157 5248264
    $ find -printf %.10A@\ %.10T@\ %i\\n | gzip -9c | wc -c
    47506

So a compressed string of digits is a lot smaller in this case. But you could argue that that's just because my data is highly redundant, since most of the timestamps are going to be within the current few years, which is true. But then, most data is highly redundant. How bad can it get, in the worst case of representing uniformly distributed random 32-bit values as compressed strings of digits, with spaces between? It adds about 44% overhead:

    $ dd </dev/urandom bs=1024 count=1 | 
      od -w1024 -l | tr -s ' ' ' ' |
      gzip -9c | wc -c
    1+0 records in
    1+0 records out
    1024 bytes (1.0 kB) copied, 0.000622635 s, 1.6 MB/s
    1460

> Little inefficiencies like this don't go away if you just hack onto the same old protocol forever.

But you know what? They do go away if you tunnel over a deflate-compressed VPN, or if you increase the available bandwidth by a few percent. Maybe instead of trying to take us back to bug-prone 1960s designs you should be working on that.

0xbadcafebee · on July 9, 2013

1. Sarcasm meeting sarcasm; HN truly has turned into hell.

2. Trying to fit an HTTP field parser into one line is not the way to win a programming argument.

3. One ASCII character names? 255 (edit: 37 alphanumeric) possible headers, of which only a few might correlate to what you'd expect? You're trolling me now.

4. Yes, software bugs happen! It's crazy I know. But let's go ahead and assume that the security holes that still plague applications today due to design flaws are not the same as a couple off-by-one bugs decades ago.

5. By virtue of the algorithm, compression works better the more you have of the same thing. You won't have 70KB of headers to compress at once; more like 400 bytes. The compression of individual header groups each time will not benefit from the previous data's compression, as TLS or SPDY might do. The eventual overhead would not only be larger than a bitstream but take more CPU to decode.

6. Not only are they inefficient, they add complication to the parsing of the protocol, which is one more thing an application can mess up. Not only is it slower, it's more prone to errors. A VPN does not fix that. Perhaps you should focus on real solutions to real problems versus quoting what decade network application programming began in.

kragen · on July 10, 2013

> One ASCII character names? 255 (edit: 37 alphanumeric) possible headers, of which only a few might correlate to what you'd expect?

I wasn't suggesting encoding "Host: news.ycombinator.com\r\n" as "Hnews.ycombinator.com\r\n" but as "H:news.ycombinator.com\r\n". As long as you keep the colon, you can still use long names for other headers.

> Trying to fit an HTTP field parser into one line is not the way to win a programming argument.

You said parsing would be "simpler". It's going to be hard to get simpler than something that you can fit into one line.

> Perhaps you should focus on real solutions to real problems versus quoting what decade network application programming began in.

Well, that's kind of what I'm saying: let's focus on solving real problems, instead of recreating new ones that we'd already solved decades ago, in order to "solve" non-problems like HTTP header encoding.

0xbadcafebee · on July 10, 2013

So you'd rather an illegible ASCII representation instead of an illegible binary representation. This is why I hate getting into these arguments; people will insist on completely illogical nonsense as far as they can take it.

> It's going to be hard to get simpler than something that you can fit into one line.

This is a terrible argument, as you can fit anything onto one line if you string it along enough. But here's one example of something simpler:

strncpy( frame_struct, buffer, sizeof(frame_struct) );

And i'm not proposing we merely solve the problems of HTTP. That would make too much sense; people are much more willing to put up with bullshit than do the hard work to make things work correctly. I was proposing we make things work better, simpler, and more reliable, and throw away the completely nonsense argument that using netcat once a year is worth a ridiculously impractical protocol format. But whatever, it's not like this thread will amount to a hill of beans.

So let's just convert all of HTTP into XML.

kragen · on July 10, 2013

Your actual parser there is the definition of frame_struct, which you left out; and, as others have pointed out, if you're putting ints in there, you need to ntohl them. Also, you probably need some kind of extensibility.

And I don't really think "H:news.ycombinator.com" is quite as illegible as your suggested 32-bit integer space — which, by the way, is small enough that you'll probably need a central registry to prevent header name conflicts — and it also occupies only two bytes instead of four for the header type. So, from my point of view, the "completely illogical" thing is to go from, "The header names currently in HTTP are too long!" to "Therefore let's replace them with 32-bit integers in a binary protocol" instead of "Therefore let's shorten the header names in HTTP", which solves the problem more thoroughly and with less collateral damage.

And what is this about "if you string it along enough"? We're talking about a parser (for RFC-822 headers without continuation lines) that fits into 110 characters, here, without the least obfuscation. Less than a Tweet. In fact, I just Tweeted it. And it worked on the first try.

> the completely nonsense argument that using netcat once a year is worth a ridiculously impractical protocol format.

You know, we did kind of try binary protocols already: the whole IPX stack, CIFS, X.everything, SNMP, TFTP, ICB, Sun RPC and thus NFS and NIS, and so on. A few survive in common use: DNS, TCP, IP, ICMP, SSL, SSH, BGP, and to some extent, SNMP. And there are lots of them working fine inside of particular companies, rather than between implementations by different vendors. But for the most part, they've been replaced with textual protocols, despite the lower efficiency and in many cases the first-mover advantage: HTTP, SMTP, and IRC, and previously FTP, Gopher, and Finger. You seem to be arguing that was an accident, or a mistake. It's not.

lttlrck · on July 10, 2013

SIP is inspired by HTTP and has defined compact headers like this for over a decade [0]. It is hardly indecipherable.

Why do you find HTTP impractical? There seems to be a lot of evidence to the contrary.

[0] http://www.ietf.org/rfc/rfc3261.txt

josteink · on July 10, 2013

> But you know what? They do go away if you tunnel over a deflate-compressed VPN, or if you increase the available bandwidth by a few percent. Maybe instead of trying to take us back to bug-prone 1960s designs you should be working on that.

This was mentioned once in the IETF discussion, before someone said "but uhm, SPDY is binary, and we have data from SPDY, and yeah".

After that everyone was too busy discussing how horrible unstrict and ambigious text-formats can be, before jumping off to a 20 email discussion about which endianess should be preferred and how the clients and server should determine which one to use, or maybe clients should support both kinds endianess.

All without a hint of irony. It's like a Bizarro world IETF discussion.

stiff · on July 9, 2013

what is smaller: a string of bits, or a compressed string of digits?

What are you talking about? You could have had something on your mind, but this is a terrible terrible juxtaposition of words...

voidlogic · on July 9, 2013

>I would have liked to see an ascii based protocol

I disagree I think HTTP should have been a simple binary protocol from the start and HTML should have required compilation into a binary format.

How much work would have it really been? htmlcmp foo.html providing foo.bhtml. No whitespace, No end tags, one or two byte tags, etc. Strictness in the reference HTML compiler implementation could have saved the web from all the stuff outside the actual standard that browsers (and other tooling) now have to support (so they don't "break" the web).

I'm not suggesting anything as crazy as the flash binary format (I wrote a Java flash player once...), but when I started to write things like proxy servers and HTML minifers I was blown away by the extreme inefficiency of HTTP/HTML.

This is a step in the "right" direction IMHO.

samwillis · on July 9, 2013

I believe that the "world wide web" took off for two reasons. It was completely free and it was incredibly easy for ANYONE to make a website. If you needed to "compile" all HTML it would have discouraged a lot of casual experimentation. Not everyone understands what a compiler is.

Just as an example I started playing around with HTML when I was about 12 (in 1998), it was easy and I got instant results. A year or so later I tried to learn Perl and quickly gave up because I couldn't get my first script to run. It was another year before I tried to "program" again and became hooked.

HTML and HTTP were successful because they were incredibly easy to understand. I wish we could keep it that way.

wtbob · on July 9, 2013

Wouldn't it have been cool if there were an efficient binary format and a nice human-readable and human-editable format, with a well-defined transformation from one to the other?

Back in the 1990s, Ron Rivest came up with canonical S-expressions, which are fully capable of representing the same information represented by HTML, XML, ASN.1—but can be either human-readable or binary.

Here's a simple example: (p (* class (footer x-treme)) "This is a " (b "footer") "."). Very human-readable, very human-editable, and easily machine-readable, wouldn't you agree?

As a binary format, it would be (1:p(1:*5:class(6:footer7:x-treme))10:This is a (1:b6:footer)1:.). Still geek-manipulable, if necessary, and extraordinarily simple to parse. And it has the advantage that it is a distinguished encoding--any of the myriad human-readable encodings all reduce to the same canonical encoding, which has advantages for hashing.

ajanuary · on July 9, 2013

I'd argue the big thing that made HTML so easy to play with was permissiveness, not the lack of explicit compilation.

Perl doesn't tend to have an explicit compilation step. You could write a HTML compiler to be as permissive as HTML parsers and you wouldn't have the same frustration you had trying to get started with perl.

0xbadcafebee · on July 9, 2013

Gopher was simpler to understand than HTTP/HTML. But it wasn't as flexible, and didn't have as nice support for pretty pictures and midi files.

If anything, HTTP/REST was much more complex. HTML was somewhat mystic yet allowed you to move things around easier, and actually design things vs just presenting them. And hyperlinks were really cool.

http://www.ils.unc.edu/callee/gopherpaper.htm

voidlogic · on July 9, 2013

>HTML and HTTP were successful because they were incredibly easy to understand. I wish we could keep it that way.

The web is big enough today that we can afford to make it more efficient since we have so many professionals that don't require the kind of implicit hand-holding provided by the original implementations. Back then everyone was a newbie and there were no web professionals.

samwillis · on July 9, 2013

I think we need to be careful about claiming its more efficient. There seemed to be significant disagreement about how much more efficient SPDY was last year from the various different benchmarks. From memory the consensus seemed to be that its only 5-10% quicker. Does anyone have any more up to day benchmarks?

I hate the idea that you would have to be a "web profesional" in order to get started today. We should embrace and keep the culture the web was built on.

MrDOS · on July 9, 2013

That's an incredibly good way to lock out new newbies. I don't know how many people I know who got into programming through web development. Just because existing web developers don't necessarily need a simple solution doesn't mean new web developers couldn't take advantage of it.

arh68 · on July 10, 2013

For what it's worth, I started messing with HTML in elementary school, but had no idea how simple HTTP/1.0 was until I hit the 400 level classes in college. Editing in Notepad was the only way to go back then: even Dreamweaver got too complicated. Likewise, my first foray into JMS/message-queue type stuff was with STOMP, the simple text-oriented message protocol: there was no way I was going to understand the 'open wire protocol' for ActiveMQ. Ain't got time for that!

All that said, I think peterwwillis is right: http/X.0, as long as it's well-simplified, is better in binary than it is in text. Ideally, there's a bijection between your text-mode and binary-mode (like a lens), where it's easy to parse (rely on your toolset to do the translation back and forth) and easy to put on the wire. Forth is a good example of how to do it sanely.

voidlogic · on July 9, 2013

Really? I can't imagine anything they might do putting this out of the reach of say a sophomore C-S student or a motivated independent learner.

nitrogen · on July 9, 2013

A lot of current professionals started out as 11- and 12-year-olds dodging COPPA to host their first HTML web sites, not CS sophomores.

voidlogic · on July 9, 2013

>A lot of current professionals started out as 11- and 12-year-olds dodging COPPA to host their first HTML web sites, not CS sophomores.

I would put those 11 and 12 year old down under "motivated independent learner"

If your concern is the a binary HTTP, too bad, it already here, its called HTTPS. It does seem to be holding anyone back.

If your concern is a binary HTML/compiler for it, I don't by that either. I was using a compiler when I was 8 years old and didn't have a problem with it conceptually.

jimktrains2 · on July 10, 2013

HTTPS is HTTP over a TLS connection, not a binary version of HTTP.

voidlogic · on July 10, 2013

Obviously, but hits take very helpful to look at HTTPS over something like wireshark. That was my point.

steveklabnik · on July 9, 2013

Yesterday, I showed a classroom full of high school kids HTML and CSS. They were so excited to be building stuff, and thought it was so neat that the text would turn into a web page.

Just because we have more 'professionals' today doesn't mean everyone is a professional.

drawkbox · on July 9, 2013

I think to get maximum innovation on anything technology related, it should always be simple enough to get started at any level, then as you gain more experience you can (and the specs allow you to now -- AS2/EDIHTTP spec for one has encryption, compression, on top of HTTP/HTTPS - http://www.ietf.org/rfc/rfc4130) do more for performance, optimization etc otherwise it is premature optimization and a bit of a wall.

Like with gaming, each game should be simple to start but deeper to master. That is what this layer is all about, lower down the OSI stack it is much more of a wall to beginners. Never lock out beginners as they can be better masters with time, don't hide the entrance to the labyrinth. Leave things as approachable, but with better professional experience, modifiable to perform better. We have all that now, a competing binary protocol will never see the innovation that a more simple one like HTTP will. Higher up the stack you can see how this was better for exchanging data in standard ways from old school binary blobs, to CSV files, to XML files, to JSON. Same reason REST services won over SOAP, simplify... There is a JSON binary format BSON in mongo and also MessagePack but guess which one is used more in service/exchanging data the textual one or the binary ones? The binary formats work well in certain situations where both endpoints are controlled by the same entity.

Binary and more locked down/optimized formats and messaging have their place but the start/base should always focus on simplicity over optimization.

The general rule in exchanging via standards is be liberal in what you accept, conservative in what you send. Being all binary all the time is a backwards step and is conservative on what is accepted, I also think it would lead to a host of difficult to debug problems just based on work I did with AS2/HTTP RFC implementation one of them being streaming and of course encoding/decoding which can fill hours of work if you can't visually see the content at some level.

cenhyperion · on July 9, 2013

But I don't think we should be raising the barrier to entry and making it more difficult for a young hacker to get started. Like the above commenter I started hacking around with HTML and CSS when I was around 12 and it led me into linux/python/php/web frameworks.

There's something beautiful about being able to teach someone how to write a basic HTML document in 20 minutes. My mother can easily understand what HTML is doing, I doubt she'd understand a compiler.

ANTSANTS · on July 9, 2013

I see where you're coming from, I had a similar experience playing with HTML as a kid. That said, "compiling" HTML from text to an efficient binary format wouldn't have prevented browsers from including those compilers themselves and transparently compiling all plain text pages received. Newbies could keep on serving plain text HTML (until someone chastises them for it) while the web as a whole would benefit from the increased efficiency that comes from binary pages being the norm.

Your argument doesn't really apply at all to HTTP, though. No one "got their start" peeking at HTTP requests. It's solely the domain of those working on infrastructure (for some definition of that word). Anyone with any idea of what HTTP really is (ie more than "that thing at the start of a URL") should have no problem using a tool to convert between a binary and text representation of the protocol. It's not like you can just magically pull HTTP requests out of the ether, you need a tool anyway. There's no reason why curl (for example) couldn't transform a request you write into an equivalent binary protocol, or why it couldn't do the inverse operation when it receives the response. It's utterly ridiculous to me that there is such inefficiency in HTTP just to make things slightly easier on the implementers of curl and wireshark.

weavejester · on July 9, 2013

HTTP is a protocol designed to communicate information between two machines. There's no reason it should be human-readable. Trying to make a protocol that's easy for humans to read and write leads invariably means making it harder to write software for.

dspeyer · on July 9, 2013

A while back, I wrote a daemon that checked a bunch of network stuff in a loop. I needed a UI for it, so I made it speak http. No library, just raw GET support. It worked. Didn't take long to write, either. I would never have tried that with a binary protocol.

nitrogen · on July 9, 2013

I don't think anyone's saying HTTP/1.1 should go away.

weavejester · on July 9, 2013

You say that like you think a custom binary protocol would be harder than writing a HTTP server, even a simple one, from scratch.

dspeyer · on July 9, 2013

A custom binary protocol? Then I would have to write a client! And install it everywhere. And keep it updated. And design my gui in wingdi instead of html.

That is not a viable option.

weavejester · on July 9, 2013

I thought you said you weren't using existing libraries? Or did you just mean for the server? You made it sound as if you thought writing a server for a custom binary protocol was harder than writing a HTTP server from scratch.

If you're aiming for an existing ecosystem, then sure, there's no reason not to use HTTP, assuming you make use of established libraries. But widespread use is HTTP's only real virtue; the protocol is considerably more difficult to implement correctly than it should be.