This working group has failed (ietf.org)
127 points by mmoya on Jan 7, 2014 | hide | past | favorite | 28 comments



Specs designed by committee usually fail when they produce just the "paper" and no reference implementation. Look at W3C, which consistently failed to develop their own reference browser. The same seems true for TLS: actual implementations are too complicated because the committee paid no attention to implementability. A positive example is MP3, where everyone just copied the reference codec in the beginning.


A good example here would be C++. Look at how long it took for C++11 to be implemented, but now at least two compilers (gcc and clang) are feature complete, and they're on schedule to be feature complete for C++1y[1], with implementation and specification in sync.

[1] http://clang.llvm.org/cxx_status.html#cxx14


It also helps that C++ doesn't have nearly the number of implementers in a position to screw stuff up. Unlike any number of webservers, SSL libraries, load-balancers, firewalls, proxies, etc.

How many C++11 compilers are there? This support matrix[1] seems to indicate there are 2 compilers in the lead, with 5 providing fairly decent coverage. How popular is XLC++?

And anyway, adding another compiler is something you choose to do, and usually run a few tests for, right? Unlike networking protocols, where it's invisible to you that someone has some idiotic inspection hardware that segfaults on certain packets and drops your connection.

Edit: Also, even if two compilers are feature complete, it doesn't mean they implement the features in perfectly compatible ways, right?

1: https://wiki.apache.org/stdcxx/C%2B%2B0xCompilerSupport


And C++11 itself was implemented awfully quickly compared to C++03, exactly because this time around the Committee was much stricter about having reference (or at the very least prototype) implementations available before accepting anything into the Standard. As you mentioned, C++14 actually seems to be proceeding like it optimally should, so the Committee has certainly learned its lessons regarding this.


websockets at the very least had a reference implementation. but it was more along the lines of "it works for me, i tested it on my local machine"

it took ages of long discussions to get the point across: "yes, but on your local machine you don't have a reverse proxy like every other normal web application stack. it's going to break if we do it this way." and they just kept responding:

"but it works here"

what about the oauth2 spectacle?

http://hueniverse.com/2012/07/oauth-2-0-and-the-road-to-hell...


At the W3C, no spec can reach REC without having two interoperable implementations.

Also note everyone copying the reference implementation isn't good either — there's no guarantee the spec actually defines enough to recreate the reference implementation then!


> At the W3C, no spec can reach REC without having two interoperable implementations.

Wishful thinking? Where are these reference implementations documented for HTML 2.0, 3.2, 4.01?

> Also note everyone copying the reference implementation isn't good either — there's no guarantee the spec actually defines enough to recreate the reference implementation then!

If the spec is lacking, whatever the reference implementation does completes the spec. A pure paper spec that someone might be able to turn into a correct piece of software has no value over a correct reference implementation with full source code.


> Wishful thinking? Where are these reference implementations documented for HTML 2.0, 3.2, 4.01?

There is no requirement for them to be "reference implementations". CSS 2.1, for example, used various web browsers as proof of interoperability (every test in the test suite was shown to have at least two passing implementations). The "interoperable implementations" clause was introduced as a recommendation in 1999; in 2001 it became a requirement that every feature in the spec must have been implemented, preferably with two interoperable implementations; and it was clarified as an RFC 2119 "should" in 2003. HTML 4 was published in 1998, prior to the W3C having a formal process document, and HTML 4.01 was likely exempted as a minor revision in 1999.

> If the spec is lacking, whatever the reference implementation does completes the spec. There is no value in having a pure paper spec that someone may be able to implement a correct piece of software with over having a correct reference implementation with full source code.

Ideally the spec shouldn't be lacking. If one needs a reference implementation, the spec has failed to be a complete definition, and if the spec is not a complete definition, what is the point of it? You may as well just have a high-level overview in the documentation of the reference implementation!


And you are citing this as an example of how to do it right? Why?


> And you are citing this as an example of how to do it right? Why?

Because reference implementations are a bad form of spec. (Note I'm only arguing against normative reference implementations — purely informative ones aren't really a "reference" any more, they're just another implementation, unique in no way except perhaps being the first.)

Why? Because:

- If any disagreements between spec and reference implementation are settled by the reference implementation being right, then there is no motivation to write anything in the spec except "I like unicorns". Because, after all, the reference implementation says how it is actually done.

- Similarly, if the reference implementation is treated as being right in case of contradiction, if it has a null-pointer dereference bug leading to a SIGSEGV (per the de-facto standard!), every third-party implementation must SIGSEGV in the same case — because, after all, that's what the spec says you must do!

- Reverse-engineering an implementation is often far harder than understanding a spec, making an independent implementation more difficult to create (if you just want the reference implementation used everywhere, why bother standardizing it?), as one has to get the abstract model out of the reference implementation (which may be implicit and not stated anywhere). This is especially relevant if one wishes to make an implementation using a different model to the reference (e.g., a parallel implementation of something with a sequential reference implementation).

- If everyone uses the reference implementation, the spec (which, in that case, is the reference implementation) is likely to have fewer eyeballs looking over it (looking for inconsistencies, unintended behaviours, ambiguities, and outright bugs) than if it were implemented by multiple, independent people/teams.

Having a spec developed with multiple independent (non-normative) implementations:

- Leads to a clearer spec, having had to be read unambiguously by multiple implementers;

- Leads to greater peer review of implementations: a reference implementation can easily become the only implementation of the spec, thus having to interoperate only with itself, whereas multiple independent implementations must be able to interoperate with each other; and

- Leads to higher quality implementations, as they are required to match the spec (an external, normative, spec places constraints on the implementations, and therefore leads to better review of the implementations as well as the spec!).

One certainly doesn't want a spec to be developed in isolation from any implementation — but one doesn't need a normative reference implementation to garner the advantages of being developed in tandem with one. Multiple independent implementations are needed to ensure multiple people can understand the spec without having to resort to guessing the author's intention (likely differently!).

Ambiguity tends not to be a major issue in a well-written spec, and while certainly a formal spec (such as a reference implementation) does define all cases, it will also define formally all bugs that the author wrote because they are a fallible human. Bugs in specs will occur no matter how formally they are stated and they are a far bigger issue than ambiguity. The important part is to get as much peer review as possible of the spec to maximize the chance the bugs are found — and if you have multiple implementers all reading the spec while implementing it, they are more likely to catch the bugs than someone just taking the reference implementation and using that.


> Look at W3C, which consistently failed to develop their own reference browser.

Does http://www.w3.org/Amaya/ not count?


It exists, but it’s hardly a reference for the current state of the specifications.


Here's an interesting post about TLS compatibility[1]. I guess it explains why no browsers have had TLS 1.2 on by default for such a long time.

" To add to this discussion about protocol version intolerance, I've been tracking this problem in my SSL Pulse data set (SSL servers from the Alexa top 1 million).

Here's what I have for November:

  Total servers: 163,587

  TLS 1.0 intolerance        9
  TLS 1.1 intolerance    1,388
  TLS 1.2 intolerance    1,448 (~ 0.9%)
  TLS 1.3 intolerance   17,840 (~10.9%)
  TLS 2.98 intolerance 122,698 (~75.0%)

  Long handshake intolerance: 4,795 (~2.9%)
"

1: https://www.ietf.org/mail-archive/web/tls/current/msg10657.h...
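
For context, "version intolerance" means a server that, instead of negotiating down to a version it supports, simply breaks when offered a newer one. A minimal sketch of the difference (illustrative Python; not the real TLS wire protocol, and the version tuples are just stand-ins):

```python
SERVER_MAX = (1, 1)  # pretend this server only implements up to TLS 1.1

def tolerant_server(client_max):
    # Correct behaviour: negotiate down to the highest common version.
    return min(client_max, SERVER_MAX)

def intolerant_server(client_max):
    # Buggy behaviour measured above: unknown version => drop the handshake.
    if client_max > SERVER_MAX:
        raise ConnectionError("handshake dropped")
    return client_max

print(tolerant_server((1, 2)))  # (1, 1): a TLS 1.2 client falls back cleanly
try:
    intolerant_server((1, 2))
except ConnectionError:
    print("dropped")  # the intolerance counted in the survey
```

Servers like the second one are why browsers shipped fallback hacks (retrying the handshake with a lower version), which in turn opened the door to downgrade attacks.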


There's a trade-off to be made. On the one hand browser users would like every page to "just work". On the other hand they want secure connections to actually be secure.

If those .9% of websites break in the latest versions of firefox, chrome, and IE they are more likely to be fixed than if they are coddled through some workaround or even worse by holding back general progress. The former is better for web security. On the other hand, people who want to go to those websites in the meantime will be inconvenienced.

Perhaps a compromise is to build the workaround, but put in an interstitial scare screen. That might generate the desirable social pressure on the website owner without making it impossible to visit.


> On the other hand, people who want to go to those websites in the meantime will be inconvenienced.

…and they may well move to another browser which doesn't support TLS/1.2, or stay on an out-of-date (insecure!) version of the browser, which doesn't help web security either.



Given recent revelations, one has to wonder if the working group merely failed by itself or was given a substantial nudge in that direction by someone who wanted TLS to be insecure.


Apart from specific contributions by the NSA (if any?), is it hard to believe people screwed up? Look at HTML standards and what a mess that is. Hell, look at HTTP and the insane stuff in that spec.

I'm guessing spec writing is much harder than one might assume, especially if you're not writing elegant code to implement the protocol at the same time. Committees only make things worse, and the IETF RFC format doesn't help either. (Reading the SCSI specs, by comparison, with their nice graphical diagrams, is much cleaner - not that a simple RFC can't convey the same info, but nice diagrams do really help.)

Extensibility is something people get wrong all the time - the TLS 1.2 issue seems to be that enabling TLS 1.2 ends up breaking lots of users.

Without someone familiar with the subject pointing out exact contributions (I'd assume repeated issues raised by the same group of people), normal incompetence is more than enough to explain things, isn't it?


Spec writing is only half the challenge (and arguably a relatively easy one!).

A big problem with HTML and HTTP is the fact that error handling has always been undefined (the current HTML spec is the first real attempt to define error handling!) — so different people implement different things, and then when you end up with a single implementation with ~90% marketshare, everyone has to reverse-engineer that (and probably not perfectly!). Combine this with the fact that the majority of implementers of HTML and (to an only slightly lesser extent) HTTP are web developers who just want to get stuff working; they've never read a page of the spec in their life. So, guess what, they end up relying upon clients' error-handling (typically the intersection of the clients they care about, which when IE had 90%+ marketshare was often just IE; then see above about reverse-engineering).

The other big problem is the lack of generic test suites for the standards — far too often each implementer ends up writing their own tests, and then not sharing them (the IETF still has no real general infrastructure for hosting testsuites for RFCs!). At least around the W3C, there's been a relatively large movement in the past three years or thereabouts towards developing shared testsuites, in large part down to various Microsoft and Opera people (myself included, in the early days, as a disclaimer) and several WG chairs (trying to push specs to REC — which nowadays requires two interoperable implementations, and hence practically a testsuite), so things are at least slowly changing there. But there's still a lot of work to be done — and it's one area where improvements can have large effects, as it increases the consistency of all implementations, and makes it easier for smaller, and newer, competitors to enter the market.


Following the thread, the TLS 1.2 spec was completed in 2008, but it wasn't supported in OpenSSL until mid-2012 - so anything that depends on OpenSSL had to wait until at least then, then go through implementation and reshipping, then trickle on down to the end vendors. And with no-one using TLS 1.2 or having a need for it because it wasn't available, it was back-burnered by the browsers.

The follow-up comments paint a much fuller picture of why things are delayed, where the failures are, and what's going on.


If you read the follow-ups, it's not just that no-one used TLS 1.2 because it wasn't available, enabling support for it actually broke stuff for end users[1] - and the browser developers knew this was almost certain to be the case even without looking, because it almost always is.

Also, as the linked email points out, we shouldn't have needed TLS 1.2 in the first place in order to be secure. It was already known at the time TLS 1.0 was designed that they were doing things in ways likely to be insecure for no good reason, but they did it anyway.

[1] https://www.ietf.org/mail-archive/web/tls/current/msg10614.h...


Buggy middleboxes broke. They're increasingly preventing deployment of new protocols and apps on the internet.


Maybe making things more intelligible would help instead of using language that is extremely obfuscated and confusing, and unaccompanied by any actual mathematics?

Take this sentence from the email for instance:

"Even AES-GCM got screwed up: nonces should be counters, but all implementations make them random, introducing an artificial birthday bound issue due to truncation in the standard."

I have no idea WTF this means, but let's go over it:

nonce: I know this is a randomly generated number that can only be used once -- now why should it be a counter? No idea.

"but all implementations make them random": wait, aren't they supposed to be random by definition? According to the above line though, they are supposed to be counters. Damn, what I knew must be wrong. I wonder if this person on the internet has submitted some sort of explanation about this somewhere.

'artificial birthday bound issue': Assuming this refers to the birthday attack (http://en.wikipedia.org/wiki/Birthday_attack). Why is it "artificial"? Can we see some mathematical proofs attached please? I sort of get the idea here -- because the nonce is random, it is vulnerable to being recreated after a certain number of attempts, but there is nothing concrete attached here. Or I could be totally wrong in this interpretation. God knows, and maybe this chap.

"...due to truncation in the standard." -- Do you mean some sort of mathematical truncation, i.e. "my number was truncated to 16 bits", or truncation of the standard itself "the last section of the standard was removed"? Please be clear.

Same goes for most things related to crypto -- if you want stuff like TLS to be examined by more eyeballs and find more bugs, you have to first try and make it more accessible. The sentences above are, in my opinion, a complete communication failure.


If you choose a random nonce then it follows that a nonce could be randomly reused. If there are N possible nonces this will happen on average after approximately sqrt(N) packets. If you use a counter as a nonce it will only repeat after N packets, no matter what.

For some algorithms a simple incremented value is all that's needed "1, 2, 3, ..." but this means an attacker seeing only two packets can at least estimate how quickly packets are being sent. However, if you encrypt this stream of incrementing numbers with a constant symmetric key you get the best of both worlds: a nonce stream that looks random but is guaranteed not to repeat until after each possible value has been used. Usually when crypto people talk about a "counter" this is the technique they're referring to.
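
The difference is easy to demonstrate in a toy simulation (a sketch with deliberately tiny, illustrative parameters; real AES-GCM nonces are 96 bits, but the scaling argument is the same):

```python
import random

NONCE_SPACE = 2**20  # toy nonce space; real GCM nonces are 96-bit
PACKETS = 8192       # several times sqrt(NONCE_SPACE) draws

random.seed(1)  # fixed seed so the demo is deterministic
random_nonces = [random.randrange(NONCE_SPACE) for _ in range(PACKETS)]
counter_nonces = list(range(PACKETS))  # a counter, trivially unique

def has_repeat(nonces):
    return len(set(nonces)) < len(nonces)

print(has_repeat(random_nonces))   # True: a collision long before the space is used up
print(has_repeat(counter_nonces))  # False: a counter never repeats
```

With these parameters the chance of the random draw *not* colliding is about exp(-32), i.e. effectively zero, which is exactly the birthday bound the email complains about.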


Thanks! Perfect explanation of the 'counter' term, makes sense. :) Upvoted.


One weakness of linking directly to posts on specialist mailing lists is they sometimes use specialist terminology. You'll note he doesn't define 'TLS' or 'CRIME' or 'AES' either :)


i know little but i strongly suspect the artificialness of the birthday bound is specific to this situation. the bound shouldn't be there, but by using random numbers it is introduced - it's 'artificial' because an idealised implementation would not suffer that problem.

you are right though, excessive jargon is a massive blocker for anything, not just because it's unintelligible but also because of the 'elitism' social signal it sends...

many unspectacular people can find holes in a cryptographic system - in many cases common sense or a little ingenuity is enough - but they will generally not know the term for their specific flavour of attack (attack is a term itself) or the surrounding terminology to describe it in the context of cryptography.

in short, it's not complicated, it's obfuscated...


I have but a passing interest in cryptography, but without looking anything up:

AES-GCM: That's AES (a block cypher) in Galois C? Mode (I think the C is counter, but in any event, I do recall that there are lots of ways to use block cyphers, and GCM is one of them; if the C is for counter, then I'm guessing it is a counter mode and Galois refers to how it achieves authentication, since in general counter modes of block cyphers are non-authenticated)

nonce: Any value that should only be used once. Generally speaking if you use the same key and nonce twice, the security of your cryptosystem is in some way compromised. Using it as a counter would ensure that it is used only once, so long as a different key is used for each session. Some cryptographic primitives take relatively small nonces, which makes using a random nonce a Bad Idea due to

Birthday bound: If you take a large number of samples from a uniform random distribution, it takes a surprisingly small number of samples before you get the same value twice.
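
That "surprisingly small" number can be made concrete with the standard birthday approximation (a sketch; the 32-bit figure is purely illustrative, not a claim about any particular protocol):

```python
import math

def collision_prob(k, n):
    # P(at least one repeat among k uniform draws from n possible values)
    # ≈ 1 - exp(-k*(k-1) / (2*n))   (the standard birthday approximation)
    return 1 - math.exp(-k * (k - 1) / (2 * n))

# With 2**32 possible nonces, a repeat is already likely after
# only ~2**16 random draws:
print(round(collision_prob(2**16, 2**32), 2))  # 0.39
```

So a random nonce buys you only about the square root of the nonce space, whereas a counter is safe for the whole space.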

"...due to truncation in the standard" Since I don't know anything about the specifics of TLS, I'm just going to make some shitty wild-ass guesses here: I'm guessing that the truncation it refers to is truncation of the nonce (which would give you fewer nonces to work with) or in the cryptostream itself (which would require you to use more nonces) either of which hurts you when using random nonces. I'm going to go out on a limb and say that it's not referring to truncation of the standard itself, since that's stupid and there are lots of ways that the standard could require some form of truncation.

So that's me, who knows less about crypto than anybody on the mailing list; I can make some sense out of that jargon. Anybody who is actually professionally involved in cryptography likely has no problem understanding that.

Using jargon when talking to other people in your field is a necessity for not going crazy. My first internship was with a telecom company. I was given a specification that included a half-dozen acronyms I had never heard of before, along with a few terms that clearly had a specific meaning in the field. (I knew that ATM wasn't referring to bank machines, for example). But really, if someone had to explain what ATM was each time they used it, nothing would get done.



