You are dangerously bad at cryptography (happybearsoftware.com)
268 points by alinajaf on May 27, 2013 | 167 comments



His insight that crypto is hard because you don't get feedback when you mess up is good. It made me wonder what other domains are like that -- domains where correctness seems within reach, yet there are subtle aspects that are hard to state. Some that come to mind:

* Concurrency. It's easy to introduce race conditions, livelocks, or deadlocks without even knowing (a small sketch at the end of this comment shows how innocuous one can look). They are often difficult to observe or reproduce, and it takes a lot of hard-won experience to become sufficiently critical of your own incompetence. On the other hand, the tools are improving here (Helgrind, Relacy, ThreadSanitizer).

* Exception-safety. I don't mean in the weak sense of "won't leak resources" but in the stronger sense of leaving the system in a consistent state after an exception. Again, it's difficult to analyze or induce all the different code paths that might occur, and difficult or expensive (depending on your starting point) to roll back your mutations. Again, there are tools that help with this, like STM or cheap persistent data structures (so you can do a bunch of mutation that is only committed atomically at the end).

Are there others? Can the progress in these domains shed light on how to make crypto safer?
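
To make the concurrency point concrete, here is a minimal check-then-act race in Python (the sleep just widens the window a real scheduler provides for free):

    import threading
    import time

    balance = 100

    def withdraw(amount):
        global balance
        if balance >= amount:      # check...
            time.sleep(0.01)       # (scheduler runs someone else here)
            balance -= amount      # ...then act: the check may be stale

    threads = [threading.Thread(target=withdraw, args=(80,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(balance)  # -60 (almost always): both threads passed the check,
                    # and nothing anywhere reported an error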


Outside of computing: health and diet. Sure, you get some feedback, but you also can screw up things in some ridiculously slow-acting way. Pharma companies (are supposed to) monitor drug use for decades after introduction for the same reason.

Construction. Just build that chunk out of two steel beams instead of one, and witness it work perfectly until someone decides to throw a party on that balcony. Or witness your construction slowly turn itself from a perfectly normal tower to a major tourist attraction over the centuries.

Parenting. Things like sudden stimuli in early life have been linked to ADD, the way you do attachment can have tremendous consequences, and sometimes singular events you barely have any control over can have unpredictable consequences later in your life.

In fact, I'm tempted to say "life in general".


Economics is also an excellent example. Nobody knows how to do it properly, even establishing direct cause and effect is basically impossible so we just iterate through progressively less broken solutions.


"progressively less broken solutions"

Now that's a controversial statement if I've ever seen one.


> so we just iterate through progressively less broken solutions.

And then get all ideological about it and iterate back to more broken solutions (Chicago school)


And then get all ideological about it and iterate back to more broken solutions (Chicago school)

Or, even worse, Keynesianism.


I wouldn't necessarily call the Chicago school broken in its entirety.

I guess the whole idea that there are differing "schools" is probably an indicator in itself though.


> health and diet. Sure, you get some feedback, but you also can screw up things in some ridiculously slow-acting way.

This makes me think of the primal diet and coconut oil (saturated fat). The current arguments are that good saturated fats don't cause heart disease but (roughly) sugar does.


Along the lines of your examples, I would add distributed systems.

Still, cryptography is much, much worse than all these examples for a simple reason: with a sufficient amount of testing for functionality you can convince yourself that you've gotten things like concurrency correct, or at least correct enough that it won't be a problem most of the time. You can't do that with security - instead of testing for functionality you have to test for attack resilience, but whereas you know what functionality to test for, you can't test for attacks you don't know about. Furthermore, you can't be correct merely most of the time - you have to be correct all of the time, because if you slip up just once, your adversary will exploit you.

I really don't think there's any parallel.


Rumsfeld sums up the problem rather succinctly: http://www.brainyquote.com/quotes/quotes/d/donaldrums148142....

I think Netflix and its Chaos Monkey software is a very effective way of dealing with the 'unknown unknowns' of distributed systems.

Although obviously this approach doesn't work with cryptography systems. They call it Chaos Monkey (as opposed to Genius Monkey) for a reason.


The biggest candidate:

* Pure mathematics: Consider e.g. the difficulty in verifying the recent proofs of Fermat's Last Theorem (Wiles-Taylor-Frey theorem?), the Poincaré conjecture (Perelman-Hamilton-Thurston theorem?), and now the ABC conjecture (Mochizuki-Szpiro theorem?): there is essentially no indication of the correctness of a mathematical proof besides simply having a whole lot of smart people look at it and think very hard. This may be of particular interest because pure math has lately experienced a turn towards the use of proof-verifying systems, and so it may be of some interest if one of these can be designed for cryptography.

In particular, consider the following method of verification: a cryptographically secure function h(M, k), known or at least believed to be hard, a message M, a secret key k, and a channel C(h) over which data is transmitted. We wish to prove the following combination of statements: (a) given C and h, a method to obtain the message M also obtains the key k; and (b) given M, C, and h, it is "impossible" to obtain the key k -- i.e. it would require breaking the hash function; this means that if the cryptography is broken, so must be the hash function. The latter portion of the proof is necessary because we must always consider the possibility of a rubber hose or a stupid user.

EDIT: I should add here that side channel attacks depend on an incorrect understanding of the content of C. So perhaps we should include (c): we send only what we wish to in C.

In the interest of pedantry, I dreamed up an example "cryptosystem" which may fit the bill, though I have no honest idea of the difficulty of the h function [and I know almost nothing about cryptography!]: consider a large prime number p, the finite field F_p, and its algebraic closure Fbar_p. A message M is a [presumably long] polynomial M(x) over Fbar_p, and a key consists of the pair k = [p, z], where z is some arbitrary element of Fbar_p. h(M, k) is obtained by expanding (x - z) * M(x) in Fbar_p, and we send over C the coefficients of the resulting polynomial. Verification is obtained by computing C(z) in Fbar_p and the message is extracted by polynomial division. Then (a) rests on the difficulty of factoring a polynomial and (b) rests on the secrecy of p: so we cannot, for example, simply compute C/M.

Note that Fbar_p is countable, so the whole procedure uses only integer math.
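
A minimal sketch of the mechanics in Python, worked over F_p itself rather than the algebraic closure, just to make the encrypt/verify/extract steps concrete (toy parameters; as the replies below show, this is not secure):

    # Toy sketch only: key k = [p, z]; ciphertext C(x) = (x - z) * M(x).
    p, z = 101, 42                       # the secret key
    M = [7, 0, 3, 1]                     # M(x) = 7 + 3x^2 + x^3, coeffs mod p

    def polymul(a, b):
        out = [0] * (len(a) + len(b) - 1)
        for i, ai in enumerate(a):
            for j, bj in enumerate(b):
                out[i + j] = (out[i + j] + ai * bj) % p
        return out

    C = polymul([-z % p, 1], M)          # coefficients sent over the channel

    # Verification: C(z) == 0 in F_p, since (x - z) divides C.
    assert sum(c * pow(z, i, p) for i, c in enumerate(C)) % p == 0

    # Extraction: synthetic division of C by (x - z) recovers M.
    n = len(C) - 1
    Q = [0] * n
    Q[n - 1] = C[n]
    for i in range(n - 1, 0, -1):
        Q[i - 1] = (C[i] + z * Q[i]) % p
    assert Q == M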


If I understood your proposed system correctly, the following seems to be a deal-breaking weakness: Given a number of polynomials representing encrypted messages, (x-z) is a common factor of all of them. The GCD of a set of polynomials can very efficiently be computed. Thus, an attacker observing different messages encrypted with the same key over time gets a better and better idea of what the key is (how quickly depends on the messages: two plaintext messages that happen to give coprime polynomials M_1(x) and M_2(x) would be enough to get the key).


In this case, simply computing the GCD may require "non-obvious" intuition, since the polynomials are not over the real numbers, but over the algebraic closure of an unknown finite field. But I wouldn't bet my data on it, at least not until I am a lot more confident in my understanding of abstract algebra.


Ah yes, I missed that the prime is part of the key.


Two notes: factoring polynomials over finite fields isn't hard (look up the Berlekamp algorithm or Cantor-Zassenhaus for starters). Second, I believe the system you've described doesn't satisfy your property (b). Quantities such as M[1]C[0] - C[1]M[0] + M[0]^2 and C[1]M[2] - C[2]M[1] + M[1]^2 - M[0]M[2] are congruent to 0 mod p; compute them over the integers and then take the gcd. By computing enough of these you should recover the prime p. Once you have p, recovering z is very easy (e.g., take the gcd of C and C'-M, where C' is the derivative of C).
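
A rough sketch of that known-plaintext recovery with made-up numbers (Python 3.8+ for the modular inverse via pow):

    from math import gcd

    p, z = 2**31 - 1, 1234567                # the "secret" key
    M = [314159, 271828, 161803, 141421]     # known plaintext coefficients

    # Encrypt: C(x) = (x - z) * M(x), coefficients reduced mod p.
    C = [0] * (len(M) + 1)
    for i, m in enumerate(M):
        C[i] = (C[i] - z * m) % p
        C[i + 1] = (C[i + 1] + m) % p

    # Each combination below is 0 mod p (the z terms cancel), so computing
    # them over the integers and taking gcds exposes p.
    quantities = [
        M[1] * C[0] - C[1] * M[0] + M[0] ** 2,
        C[1] * M[2] - C[2] * M[1] + M[1] ** 2 - M[0] * M[2],
        C[2] * M[3] - C[3] * M[2] + M[2] ** 2 - M[1] * M[3],
    ]
    p_found = 0
    for q in quantities:
        p_found = gcd(p_found, q)
    for f in range(2, 10_000):               # strip any small stray cofactor;
        while p_found % f == 0:              # p itself is a large prime
            p_found //= f
    print(p_found == p)                      # True

    # With p known, C[0] = -z * M[0] mod p gives z with one modular inverse.
    z_found = (-C[0] * pow(M[0], -1, p_found)) % p_found
    print(z_found == z)                      # True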


I feel a major problem with mathematical proofs is that the language of these proofs is so complex that in order to understand what it even says you have to be extremely smart, and willing to put in a huge amount of time in the first place.

A proof is in effect a "program" that describes the logical conditions of whatever it is trying to show. Invalid proofs are those with "bugs". Perhaps if the language of math were a bit clearer, it would not be such a difficult field to understand.


- Relying on undefined or implementation-defined behavior. Problems only show up years down the road. (example issue: expecting signed overflow to wrap around in C)

- Avoiding statistical biases when transforming random values. Statistical unit tests are hard. (example issue: shuffling via lots of uniformly random swaps)

- Integer overflow ruining algorithms that would be correct, given unbounded integers. (example issue: average = (a+b)/2)
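
That last one bites in any fixed-width language; here's a sketch with the 32-bit wrap simulated by hand (Python ints don't overflow, so the masking stands in for the hardware):

    # (a + b) // 2 is wrong once a + b exceeds the integer width, even though
    # the true average fits comfortably. Simulate 32-bit two's complement:
    def int32(x):
        x &= 0xFFFFFFFF
        return x - 0x100000000 if x >= 0x80000000 else x

    a, b = 2_000_000_000, 2_100_000_000     # both valid int32 values

    buggy = int32(a + b) // 2               # the sum wraps before the divide
    safe = a + (b - a) // 2                 # the difference can't overflow

    print(buggy)   # -97483648  (garbage)
    print(safe)    # 2050000000 (the real midpoint)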


Regarding statistical biases, you can always pipe expected distributions through the test and compare to the expected transformed distribution using chi-squared.
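
A sketch of that, assuming scipy: shuffle a 3-element list with the classic biased swap-everything algorithm and let the chi-squared test catch it (random.shuffle, a proper Fisher-Yates, passes the same test):

    import random
    from collections import Counter
    from scipy.stats import chisquare

    def biased_shuffle(xs):
        # n^n equally likely swap sequences can't map evenly onto the n!
        # permutations, so some orderings come up more often than others.
        for i in range(len(xs)):
            j = random.randrange(len(xs))
            xs[i], xs[j] = xs[j], xs[i]
        return xs

    N = 60_000
    counts = Counter(tuple(biased_shuffle([0, 1, 2])) for _ in range(N))
    stat, p = chisquare(list(counts.values()))  # expected: uniform over 6 perms
    print(p)  # ~0: the bias is blatant at this sample size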


Memory safety in memory-unsafe languages. We've had Valgrind and ASan for a while, and people still find crippling bugs in C and C++ code all the time.

XSS vulnerabilities. Maybe Content Security Policy will help some here when it becomes ubiquitously available.

Integer overflow. This is a particularly insidious problem because your well-formed test cases often won't catch it.


Here's a gem from an attempt to fix an integer overflow vulnerability in the PHP compiler (hint: by the time size could exceed INT_MAX, the overflow has already happened):

if (size > INT_MAX) return NULL;

http://use.perl.org/use.perl.org/_Aristotle/journal/33448.ht...


"Thinking Fast and Slow" tackles this issue.

Doctors who assess X-ray photos for cancer usually get feedback ~6 months later. What's worse is that they become more confident in their abilities as they age, and subsequently diverge from the standard checklists inexperienced doctors follow. This leads to even worse results from doctors with 10-15 years of experience.

The opposite case is surgeons who gain immediate feedback and generally become better with experience.


Note: we're still running these challenges:

http://www.matasano.com/articles/crypto-challenges/

The current standings are:

* level 0 (4362 players),

* level 1 (335 players),

* level 2 (123 players),

* level 3 (40 players),

* level 4 (21 players),

* level 5 (23 players),

* level 6 (32 players)

We're still donating $20 to PIH or Watsi for everyone who finishes all 6 sets.

The top languages finishers are using are (in order) Python, Go, Ruby, C/C++, C#, Clojure, and Haskell. People are also using Java, PHP, OCaml, Scala, Racket, JavaScript, Common Lisp, Perl, Julia(!), Erlang, Rust, Visual Basic, Groovy, Objective-C, F#, Factor, Dart, and Microsoft Excel(!!!).


Just curious how far the Excel hacker(s) got... Also, have you made any hires from the contest yet?


All I'm going to say here is that you would not freaking believe how far the Excel guy has gotten. No VBA, either; it's pure spreadsheet cells.

Yes, we've hired from the challenges (I don't like thinking of it as a contest; I think of it more as an extremely engaging blog post).


The Excel thing started out as someone else's funny tweet and turned into me riding the bomb down while waving my cowboy hat, Slim Pickens style.

This tweet started the ball rolling:

https://twitter.com/sachinag/status/329701402546941953

The spreadsheets use no VBA or AppleScript or external code -- just Excel formulas and the functions Excel 2008 offers. I submitted the first four spreadsheets along with my "real" (Python) code/answers for set #2, just for the shock/horror value.

Later, despite what I said about giving up on Excel, I implemented the Mersenne Twister PRNG and single-block AES encryption (all three key sizes!):

https://twitter.com/areuugee/status/333344208154923008 https://twitter.com/areuugee/status/334142574296199168

One of these days, I'm going to start a blog just so I can write a series of posts about Excel. There are definitely limitations in what the design allows you to do, but it's more powerful than anyone gives it credit for. It'll also be cathartic to share how I made the AES and Mersenne Twister spreadsheets. (Spoiler: The C preprocessor was involved.)

That said, I don't think the Matasano people have to worry about me solving more problems in Excel. Programs that I could bang out in five minutes in Python turned into all-day, profanity-laden affairs in Excel, which makes Excel a distraction from the crypto stuff.

Also, if anyone decides to give Matasano's crypto challenge a try, it's a great opportunity to learn a new programming language or a great excuse to get better at a programming language that you don't know so well. My Python knowledge was pretty minimal at best (the only languages I've ever really cared about are C, Fortran, Perl, and [spit!] PHP), but now I'm much more comfortable with it. In fact, if I ever finish, I might go back and redo everything in Ruby just to learn Ruby...


Honestly, this is awesome. As someone who once wrote MT-19937 in Windows batch files (and started a bignum lib in those too), I love seeing tools applied to problems that seem totally wrong for them.


Is that the number of people working on that level, or the number of people who've finished that level? My guess was the former, but if so, how many people have finished level 6? And if it's the latter, how many people have started level 0 but not finished?

I'm also curious what the stats look like for how long people are taking to complete each level.


That's the number of people who received problems on that level. I would guess most of them are not working on these problems (especially Level 0).

For example, I got the Level 0 set of problems, enjoyed the reading, but decided not to work on them, because I'm just not enthusiastic enough about diving deep into cryptography.

I guess most of the 4362 "players" on Level 0 are in a similar position.


there are 6 levels. so presumably level 0 is people who started (ie requested) but did not complete level 1.

a level is a day or two of work, in my experience (unless you get stuck and need to sleep on something, or get entertained and go off exploring something), but i doubt most people are doing them full-time, so timing data is going to reflect mostly how busy they were with other things.


Yeah, I've been at level 1 for quite a while, but that's because I haven't had much time available to spend working on it. When I get some time I try to work through one of the problems. Of course, I'm also doing it in Rust, so add in the extra time of learning a new language (especially one still in flux).


They're not going anywhere. Take your time!


Hmm, nobody has tried using Brainfuck yet...


When you undo the C and C++ merger, how does the list look? They are very different languages, particularly in that they attract very different people, so it is interesting to know which is actually being used rather than "C/C++".


If I had to guess, Java and/or PHP would be in the top 10 if I disaggregated. There are more C++ players than C players.

These are based on survey results, and the selection I presented was "C/C++", so I'm stuck guessing.


So, sure. I'll admit that I'm bad at cryptography. You win!

But the problem is, the world needs more, not less, crypto. We need to integrate crypto into more places, not run and hide and declare it too hard every time we come across it.

Is it easy to screw up? Sure. So is manually allocating memory. But we use higher level languages to help protect us from ourselves.

So too can we use higher-level libraries to help protect us from some mistakes we can make with crypto.

I'm working on an app which uses crypto heavily. In initial versions, I used manual RSA keys, but I recognize that no matter how smart I think I am (not very), it's easy to mess stuff up.

I can screw up the padding, I can screw up the implementation. Other clients which interop with me can screw up...

So instead, I've swapped to using GPG keys. Is it perfect? Heck no! But it reduces the code that I have to get right. It outsources the parts that are most likely to be screwed up to a place where they're more likely to be reviewed, more likely to be written by non-fuckups.

There are lots of higher-level libs; GPG is just one of them. Keyczar, etc. are also ways to solve this problem.

But we shouldn't run in fear every time someone mentions the word crypto, just because we have a chance of messing up.


What we really need is a cryptography language, to do for cryptography what SQL does for databases. I should be able to say, "Messages have this form, and they need to be encrypted end-to-end with authentication and without surreptitious forwarding," and have the right components assembled for me. Unfortunately, our understanding of cryptography is not well-developed enough to create such a system, and so we still fumble around and usually produce systems that are vulnerable (did you remember to have a good source of entropy? did you make sure your messages are all the same length? does your system compose well with others?).


http://nacl.cr.yp.to/

NaCl is a set of well-implemented cryptographic functions designed to be fast and easy to use. As opposed to something like OpenSSL that gives you nine million options, NaCl just does what is best.


NaCl and Keyczar are both good options. We also tend to recommend that people simply use PGP for data at rest, and TLS for data in motion. Neither are perfect, but both are subjected to intense scrutiny by researchers.


> We also tend to recommend that people simply use PGP for data at rest, and TLS for data in motion

When I complain about bad crypto (in API auth in particular) and my clients really really push for me to give them advice, I repeat this line verbatim.

They hate it because TLS with client-side certs for authentication (where you become the CA) is unfamiliar and has too many moving parts for them. They go and develop their hand-rolled API auth, I proceed to shoot holes in it, and I come across as a bit of a dick (to be fair, I'm not hired as a security consultant, just a regular developer).

I can normally get a few developers on board, but have yet to convince a client to use TLS in this way in production.


I do not think the problem is with how many options we are presented with, but rather with the difficulty of figuring out what primitives are actually needed to solve a particular problem, and how to compose those primitives. Do you need digital signatures? Do you need a hash function? Do you need to establish a common random string before the system can be used? It is very easy to assemble a system that appears secure (it's encrypted and signed!) but that does not actually provide any meaningful security (oh no, we actually needed non-malleable commitments!).

A high-level language could help quite a bit, because it would help programmers abstractly specify the needs of the system rather than getting lost in the details of which operations to choose. Maybe you really only need to sign and encrypt your messages. Maybe you need to sign, encrypt, then sign again. Maybe you do not need signatures at all, but you need to use a non-malleable cipher and a few rounds of communication (e.g. to make a deniable authentication protocol). The next generation of security problems will not be solved by slapping on encryption and digital signatures; we are going to need to pay increasing attention to higher-level issues.


Even if we had such a language it would probably not be high-level enough, because understanding what security you need is still a hard problem. Instead I'd suggest a high-level library that gives you a complete security package with defaults that can't be easily changed.

Something like what the CAESAR competition hopes to achieve: http://competitions.cr.yp.to/caesar.html


Cryptol is a DSL that simplifies the specification of a cryptographic algorithm http://corp.galois.com/cryptol/


The world needs more broken crypto like a dissident strapped to a chair in a concrete cell in South America needs another car battery alligator clipped to their fingers.

How about, if the world really needs more cryptography, the people who bring it to us take the time to become just a little bit literate in how crypto is actually attacked, instead of pretending like they understand it just because they were able to produce intelligible outputs from OpenSSL's AES?

You're using raw RSA. How much practicing have you done of attacks on RSA? How literate are you in RSA? If the answer is "not at all", why are you allowing yourself to use RSA? How is that not negligent?

(I don't know you, or didn't read your name closely enough to detect that I do, so maybe you have done this legwork; in which case, share with us why you think everyone else shouldn't do the same work?)


There's something that strikes me as a bit off in what you write. I'm having trouble pinning it down, so here are some vague thoughts:

* The world does need more crypto. There's market demand for keeping stuff safe/hidden/whatever.

* It is hard to get crypto right. People like the author, and if I'm not mistaken, yourself, keep pounding that point home. Ok, we're convinced... but people still need to do this stuff, and not all of us have the money to hire you.

* "just a little bit literate ", given the above, seems kind of dangerous, no? It seems that way to me. Why bother learning just enough to get yourself into trouble?

* Given the above demand, people are going to try this stuff one way or the other. It seems the best thing is to give them the safest building blocks. The cited example in the article seems indicative: the companies were all trying to do more or less the same thing, which was not something complicated. Why wasn't it easier for them to do the right thing?

* I think what the world needs is easier, clearer, proven open source solutions/recipes to common problems.

Also, another thought that is not related to you or what you wrote, or write: is it just me or do a lot of security discussions turn into dick waving contests? Why is that?


The world might need more working crypto. The world doesn't need more broken crypto.

Broken crypto isn't just a step on the path to working crypto; it's an opportunity for people to get hurt.

The bet I'm making right now is that if people get a little bit of crypto literacy, they'll stop being so excited about deploying crypto in their applications. Implementing a bunch of crypto attacks has the effect of making you paranoid about cryptography. If generalist developers have one key problem with cryptography, it's that they're not paranoid about it; in fact, the opposite: when they write crypto features, the crypto makes them feel safer. That's not how the crypto professionals I know feel about cryptography!

I strongly agree: things like NaCl and Keyczar are a great solution to this problem. Take the knobs away from the developers and just give them something that is likely to work, designed conservatively. Unfortunately, NaCl and Keyczar have nothing resembling the popularity of "I found this RSA implementation in Ruby and now I'm going to build an application with it". How do we fix that? I think part of the solution has to be to convince developers they should be more afraid of DIY crypto.

As for security: you should understand that when we write about it, we're writing about a competition. Attackers vs. defenders. Writing about competitions (or, in some of our cases, actively participating in those competitions) does something to the tone of your writing.

The software security field can be annoyingly competitive and status-oriented, too.


After having this debate with you two or three times now, I'm starting to realize that we both want the same thing: Good libraries like Keyczar that just do the right thing by default.

I would argue that there is a second side to the solution: Authors of more low-level crypto libraries (like OpenSSL) should very prominently warn users that said libraries are easy to misuse, and they should point users in the direction of the high-level libraries.

In my travels around the web, I've not often encountered such a warning. For example, as of today, the top Google hit for "ruby encrypt string" is a StackOverflow post. Its highest-voted answer advocates an OpenSSL wrapper.


Absolutely. Strong agree!


How would this 'warning' look?

I think an 'experts only' label is almost an attractant for many of the folks who shouldn't bother. Likewise, there are some good users of OpenSSL, and rumors of it being "bad" or "insecure" would be damaging.

I'm not saying it's a bad idea exactly, just if you discover the way to word the warning to prevent people who don't understand that they're newbies from doing newbie stuff with it, you'll be on to something. I say you put that label on C compilers too.


Good point. I'd word it something like this:

A Word of Warning

This library exposes a very complex API. It is intended for expert users only. If you have any doubts about your knowledge of the underlying cryptographic primitives, we strongly recommend against using this library. Doing so without advanced knowledge of cryptography could compromise your security. Instead, we recommend you use a high-level crypto library, such as Keyczar or NaCl, both of which are designed to "just work" in the hands of developers who lack specialized expertise in crypto.


Honestly, one of the big problems is that people confuse crypto primitives with crypto schemes. Developers need schemes, not primitives. Some people, who know what they're doing, need the primitives, but they are by far and away the exception.

AES is a crypto primitive, AES-CTR-CBCMAC (aka, AES-CCM, but spelled out to emphasize the complexity of it) is a scheme. And even then you have key distribution problems, which is essentially clipping on another two or three car batteries.
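
For example, in Python (assuming the pyca/cryptography package), the scheme comes pre-assembled, and the remaining sharp edge is nonce handling:

    # Using a scheme rather than primitives: an AEAD mode like AES-CCM does
    # the encrypt + authenticate composition for you. Key distribution is
    # still your problem, as noted above.
    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESCCM

    key = AESCCM.generate_key(bit_length=128)
    aesccm = AESCCM(key)
    nonce = os.urandom(13)               # never reuse a nonce under one key

    ct = aesccm.encrypt(nonce, b"attack at dawn", b"associated header")
    pt = aesccm.decrypt(nonce, ct, b"associated header")  # raises InvalidTag
                                                          # on any tampering
    print(pt)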


> AES is a crypto primitive, AES-CTR-CBCMAC (aka, AES-CCM, but spelled out to emphasize the complexity of it) is a scheme.

In this particular example, the 'schemes' you point out didn't fare that well in pretty much every single codebase I've seen them implemented in after I finished set two of the crypto challenges.


Besides the usual IV hygiene, what's wrong with CCM? Or are you singling out the separate primitives AES-CTR and CBC-MAC?


I'm not using RSA. As I mentioned, I migrated to GPG specifically because I can't promise I know what I'm doing. I encourage others to do similarly (calling out Keyczar), rather than using primitives.

Re-reading, I can see how I could have been more clear and to the point, I did meander a bit in my original post.


If you have a crypto-heavy app, I recommend NaCl: it's fast and designed to be easy to use correctly.

http://nacl.cace-project.eu


What the world needs is more managers that hire engineers instead of lawyers when someone points out to them that their crypto is broken.


I would break the levels of crypto knowhow down thusly:

1) End user level: You want to buy things online, you should know to check for the padlock icon.

2) Deployment level: You run an online shop using an off the shelf eCommerce system. You should buy an SSL cert and know how to install it on your web server of choice. You should also know the difference between a public and private key (and hence why you should never give the private key out) and roughly what a certificate and digital signature are for.

3) Integration level: You are building a custom application/API which uses crypto for security/privacy. You should know the difference between symmetric/asymmetric encryption, what things like HMACs are, and what the purpose of a password hash is.

You should know how to choose a good library or ensure that your framework is integrating a good library. You should also be mindful of which information you should never leak over cleartext channels and also things like MITM attacks.

4) Implementation level: You are using existing crypto primitives to develop a new use for encryption, maybe a cryptocurrency or a new type of auth protocol, for example.

You should know a lot of deep computer science related to crypto, like the difference between AES CBC and ECB modes, various issues related to padding, how to generate secure random numbers and countless other issues. Here be dragons, you probably need significant peer review before your work is "production ready".

5) Design level: You are trying to improve an existing crypto algorithm or develop an entirely new one. Here you are going to need a very deep understanding of mathematics and cryptography research, probably to PhD level. So all kinds of stuff about number theory and being able to formally prove everything.

I would guess that most developers should stay at around level 2 as much as possible, delving into level 3 only when required and never further.


Funny story - quite a few years back I was a sysadmin for a company that was compiling their own PHP with a couple of patches. When I asked about it, my co-worker said it was to suppress some warnings that were spamming the logs whenever someone created an account.

Yes, those warnings were telling us that we weren't using an initialization vector (IV, aka a salt) when hashing the passwords. Facepalm. I suppose that demonstrates the dangers of developers thinking they should be diving into level 3), and also the benefit of (my) being aware of more of the theory than we should have been implementing ourselves.

Yes, I filed a bug to start using salted hashes, re-hash all our users' passwords, and get rid of the patch. This was also well before those massive GPU cracking systems, so everything turned out fine.


Once you are choosing salts, that is almost level 4 knowledge. This is one of the problems with "raw" PHP development: there's no way to say "this is a password field, make it secure", only "hash this with algorithm X".

To be fair, though, I believe recent releases have addressed this, and frameworks like Symfony give you a default user class to inherit from that does things sensibly. It also sticks a seed for the salt directly in the config file (where devs will see it) and says "make sure you set this randomly and keep it secret".


Hmm, unfortunately I think Symfony2 runs passwords through 1 round of SHA1 unless a custom password "encoder" is used [1].

[1] = http://symfony.com/doc/2.0/book/security.html#encoding-the-u...


Bcrypt and PBKDF2 are not custom, they are part of the framework:

http://symfony.com/doc/current/reference/configuration/secur...


Those were added in Symfony2.2; my last experience was with 2.1. Also, I understand that neither of those is default; one should be.


> This is one of the problems with "raw" PHP development

... or any other language.


> I would guess that most developers should stay at around level 2 as much as possible, delving into level 3 only when required and never further.

Why would you advocate that people stay away from learning cryptography? They should know as much as possible, they just should avoid implementing it.


Sure, I meant in terms of production implementation though that may not have been clear.

It's probably wise to know basic things at least one level further down than your required level so you know when things are starting to smell funny.


This argument keeps coming up, and while its premises are valid, its conclusion never sits right with me, namely: "Don't use cryptography." That advice isn't practical for developers. There are plenty of systems we have to design where crypto is not optional. Examples:

* Storing passwords. You can't store them in plaintext.

* Signing requests (like in the OP's example). What are the alternatives? You can store some kind of authentication in the cookie, which then gets sent with each request. You can use HTTP auth. But in any scenario, you're definitely using SSL, and you're probably using some other kind of crypto on top of that.

* Encrypting cookies. You should store as little as possible in cookies--ideally just a single identifier. But even so, it's very important to make cookies tamper-resistant. You can't, e.g. just have a plaintext cookie that says "userid=42."

So what does the OP propose we do? If this article were making the usual case that we shouldn't implement our own crypto algorithms, I'd agree 100%. But he seems to be taking it a step farther, and saying we shouldn't even use existing crypto libraries, because we'll misuse them. But what are the alternatives?


Point by point:

* Despite all the press password hashing gets, bad password hashes are rarely the worst mistakes people make in their applications. I'm glad there's a meme now about using bcrypt or scrypt, but I don't think people should be afraid to deploy password hashing.

* The alternative to signed requests is to credential the requests directly and not build delegation features. Many applications that use signed requests to solve authorization problems are in fact overbuilding; they're in effect building the infrastructure for federated authorization to solve simple point-to-point authentication problems. Just use HTTP Auth and a long random credential.

* You said it right here: just use random tokens to key a serverside session store.

So, I'd suggest those are three bad examples.

But backing your point out a little: so what if you feel like you need to use crypto in your application? Cryptography doesn't get easier just because you want/need it to. I find myself tearing my hair out at the circumvention/liberation tech people, all of whom respond to the same argument with, "so what if the engineering is hard, people need this stuff". Well, they need it to work, too, don't they?


> I don't think people should be afraid to deploy password hashing.

Neither do I, but the whole "don't use crypto" meme says the opposite.

> Just use HTTP Auth and a long random credential

Yes, that is a viable alternative. But it is also one which uses crypto. You're generating a random sequence, and you're using SSL. So once again you run up against the "don't use crypto" argument.

> You said it right here: just use random tokens to key a serverside session store.

But again, you've already introduced secure random number generators and SSL.

The point I was making with all these examples is that they are commonly encountered and can't be solved without crypto. Some of the solutions are easier to screw up than the others, and that's definitely worth talking about. But "don't use crypto" is too simplistic for these kinds of use cases.


Just accept the fact that people who say "don't use crypto" aren't saying "don't hash passwords", nor are they saying "don't generate random numbers", and move on.


Sure, but then what are they saying? "Don't use cryptography" is a quotation taken verbatim from the OP, and I've heard similar statements all around in the last few years. A reasonable person reading that statement would interpret it at face value: "Don't use cryptography" means that very thing.

So what I'm suggesting is that the "don't use crypto" meme should go away and be replaced with something more helpful and more specific. For example, the responses you've given on this thread have been both helpful and specific. I'm arguing that people should say the kind of stuff you've said, rather than the unrealistic "don't use crypto."


Isn't the actual meme "Don't build your own crypto", rather than "don't use crypto"?

I remember at least 10 years ago getting (and seeing a lot) the advice "use SSL for data in transit, use GPG for data at rest". Those two principles, combined with the somewhat more recent "just use (b|s)crypt" for password hashes would still provide your average non-crypto-expert developer with a pretty good fundamental starting place.


I've seen a fair amount of both.

The former ("don't build your own crypto") is clearly correct.

The latter ("don't use crypto") is what I disagree with. One example of this meme comes from the OP, who ends his article with: "Save yourself the trouble. Don't use cryptography. It is plutonium."


It really seems like this is an argument that seeks to make it harder to understand a problem, rather than easier. I'm just not interested in the semantic debate, sorry.


You made it a semantic debate.

  "Don't use crypto." - you  
  "That's terrible advice, we need crypto for x,y,z"  
  "I don't really mean don't use crypto"  
  "You just said that!"  
  "You're just talking semantics."


Nobody dings developers for generating random numbers because they're "doing crypto".


Few do, but I genuinely thought that the OP was, so no semantic debate was intended :) (And I have actually seen people say that elsewhere.) While the OP didn't mention random numbers specifically, his prohibition on using crypto seemed general enough that I assumed it to include random numbers. Ditto for hashing. Certainly, his method of argument--describe the technique, show a vulnerability the reader might not have heard of--could be applied to both password hashing and random numbers.

I realize, however, that you have a more charitable interpretation of what the OP was suggesting. I think you and I mostly agree about the substantive issue: Crypto should be used, but as you said, developers owe it to users to make it work. My disagreement isn't with you but with the OP and others who seem to suggest tossing aside crypto because it's too hard to get right.

I just realized I haven't proposed a positive alternative to the "don't use crypto" meme, though. I honestly don't know what the answer is, I'm afraid. Realistically, lots of devs need to use crypto, and we can't all develop your level of expertise in that area. (The founder of a security consultancy will always know a lot more about security than generalist app developers. Only so much time in the day.) So becoming a true crypto/security expert can't be the solution, even if that would be the best one. The best realistic solution I can think of is for authors of crypto libraries to provide enough documentation that devs can use it safely.


> * Storing passwords. You can't store them in plaintext.

Why not? Just write up front on the signup form, "we don't hash or protect your passwords in any way. Do not reuse passwords from other sites. Create a unique password and retrieve it using a password manager."

Password hashing is a losing battle:

* The users who aren't educated enough to use unique passwords are the same ones who will always use a really weak one anyway, and those will always be crackable.

* Salting is nearly pointless: the GPU killed the rainbow table, the "bad passwords" keyspace is small enough that it's easy to recompute, and in most hacks I've seen, the database has been compromised together with the source code, so the salt is never really secret.

* The first advice after a hack is still always going to be "change your passwords".


> Just write upfront on the signup form, "we don't hash or protect your passwords in any way. Do not reuse passwords from other sites. Create a unique password and retrieve it using a password manager."

For one, I'd consider that bad business. If you want to make money, it's not a good idea to declare to users that your system is insecure.

More importantly, you're now putting all the responsibility on the user. Yes, it's good for users to think about security. Yes, it's impossible to 100% guarantee the security of your users' passwords. But disclaiming all responsibility? We're the ones with more technical knowledge, not our users. We should bear as much of the burden of security as we can.

> Password hashing is a losing battle

Your argument here seems to boil down to the idea that GPUs are now capable of cracking any password hashing scheme we have, assuming weak passwords. My understanding was that this was not the case, but perhaps I'm wrong. As far as I know, you can set the difficulty factor in Bcrypt high enough that it's impractical to crack on any commodity hardware. There's also scrypt, which is supposedly even stronger in this respect, although I don't know if it's been adequately vetted yet.
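
The work factor is the point; a sketch assuming the bcrypt package:

    # Each +1 to rounds doubles the hashing cost, for attacker and
    # defender alike.
    import time
    import bcrypt

    pw = b"correct horse battery staple"
    for rounds in (10, 12, 14):
        t0 = time.perf_counter()
        h = bcrypt.hashpw(pw, bcrypt.gensalt(rounds=rounds))
        print(rounds, f"{time.perf_counter() - t0:.2f}s")

    print(bcrypt.checkpw(pw, h))   # True; the cost parameter lives in the hash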


> Just write upfront on the signup form, "we don't hash or protect your passwords in any way"

Can you legally absolve yourself from all responsibility in that way? Say Toyota sells cars that explode when the engine is revved beyond the red line. If the sale contract says, "Toyota is not responsible if you do not follow the instruction manual", does that mean they cannot be sued?


> Encrypting cookies. You should store as little as possible in cookies--ideally just a single identifier. But even so, it's very important to make cookies tamper-resistant. You can't, e.g. just have a plaintext cookie that says "userid=42."

Note that you possibly don't want to _encrypt_ the cookie (particularly if it's transmitted over SSL); you certainly want to _authenticate_ it.

I may not care if the user knows that he is user ID 42; I very much care if he's able to tell me that he is user ID 43 instead.
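
A minimal sketch of the authenticate-only approach with HMAC (helper names hypothetical; see downthread for why a bare signed cookie still invites replay):

    import hashlib
    import hmac

    KEY = b"server-side secret"          # hypothetical; never ships to clients

    def sign_cookie(value):
        tag = hmac.new(KEY, value.encode(), hashlib.sha256).hexdigest()
        return value + "|" + tag

    def verify_cookie(cookie):
        value, _, tag = cookie.rpartition("|")
        expected = hmac.new(KEY, value.encode(), hashlib.sha256).hexdigest()
        return value if hmac.compare_digest(tag, expected) else None

    c = sign_cookie("userid=42")
    print(verify_cookie(c))                        # userid=42: readable, but
                                                   # tamper-evident
    print(verify_cookie("userid=43|" + "0" * 64))  # None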


There aren't any, learn how the libraries work and use them. I agree with your point, in that respect.

Also, about the cookies, you can definitely have a plaintext cookie saying userid=42, as long as it's signed.


Yes, that's what I'm driving at: You have no choice but to learn the crypto libraries and use them. I've argued before that authors of crypto libraries have a sort of professional duty to document their libraries well, including info on all the mistakes developers are likely to make. I think if you're going to claim to offer crypto for average developers, you owe it to them to document it properly. It's always seemed a little unreasonable to me to blame non-cryptographers for being unaware of obscure vulnerabilities. We can't all be experts in everything.

Yeah, agreed about the cookie. I guess I should have said "a plaintext cookie only containing 'userid=42.'" Because clearly you can't just let people edit their cookies and take on any user ID, session ID, or other such identifier.


> Also, about the cookies, you can definitely have a plaintext cookie saying userid=42, as long as it's signed.

Not necessarily good enough. If you just sign a cookie that says userid=42, anyone who ever manages to read it can now authenticate as your user whenever they like. You just turned any session hijacking attack into an account hijacking attack.

Do what tptacek says, "just use random tokens to key a serverside session store" and don't write your own crypto.


Wouldn't the random tokens also be vulnerable to a replay? If I copy a session cookie from one client to another, regardless of how that cookie authenticates itself, won't most servers accept it? Seems to me that once the client has been compromised, session hijacking is going to happen.

The server can be set to check that the client IP address and HTTP fingerprint match. That makes the replay attack harder but not impossible. It also introduces some usability concerns--valid sessions may expire sooner than intended.


The token is just a key to a server-side session that will be deleted eventually, for example when the client logs out. If someone can hijack a session, that's obviously already a problem, but unfortunately XSS attacks are fairly common as they are something that can happen in any number of places on a website, and even vigilant developers will probably miss some. (See, for example, the recent XSS attacks reported on Paypal).

If you use a random token to a session that will be invalidated, then the attacker is at least time-limited. If you just naively sign an auth cookie and trust it later, he has a replay attack that will work permanently until you change your code.

As for your other suggestion, restricting sessions by IP address can be done no matter where the session data is stored. It will help to mitigate session hijacking in general, but it also degrades the experience, especially for mobile users, who will have to re-login every time they change IP addresses. It's particularly a problem if a form submission fails for this reason, because they'll probably get redirected away to a login page and lose any information they entered.


Yeah, both this and the original article are vulnerable to replay attacks. A token is much better, but there are times when you don't want to hit the DB, such as for non-critical data like displaying the user's name on the page. Signing a cookie is reasonable then.


I'm bad at crypto. I'm well aware that I'm bad at crypto. In fact, I would venture to guess that I know next to nothing about it. The problem isn't that I don't know what I don't know...the problem is that I don't know what (or whom) I can trust. And that is the bigger problem with crypto.

It would be nice if we had resources that could tell us not only the best practices, but what their downfalls are and their relative difficulties of cracking. Some common sense tells me that there is a combination of techniques that are simple and robust enough that I would have to worry about phishing before I had to worry about better security. But I don't know what that combination is, and I don't know if I can trust anybody that tells me.


Try Dan Boneh's crypto course on coursera [1]. It covers quite a lot of ground, both practical and theoretical, and includes programming exercises similar to the matasano puzzles. Without a doubt it's one of the best courses of the dozen MOOCs I've taken. There's also a followup course [2] (I haven't taken it yet personally, but I believe the currently scheduled run will be the first).

Interestingly enough, there's also an upcoming security course (with no date planned yet) which will cover the application programming part of security and will be co-taught by him. [3]

[1] https://www.coursera.org/course/crypto [2] https://www.coursera.org/course/crypto2 [3] https://www.coursera.org/course/security


Glad that you like Dan Boneh's crypto class. I made the programming exercises :-).


They were very well done, and I loved the course overall, so congrats. I like how Dan knows how to preemptively answer every question I had.


Author here. A good place to start (mentioned in the article).

http://www.matasano.com/articles/crypto-challenges/

Never met any of them in real life, but they seem like good people to me. I've actually used knowledge I picked up from those exercises to fix vulnerabilities in past clients' code, and I'm not yet half-way through them.


> Measure the time each request takes to complete. Since string equality takes a tiny bit longer to complete when the first char matches, the message that takes the longest to return will have the correct first character.

Always wondered if this really works in practice ...

I imagined the time it takes to compare 2 strings should be negligible / indistinguishable in a full HTTP request over the wire. Among all the other things happening in the network stack, OS, Rails, HTTP, database access, and everything else you're doing in your request, a string comparison of a few dozen characters should be well within the noise range.

I've Googled around and it seems like it really is a viable attack:

http://www.cs.rice.edu/~dwallach/pub/crosby-timing2009.pdf

> We have shown that, even though the Internet induces significant timing jitter, we can reliably distinguish remote timing differences as low as 20µs. A LAN environment has lower timing jitter, allowing us to reliably distinguish remote timing differences as small as 100ns (possibly even smaller). These precise timing differences can be distinguished with only hundreds or possibly thousands of measurements.

#mindblown
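
Here's the shape of the leak in miniature; locally the difference shows up with enough samples, which is exactly what the paper's statistics buy you over a network:

    import hmac
    import timeit

    secret = b"a" * 32

    def naive_eq(a, b):
        if len(a) != len(b):
            return False
        for x, y in zip(a, b):
            if x != y:           # bails at the first mismatch, so the time
                return False     # leaks how many leading bytes were right
        return True

    right_first_byte = b"a" + b"z" * 31
    wrong_first_byte = b"z" * 32

    t_good = timeit.timeit(lambda: naive_eq(secret, right_first_byte),
                           number=200_000)
    t_bad = timeit.timeit(lambda: naive_eq(secret, wrong_first_byte),
                          number=200_000)
    print(t_good > t_bad)        # usually True: the better guess takes longer

    # The standard fix: a comparison whose runtime doesn't depend on where
    # the first difference occurs.
    print(hmac.compare_digest(secret, wrong_first_byte))   # False, same time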


Roughly a hundred people have implemented a milliseconds-granular timing attack in our crypto challenges. Our challenge isn't totally realistic (in reality, you'd be timestamping as close to the wire as possible, and if you used Python, you'd be using it to postprocess samples you took in C) but I think if you get through it you'll understand why the attack is viable.

Keep in mind that most target applications will allow you to colocate yourself for ~$100; you just get your sampling server onto a VM in the same hosting center.

There are programming environments that don't readily admit to over-the-wire timing attacks. Nate Lawson did a Black Hat talk about this a few years ago. It is difficult, bordering on implausible, to remotely time memcmp. It's hard to say whether it's going to get harder to carry out this attack as machines get faster, or easier as attackers discover more filtering techniques and better ways of pessimizing execution on target machines.


Even better - with some creative "cloud cartography"[1], you have a 40% chance of launching a VM on the same physical host as your target.

[1] https://www.cs.cornell.edu/courses/cs6460/2011sp/papers/clou...


When he says

>Save yourself the trouble. Don't use cryptography. It is plutonium. There are millions of ways to mess it up and precious few ways of getting it right.

I am sure he means don't use your own homegrown cryptography solution. Use something established and well tested. Good advice, probably can't be repeated enough. I liked the examples he gives; I didn't know, for example, that you can basically extend an MD5ed string and keep the original MD5 value. Then again, I know that MD5 isn't a secure cryptographic hash function, so I wouldn't have used it from the start. Nice to know why that's the case.


Hi, author here, I'm glad you found it useful!

> I didn't know, for example, that you can basically extend an MD5ed string and keep the original MD5 value

Thank tptacek for that knowledge; I think I picked it up in a talk he did that's somewhere on Vimeo (will link to it).

> Then again, I know that MD5 isn't a secure cryptographic hash function so I wouldn't have used it from the start. Nice to know why thats the case.

Nope! I was going to mention this in the post but removed it to keep things simple: this particular vulnerability is not due to MD5 collisions or MD5 being cryptographically insecure. It's because of the internal mechanism (a "Merkle-Damgård construction") intrinsic to hash functions like MD5, SHA1, SHA256 and friends. Even if MD5 were cryptographically secure, this vulnerability would still present itself if used in the way I described.

Don't mean to nitpick, but with crypto small misunderstandings lead to big vulnerabilities. Hope that makes sense :)
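
To make it concrete, here's the vulnerable construction next to the standard fix; HMAC resists length extension even with MD5 underneath:

    import hashlib
    import hmac

    secret, message = b"s3cret", b"user=bob&role=user"

    # The vulnerable construction from the article: H(secret || message).
    naive_mac = hashlib.md5(secret + message).hexdigest()

    # Knowing only naive_mac and len(secret), an attacker can resume the
    # Merkle-Damgard state and compute a valid tag for
    # message + padding + b"&role=admin" (tools like hashpump automate it).

    # The fix: HMAC's nested construction, H(k2 || H(k1 || message)), which
    # never exposes a resumable state.
    good_mac = hmac.new(secret, message, hashlib.md5).hexdigest()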


> It's because of the internal mechanism (a "Merkle-Damgård construction") intrinsic to hash functions like MD5, SHA1, SHA256 and friends. Even if MD5 were cryptographically secure, this vulnerability would still present itself if used in the way I described.

Thanks for pointing that out, didn't know that.

Just for fun, if they had written message+secret instead of secret+message it would have been ok (although bad practice)?

calculated_mac = OpenSSL::Digest::MD5.hexdigest(message+secret)


Secret-suffix MACs are insecure if your hash function isn't collision-resistant. To illustrate: MD5 isn't collision-resistant, but HMAC-MD5 has no currently known viable attacks, because it isn't simply a secret-suffix MAC.

So it's true that using a secret-suffix MAC is safer than using a secret-prefix MAC, but if you know enough to make that choice, you know enough to use HMAC.


The answer is no, but I forget the exact reason why. Will do a little more research and report back.


>This particular vulnerability is not due to MD5 collisions or MD5 being cryptographically insecure. It's because of the internal mechanism (a "Merkle-Damgård construction") intrinsic to hash functions like MD5, SHA1, SHA256 and friends.

That's what I like about SHA3. It only dumps part of its internal state, so there is no possible way to resume from a hash. This also makes it viable as a PRNG or a stream cipher.


Is there a repository or central location of the established and well tested solutions for developers to use? Where does one start?


Some ideas, depending on what you need (I can't say how good these are):

http://www.keyczar.org

http://nacl.cr.yp.to

https://github.com/jedisct1/libsodium

http://www.gnupg.org


Thanks. These are straight crypto frameworks which are a good start.


One of the problems with cryptography is that "solutions" are not as generally applicable as one might want. There are a lot of assumptions that are made, and violating those assumptions is usually a disaster.

For example, when it comes to encryption, it is typically assumed that all messages are the same length. If your application does not make this guarantee, encryption may not provide you with any security. You could wind up in this situation:

https://news.ycombinator.com/item?id=2661890

Think of it this way: asking for a list of tried-and-true crypto solutions is like asking for a list of tried-and-true database schemas. There might be cases where it will work, but for the most part you need to put some thought into what you are doing.


That makes sense in terms of no standard libraries. But how about design patterns for common scenarios, e.g. securing REST APIs, user authentication/login/signup, payment information handling, etc., all in one place?

Does it become less secure if everyone follows standard design patterns?


No, of course it does not become less secure. In the worst case, slapping some crypto onto a system would have no effect at all; it is hard to see how it could make things worse, other than to give people a false sense of security.

An example is Hushmail. Your mail is signed, it is encrypted, you're using the tried-and-true PGP...and the DEA can walk into court with a pile of DVDs full of the plaintext of some defendant's email. Hushmail is, at best, only marginally more secure than GMail.

So while we might come up with good practices for using cryptography, it is inevitable that organizational practices will render the cryptography pointless. Solutions need to be tailored to the specific needs of an organization or a system. That is where the real problem lies: we do not have something like SQL for cryptography. We do not have a good way to specify organization needs and design (or even verify the security of) a cryptosystem that meets those needs.


Wikipedia perhaps?


Doesn't matter if it's MD5 or SHA512. You can extend either hash function in the exact same way (http://en.wikipedia.org/wiki/Length_extension_attack)


It does matter what hash function you use, though. MD5 and SHA-512 are both vulnerable to length extension, but there are other hash functions that are not, like SHA-512/256, SHA-3 or BLAKE.

The documentation for both Keccak and BLAKE2 recommends prefixing a fixed-length key to the message to make a MAC, pretty much like in the vulnerable example, but with a better hash function.
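
Python's hashlib even exposes BLAKE2's keyed mode directly, so the safe version of that prefix-MAC is a one-liner:

    import hashlib

    # Keyed BLAKE2 is a safe prefix-style MAC: no resumable state leaks out.
    tag = hashlib.blake2b(b"message", key=b"secret key").hexdigest()
    print(tag)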


> other hash functions that are not, like SHA-512/256

I think you meant SHA-224/384. Both SHA-512 and -256 are vulnerable to length extension because their internal state is dumped and resumable. With SHA-224/384, you only get a truncated state (from 256- and 512-bits respectively), which you can't pick up and resume.


I do mean the hash function "SHA-512/256", as defined in FIPS 180-4 [1]. It is basically a version of SHA-512 that truncates the final result to 256 bits (Like SHA-384). It is not vulnerable to length extension, because unlike SHA-256, the final hash does not contain enough state to continue hashing.

I wouldn't consider SHA-224 immune to length extension since it only truncates 32 bits, which is low enough to brute force.

[1] http://csrc.nist.gov/publications/drafts/fips180-4/Draft-FIP...


Ahh, quite right. Sorry about that. And yeah, I agree on SHA-224; it's okay in certain circumstances (you're rarely going to be able to pull off 2^31 (average) requests), but it's almost definitely not the right choice.


About the timing attack on HMAC that the article mentions: it takes thousands if not hundreds of thousands of requests to gather the data (and let's assume you can indeed extract the data out of all the noise caused by network latency etc.), and any properly designed API should have a throttle measure built in to prevent brute-force attacks like this.

A good, secure API is protected by a variety of measures, not just through the request authentication component. Of course you should still make your auth as secure as possible.


"a throttle measure built in"

It's my understanding that bittorrent sync uses the latency of the network for such a throttle. Wild hair: a cyclotron-style router roundabout could hold millions of packets "in suspension" for n seconds.


> This means that as long as you have one example of a signed message, you can forge signatures for that message plus any arbitrary request parameters you like and they will authenticate under the above described scheme.

If all requests are made over HTTPS, how could a third party intercept a signed message? How is this any greater of a risk than a third party intercepting user login information? (This is a serious question; I'm not being flippant or saying 'gotcha')


If you do not manage SSL certificate validation properly (trusting only your server's certificate), a MITM attack can break the system easily. Otherwise, I don't think you can intercept the signed message if everything is over TLS.


Right. If the attack vector is "break SSL" I'm going to try some other attacks first. There's an underlying assumption in the question: my app (and everything else hosted on the box) is safe from XSS, CSRF, injections, and other information leakages. Is it really? How do I know for sure?

And who's to say that your forum server (for example) is just as secure? That could be a foothold into your environment too. And let's not forget social manipulation of your staff and users. Maybe I'll just steal the machine in question, or your laptop.

After I try all those avenues, I'm either finding another target or ramping up for a protracted attack on your SSL connections.

If your site attracts an attack this dedicated, you'd better get that highly paid security consultant. ;)


The problem with these attacks on SSL is that they're not protracted; they're trivial. Python's httplib doesn't check SSL certificates at all by default, for example, so you just hijack the TCP connection, negotiate SSL, and then you're done.

With libcurl, I think you have to set CURLOPT_SSL_VERIFYHOST to 2. If you set it to TRUE (i.e. 1), it skips part of the certificate check, rendering the whole thing trivially insecure.

Most (all?) crypto libraries have terrible APIs, or have APIs that are far too low-level to be safely used by most developers. SSL shouldn't be the easiest thing to attack, but in the current state of affairs, it often is.
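
For contrast, a sketch of what correct verification looks like in modern Python 3, where ssl.create_default_context() loads the system CA bundle and enables both certificate verification and hostname checking by default (the hostname below is made up):

    import http.client
    import ssl

    # Verification and hostname checking are on by default with this context.
    context = ssl.create_default_context()
    conn = http.client.HTTPSConnection("api.example.com", context=context)
    conn.request("GET", "/")
    print(conn.getresponse().status)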


Is the risk any greater with APIs (like described in the article) compared to typical username/password login systems?


A lot of API authentication is half-assed, like the examples in the article. "OAuth is hard, roll your own" is a common approach. Even with, e.g., OAuth 2, who's to say that the scheme is completely safe and that your implementation is correct?

As far as API vs. user account, it depends on the loot. An API might let me do more damage faster, or subtly lurk and alter/steal data over time. It might also be harder to detect from the UI, no "last logged in" giveaway.

Also, some API vendors recommend disabling SSL cert validation client side. Even for credit card gateways, unbelievably. Since it's a script talking to a script, no one is going to see the cert problems from a MITM until it's too late.


Saying stuff is hard is easy, and as far as I can tell, that's very much true of cryptography.

More useful would be actual suggestions to enable average developers to do more things with some degree of confidence. In other words, libraries, recipes and other documentation, and so on.

"Hire an expensive security guy" is probably not a feasible solution for many small startups, nor very scalable in any case.

Also, out of curiosity, in his timing attack example, the difference in time caused by the string being equal seems like it'd get absolutely swallowed up by the random nature of the universe - do those things actually work in the real world, on real servers with varying loads and numbers of users and network traffic?


Hi, author here. I would have liked to have made useful suggestions, but having made so many errors with cryptography in the past, I don't really feel qualified to do so.

I do however recommend getting in touch with tptacek and co and doing the crypto challenges. I mentioned this at the bottom of the post.

Edit: re: timing attacks, it helps if you're in the same data centre. I've never personally implemented a timing attack, so I'm hesitant to make any assertions further than that either way.


btw: isn't the scheme you posted vulnerable to replay attacks too? Much easier than the timing attack.

edited to add: don't mean this as a nitpick. I've seen that very mistake made by two S&P 500 companies that had 'homebrew' SSO we had to integrate with.


> btw: isn't the scheme you posted vulnerable to replay attacks too? Much easier than the timing attack.

Yep, probably. In fact, now that I look at it: definitely. I should have gone with that instead!

I forget who said this, but basically: any feature that exists in TLS but doesn't exist in your hand-rolled authentication scheme is a vulnerability.


Does peer review count as a feature?


I am doing this. It looks fun.


>Also, out of curiosity, in his timing attack example, the difference in time caused by the string being equal seems like it'd get absolutely swallowed up by the random nature of the universe - do those things actually work in the real world, on real servers with varying loads and numbers of users and network traffic?

You could make each request many times, and then average them together. I don't know how many requests you'd have to make to overcome the random fluctuations though -- probably a lot.
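
A rough sketch of what that averaging might look like (send_request is a hypothetical stand-in for an HTTP call to the vulnerable endpoint, not real code from the article):

    import time
    import statistics

    def measure(candidate, trials=1000):
        # Time many requests for one candidate value and take the median,
        # which resists outliers (GC pauses, retransmits) better than the mean.
        samples = []
        for _ in range(trials):
            start = time.perf_counter()
            send_request(candidate)        # hypothetical HTTP call
            samples.append(time.perf_counter() - start)
        return statistics.median(samples)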


Also, this random behavior will be significantly amplified if the host is running inside of a VM.


Unless I'm missing something, the example code will just calculate the checksum of "key:value&key:value&key:value", so actually anything with the same number of parameters will pass.

Assuming "key:value" is supposed to read "#{key}:#{value}", this may be vulnerable to a delimiter attack -- you couldn't tell the difference between {foo: 'bar', bar: 'baz'} and {foo: 'bar&bar:baz'}.


You are of course correct. Thanks! Will fix and roll out once traffic from HN/reddit dies down.


(Un)conscious (in)competence reminds me of Donald Rumsfeld's trichotomy of known knowns, known unknowns and unknown unknowns.

(highlighted here: http://www.youtube.com/watch?v=2Ex8EEv-WPs)


At $work we regularly have trainees who learn just a tiny bit of programming. We teach them not to touch cryptography if at all possible. And I don't touch it either.

When we do web development, the admins take care that the application runs over https, and authentication is handled by some LDAP thingy in the web server. The only other cryptography we use is ssh, and signing submissions to the RIPE database with PGP, and stuff transparently done by some database client libraries.

It seems to be a sweet spot where we're still flexible enough for what we do, and don't present too much attack surface.


I didn't know about Matasano's Crypto challenges. Looks like fun, going to give it a try!

http://www.matasano.com/articles/crypto-challenges/


I'm just curious: the obvious response to the first problem (message extension attack) is to include the message length in the message. If this wouldn't work, could an expert please tell me why?

Edit: To be specific, I was thinking of a Pascal-style length-at-start arrangement. It's clear that if you put the length at the end you may be vulnerable to exactly the same attack!


I'm not an expert, but I can see a problem with schemes along those lines. MD5 (and similar) extension attacks also take advantage of the padding added after the thing you want to hash. For example:

Secret: 'secret'
Message: 'foo&bar'

Let's include the length and hash 'secretfoo&bar13'.

What you actually hash is 'secretfoo&bar13' followed by MD5's internal padding (0x80, zero bytes, and the message length). What the attacker sees as the 'known hash' includes this padding. When you perform an extension attack, the attack message would look like: 'secretfoo&bar13' + padding + '&evil'. At this point you're adding on the length and hashing it, so what you compare with is 'secretfoo&bar13' + padding + '&evil' + '93' ... but the attacker can precompute that hash just as easily as the one with just 'evil' appended.

You could argue that your hashes would be constructed a bit differently (e.g. make messages that appear to contain internal padding illegal; still vulnerable, btw), but the point is: you're still tinkering with the same basic structure, and are likely to make more mistakes. You're better off switching to HMAC for this application.
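
For reference, a minimal HMAC sketch in Python (the secret and message are placeholders):

    import hashlib
    import hmac

    secret = b"shared-secret"   # placeholder; use a long random key in practice
    message = b"foo&bar"

    tag = hmac.new(secret, message, hashlib.sha256).hexdigest()

    def verify(message, supplied_tag):
        expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
        # compare_digest avoids the timing leak discussed elsewhere in the thread.
        return hmac.compare_digest(expected, supplied_tag)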


MD4, MD5, SHA1, and all the SHA2 variants use a structure called Merkle–Damgård, which does include the message size. Most are vulnerable to length extension attacks. SHA3 (Keccak) is not vulnerable to length extension attacks, because resistance to length extension was a design criterion for the SHA3 contest.

Edit: Doh, SHA2-224 and SHA2-384 are truncated and aren't vulnerable


Thanks, I looked up Merkle–Damgård and now understand the comments about padding and how they would affect things.

But I was specifically thinking of the following scheme:

Message: [length][key-value pairs]

Hash: MD5(shared secret || message)

Assuming that the rule is: the length is at the start of the message, and only "length" bytes of the message are used after the hash check passes.

It seems that if you have found a hash collision, Merkle Damgard lets you find more collisions easily. But the exploit in the article doesn't require a hash collision. If you haven't found a hash collision, would the above scheme be vulnerable to length extension?


The attack has nothing to do with hash collisions; it has to do with the fact that the MD-structured hashes spit out their entire state at the end of the operation, which means an attacker can simply reformat the hash back into the hash core's state and continue hashing with it.


I didn't think so, but I mentioned it because the Wikipedia article on Merkle-Damgård hashes (http://en.wikipedia.org/wiki/Merkle–Damgård_construction) talks about length extension attacks only in the context of hash collisions: "Length extension — once an attacker has one collision, he can find more very cheaply."


That's actually a relatively new finding, and a very cool attack (I assume we're talking about Joux multicollisions).


Multi-collisions are another thing. What Wikipedia means is that once you find m and m' s.t. H(m) = H(m'), then you've also found the collision H(m || X) = H(m' || X) for any X, i.e., unlimited collisions.


The attacker just needs to include the updated length, which he can likely calculate, since the shared secret is probably the same length for all accounts (even if it is not, he only needs it for this request) and he knows the length of the parameters and their values. It would take some guesswork and trial and error to nail it down, and the attacker would have to guess that the length was included in the first place, but assuming he has plenty of time and processing power, it is not infeasible. There are a lot of fairly easy things you could add to make it more difficult to attack.


[deleted]


The attacker doesn't "update" the length. The attacker appends data to the message, which has the effect of turning the previously stamped message length into just another bunch of bits in the middle of the message, and then includes their own message length at the end. The final message must accurately represent the length of the original message including the shared secret, so there's trial and error involved, but it's just trials of "possible length of the shared secret".

Again, this works on MD4, MD5, SHA1, and all the SHA2's.
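
To make the trial-and-error concrete, here is a sketch of that loop, assuming a hashpump-style length-extension helper; md5_extend, server_accepts, known_digest, and known_message are all hypothetical stand-ins, not a real library's API:

    # Guess the secret's length by brute force; everything else is known.
    for key_len in range(8, 65):
        forged_digest, forged_msg = md5_extend(
            known_digest,        # the MAC observed on the wire
            known_message,       # the original message, minus the secret
            b"&evil",            # data the attacker wants to append
            key_len,             # current guess at the secret's length
        )
        if server_accepts(forged_msg, forged_digest):
            break                # the guess was right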

You shouldn't have deleted your comment. You're not expected to know this stuff. Very few people, relative to the whole industry, or even relative to the number of programmers who end up trying to build crypto, actually do know how a length extension attack works.


Yes, sorry for deleting a comment when there was a reply! For posterity, it was something like: "it seems like you would need to break MD5 in order to do this, because you'd have to update the length."


The thing with cryptography is that you have to get it right every time. The person trying to break in only has to get it right once.


I have implemented an MD5-based scheme similar to what was described. At the time, MD5/hash extension attacks were not as well known as they are today; at least, I had not heard of them, and I read up on MD5 before I designed the system. So another worry is that what is considered secure today might not be secure tomorrow.


> another worry is that what is considered secure today might not be secure tomorrow

If history is anything to go by, then what is secure today will definitely be insecure tomorrow. No real solution to this, AFAIK. If vulnerabilities in the crypto itself don't get you eventually, quantum computing will.


Quantum computers can only solve certain types of problems efficiently. Hashing is not generally in that set.


Then there is the question: is your app likely to ever be attacked at all? By someone good? With a lot of time?

If not, perhaps hiring a cryptography expert is overkill, and you are better off just using whatever your framework offers, or integrating some library yourself. You are not a bank.


The message extension attack is pretty well known by crackers, which could leave your webapp open to pre-scripted kit attacks that simply crawl the web.


This is a very dangerous question. Thinking like this leads to huge security vulnerabilities in important systems. A better question is "What is the worst thing that could possibly happen if this system were successfully attacked?" If the answer is anything bad, it is probably best to assume someone will try to attack it.

As for the second part of the question, are we talking about the same internet? Because the one I'm on appears to be full of technically savvy people with loads of free time.


I completely agree with this, and have just released a session cache for Python that completely ignores encryption, just stores a UUID in a cookie, and relies on server-side lookups for session work.
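
For illustration, the core of such a scheme is tiny; a sketch (assuming Python, with no persistence or expiry, which a real library would need):

    import uuid

    sessions = {}   # server-side store; a real one would persist and expire

    def new_session(data):
        sid = str(uuid.uuid4())       # uuid4 draws from os.urandom
        sessions[sid] = data
        return sid                    # goes into the cookie, over HTTPS only

    def get_session(sid):
        return sessions.get(sid)      # unknown or expired IDs simply miss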


Make sure you are using HTTPS exclusively so you aren't vulnerable to Firesheep-style attacks.


That's in the docs :-)

Oops, no it's not. I kind of assumed it was obvious. Thank you for the reminder :-)




How is the shared secret generated, stored and shared? (There are lots of ways to do this wrong, and the code and algorithm for that were not even shown.)
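
Generating it is the easy part; a minimal sketch using the OS CSPRNG (storage and distribution, the genuinely hard parts, are out of scope here):

    import binascii
    import os

    # 32 bytes from the OS CSPRNG, hex-encoded for easy transport and storage.
    secret = binascii.hexlify(os.urandom(32)).decode()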


For the timing-based one, I would've made the server wait a small random amount of time during the process. The xor is more clever, though.


Not the way a naive reading of your post suggests, no.

If you add a random amount of time, the timing attack still works; it averages out, the same way network jitter does. What you need to do is make every request take the same amount of time; randomness is not helpful here.
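
A common constant-time comparison looks something like this sketch; Python's hmac.compare_digest does the same job and is the better choice where available:

    def constant_time_equals(a, b):
        # Never short-circuit on the first mismatching byte; accumulate
        # all differences and check once at the end. Expects bytes.
        if len(a) != len(b):
            return False    # MAC lengths are public, so this leak is fine
        result = 0
        for x, y in zip(a, b):
            result |= x ^ y
        return result == 0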


I guess I've unintentionally helped prove the article's point, then!



Hmmm, one of the things I've noticed is that if you've studied pre-web cryptography extensively (as I did, starting even before I started programming), "Everything you know is wrong". Or thereabouts.

The HMAC timing attack illustrates this well. "Classical" systems, as I will call them (e.g. Enigma and how it was used), are not subject to zillions of known-plaintext attacks (although I've read that one bit of known plaintext from an out-of-the-way base ("I have nothing to report"...) regularly helped the Enigma decryptors when the key changed).


A few thoughts: why MD5 in the example at all? I thought SHA-2 was the go-to for hash functions nowadays.

And a question on the timing attack: how real is it? Because the response time includes a lot of random variables which, on current fast servers, will take more time than the calculation itself.

You receive a request; how fast it goes through the load balancer is random, then you must read the shared secret, then you must log the request somewhere, and the whole HMAC calculation takes a millisecond or less.


State of the art several years ago was resolving 100ns differences over a LAN (i.e. within a datacenter). It's something to worry about. Honestly, I'd be extra paranoid and use a construct that hashes the supplied digest an extra time on top of doing a timing-safe comparison.
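
Concretely, that extra hashing is sometimes called "double HMAC" verification; a sketch (the key size and hash choice here are my assumptions, not the commenter's):

    import hashlib
    import hmac
    import os

    def paranoid_verify(expected, supplied):
        # Re-MAC both values under a fresh random key before comparing, so
        # even a leaky comparison runs over bytes the attacker can't control.
        k = os.urandom(32)
        a = hmac.new(k, expected, hashlib.sha256).digest()
        b = hmac.new(k, supplied, hashlib.sha256).digest()
        return hmac.compare_digest(a, b)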


So if the attacker can make a million attempts, don't you think that, if the HMAC calculation has the most variance of all the other moving parts, it won't give up the secret?


With thousands of tries, you can apparently average the noise out pretty effectively, so timing attacks are very possible.


SHA2 has the same problems.


To be more accurate: SHA-224, SHA-384, and SHA-512/256 don't suffer from length extension attacks. The rest of the SHA-2 family (SHA-256 and SHA-512) do. Of course, SHA-256 and SHA-512 are more popular than their truncated variants...


Very good point. I would extend it to security in general.

Security is like medicine: do not try it yourself, unless you are a specialist or want to end up dead.

Too bad a lot of developers still do not understand this and remain too self-confident, from what I personally have seen.



