Hacker News
Don’t Hash Secrets (benlog.com)
110 points by davidblair on Jan 28, 2010 | 46 comments



The broader take home message of the article: "Don't invent security protocols (e.g., involving hashing) as there can be surprising weaknesses. Find a well-designed library routine that already does what you want."


"Don't do crypto yourself."


True, but I believe the point is still valid. Given knowledge of cryptographic hashes and ignorance of HMAC, it seems perfectly reasonable to use a hash in this way. I think that's a rather common situation, and a common outcome.

And, blindingly obvious in hindsight (I have definitely heard of HMAC before, at least) but new to me too.


There is no generalist developer in the world who will come up with HMAC when given the "secure hash algorithm" tool and the knowledge that you can "sign" or "protect" a message by concatenating it with a key. HMAC is surprisingly intricate.

That there is no simple, intuitive alternative to HMAC is a strong reason not to allow developers to work at the "picking algorithms and constructions" level of cryptography.
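To see just how intricate: here is a minimal sketch of the HMAC construction from RFC 2104, assuming SHA-256 with its 64-byte block size. The key hashing, the zero padding, and the two fixed XOR constants are not something anyone would arrive at intuitively:

```python
import hashlib

def hmac_sha256(key: bytes, msg: bytes) -> bytes:
    """Minimal HMAC-SHA256 per RFC 2104: H((K ^ opad) || H((K ^ ipad) || msg))."""
    block_size = 64  # SHA-256 processes input in 64-byte blocks
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()   # overlong keys are hashed first
    key = key.ljust(block_size, b"\x00")     # then zero-padded to the block size
    ipad = bytes(b ^ 0x36 for b in key)      # inner pad constant
    opad = bytes(b ^ 0x5C for b in key)      # outer pad constant
    inner = hashlib.sha256(ipad + msg).digest()
    return hashlib.sha256(opad + inner).digest()
```

The nested hashes and the fixed pad constants are precisely what defeat the extension tricks discussed elsewhere in this thread; in real code you would call a library HMAC rather than write this.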

The rule is, "data in motion should be secured with TLS, data at rest should be secured with PGP".

What worries me about posts like this is that they communicate the idea that, if you just know a couple more things (like "HMAC"), you can get crypto right. No. For instance, this article doesn't explain how to safely compare HMACs. The intuitive way to compare them is drastically insecure; the correct way to do it is surprisingly intricate.

If you're typing the letters H-M-A-C into your code, you're doing it wrong.


The rule is, "data in motion should be secured with TLS, data at rest should be secured with PGP".

And even that attitude can be dangerous. A while back I was getting depressed at the usability of tarsnap so I began to think about how I would design some secure backup system on Windows. "Ok," I thought, "I'll build stuff on top of GPG." The trouble is, any way I imagined doing this, even taking out features like deduplication, I would have leaked the filesizes of every file backed up. If the FBI wanted to scan for anybody who had some popular illegal porn collection or whatnot, they could tell just by looking for a certain distribution of chunk sizes. And anybody listening on the network would be able to get a weak idea of the filesizes just based on timing.

I have no idea what Tarsnap does about this; I am guessing the blocks it hashes for deduplication purposes aren't the same as the ones it sends to the server, and I presume it doesn't stream stuff to the server as it scans the filesystem. And there are probably several other things it does that I wouldn't even think of.

(But even then if some government found that a protest movement was distributing USB sticks with 3.167 GB of information after it gets zipped for backup, it could just look for people uploading that approximate amount of information to the backup server. It seems like people who have a true reason to be paranoid simply shouldn't do online backups.)

I would change the rule to "Data in motion should be secured with TLS, data at rest should be secured with PGP, and don't even think about doing anything weird."


A while back I was getting depressed at the usability of tarsnap

Please send me an email with details about the usability issues you encountered.

[...] distribution of chunk sizes [...] I have no idea what Tarsnap does about this; I am guessing the blocks it hashes for deduplication purposes aren't the same as the ones it sends to the server, and I presume it doesn't stream stuff to the server as it scans the filesystem. And there are probably several other things it does that I wouldn't even think of.

The blocks Tarsnap uses for deduplication are the same as the blocks Tarsnap uploads; but they are encrypted (to avoid leaking file data), identified by HMAC (to avoid leaking information via hash), and the blocking is key dependent, so an attacker can't say "you have chunks of lengths X, Y, and Z, so you probably have this particular block of data". Also, the chunk size (64 kB on average, pre-compression) is small enough that there would be too much noise to pull off such an attack.


Not to mention, the article mentions using HMAC for his password database. I am not sure how this is any better than salting. His argument against salting was that the salt is going to be stored somewhere; well, the key he is going to use with HMAC is going to be stored somewhere too.

In general, hashed password storage is a tough game to win in an absolute sense, so we are effectively left with slowing the attacker down to the point that it isn't worth their while. I didn't see any mention of using iterative hashing (aka stretching) to increase the time an attacker needs to brute-force passwords.

I agree with your "TLS in motion and PGP at rest" quote, though I don't know if we have a similar thing we can point to for password storage. Is there a go-to library we can just say "hey, use this" about? I know we can use bcrypt, do iterative hashing, etc., but I don't know off the top of my head what stock library we should tell folks to use so they minimize the chance of shooting themselves in the foot with password storage. Maybe this is something that should be added to Keyczar.


Salted hashes are not that much better than unsalted hashes, but HMAC itself isn't meaningfully different than just using a salted hash. You have to know the key to complete the hash, just like you have to know the salt in a salted scheme.

You know you're getting into trouble with someone's understanding of systems security when you start talking about "secret salts".


> Salted hashes are not that much better than unsalted hashes...

This statement made my eyebrows go up, and I pretty closely follow everything you say here on HN, and a lot of the things you say on Matasano.

Are you saying "not much better" in the relative sense, as in, "not much better because if all you're doing is salting, then you're still doing it wrong"? Or more absolutely, as in, "salting doesn't gain you anything"?


For web app passwords, brute force attacks against salted hash files are going to compromise so many passwords that the net effect to your business is going to be bad.

With bcrypted or even PBKDF'd passwords, you'll at least have a plausible response to a compromise, and some certainty that hundreds of your users passwords aren't going to be posted to a web page.


Sorry, I still don't understand. Does this assume that all passwords share the same salt, so, not a nonce? Because if each hash has a nonce, I didn't know there was a fast way to brute force a database of those.

(Assuming that the developer isn't using md5, of course, but even then I wouldn't expect it to yield a lot of results very quickly.)


Sigh.

1. Take a hash.

2. Start at "aardvark".

3. SHA1 the salt + "aardvark".

4. Compare to the hash.

5. Repeat until "zymoscope".

It's so weird to me that rainbow tables have so captured people's imaginations that they can't comprehend other attacks on hashes, despite the fact that the brute-force attack I just described was the only way password files were ever cracked for the entirety of the 1990s. It's plenty fast.
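The loop described above is a few lines in any language; a sketch (the salt, target hash, and wordlist here are illustrative):

```python
import hashlib

def crack_salted_hash(target_hash: bytes, salt: bytes, wordlist):
    """Brute-force one salted SHA-1 entry: try SHA1(salt || word) for each candidate.
    The salt doesn't slow this down at all; it only forces the attacker to redo
    the work once per hash instead of once per wordlist."""
    for word in wordlist:
        if hashlib.sha1(salt + word.encode()).digest() == target_hash:
            return word
    return None
```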


That attitude was uncalled-for.

I'm aware of how brute force attacks work, thank you; and when you say "it's plenty fast", I'd guess you're referring to the difference between "slow" hashes -- like bcrypt -- or "fast" ones, like MD5.

...Which has nothing to do with whether salts are a good idea or not, which was actually my question.

Furthermore, your example there assumes a dictionary to throw at the hashes, which is stupid, because if the website (or other application) doesn't have a sane password policy in the first place, then you don't even need their database of hashes. So, the whole "rainbow table" example you just sighed about was completely irrelevant to my question.

Maybe I can put this another way that'll make more sense.

Let's say I have a 150,000 computer botnet -- pretty small by today's standards, and not that hard to achieve really.

And let's also say that I have a database of 10 million accounts that I snatched from some business, and better yet, let's say that they don't know about it, because I was uncommonly smart and I didn't brag that I got it.

Now, I want to retrieve the passwords for as many of those 10 million accounts as possible, using my botnet.

Further, let's say that the company was uncommonly smart, and they twofish'd all of their passwords with enough stretching to piss me off.

How much more pissed-off will I be when I discover that they've concatenated a 128-bit randomly generated nonce to each password?

Because my guess is, a lot more pissed off, since I can no longer distribute my dictionary of 500,000 commonly found passwords to the botnet, run it through their pessimal twofish, and then look for collisions.


I'm confused but happy, because your "pessimized Twofish" is basically bcrypt, which is inherently "salted".


> In general, hashed password storage is a tough game to win in an absolute sense. So, we are effectively left with slowing the attacker down to the point that it isn't worth their while.

And? Public key cryptography is impossible in an absolute sense, because there are algorithms that compute the private key given the public key. In fact, we have a good definition of "perfect secrecy", but the only cryptosystems that meet that definition must have keys that are as long as the message, which is impractical. Slowing the adversary down is how practical cryptography works.


It's a fallacy to compare the computational bounds of encryption algorithms to the human failures that break encryption systems.


>The rule is, "data in motion should be secured with TLS, data at rest should be secured with PGP".

No. The rule is, "ask people who know about security and cryptography to help you solve your problem".

That's what we do, and I can tell you that our guidelines are much more complex than simply using TLS or PGP. Actually I think we don't really have guidelines.

It's very easy to use TLS and PGP the wrong way.


I just TAed an introductory security class. The crypto project involved building a secure channel, and messages were authenticated with HMAC. We authenticated E(msg) || HMAC-SHA-256(msg) by computing HMAC-SHA-256(D(E(msg))) and bytewise comparing it against the sent HMAC-SHA-256(msg). This was based on Practical Cryptography and seemed fairly intuitive to me. How is it drastically insecure? (I fully admit that I don't know practical crypto, hence the conformance to Practical Cryptography.)


Bytewise comparison is timeable. Your comparison needs to touch every byte of both the candidate and the real MAC.
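A sketch of such a comparison (modern standard libraries provide vetted versions of this, e.g. Python's hmac.compare_digest; prefer those in real code):

```python
def constant_time_eq(a: bytes, b: bytes) -> bool:
    """Compare two MACs without short-circuiting on the first mismatch."""
    if len(a) != len(b):
        return False  # length is public for a MAC, so this leak is acceptable
    result = 0
    for x, y in zip(a, b):
        result |= x ^ y  # accumulate differences; no data-dependent branch
    return result == 0
```

The point is that the running time no longer depends on how many leading bytes of the candidate happen to match the real MAC.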


and that's convertible into a fast attack on the MAC key, even through network delay and jitter? (I wouldn't be surprised, now that you've pointed it out.)


There's a fantastic paper Nate showed me that I need to dig up, which established some bounds for timing attacks using statistical filters. Long story short: you can time at low-microsecond granularity over the Internet, and at nanosecond granularity over a LAN.

Memcmp is just on the threshold of Internet-timeable. And that's memcmp, which screams.

But Internet-timeable is irrelevant, because anyone who wants to time your app is just going to get an account at the same hosting provider as you and wind up a GigE hop or two away. Hosting on Slicehost, EC2, Linode, or GAE? That step took 5 minutes and $20. I'd pay $20 to bust up an app I hadn't even heard of, let alone a popular app.


http://codahale.com/a-lesson-in-timing-attacks/

Suggests 20µs over the Internet, 100ns over a LAN. Are there more complexities to comparing HMACs than are mentioned in this article? I.e., anything else to think about besides not short-circuiting your comparisons when bytes don't match?

More discussion (including from someone called Nate, presumably the same person tptacek is referring to) at:

http://groups.google.com/group/keyczar-discuss/browse_thread...

And a paper which I can't read without paying:

http://www.computer.org/portal/web/csdl/doi/10.1109/MSP.2009...


Where does Slide or RockYou host?


The rule is, "data in motion should be secured with TLS, data at rest should be secured with PGP".

Isn't this more of a heartfelt wish than a rule, though? It's restrictive enough to be impractical, I would guess even for security researchers. Do you use SSH to access remote machines?


If your problem doesn't fit into PGP or TLS, refactor your problem.

There are high-level crypto libraries (though none that I recommend without hesitation) that provide essentially the same features as PGP. If you're going to make a concession, perhaps that's the one you could consider. However:

* I don't recommend doing so, and

* It's not that much of a concession, because you still have to wrap your application around the cryptosystem, not the other way around.


Quoth the article:

That means that if you know SHA1(secret || message), then you can compute SHA1(secret || message || ANYTHING), which is a valid signature for message || ANYTHING. So to break this system, you just need to see one signature from SuperAnnoyingPoke, and then you can impersonate SuperAnnoyingPoke for lots of other messages.

This is absolutely incorrect. If you know SHA1(secret || message), you can compute SHA1(secret || message || padding || anything), which is quite a different matter. Most useful values of "anything" don't start with such padding.

Should anyone be designing a system which relies on this distinction for its security? Of course not. But things are bad enough already without exaggerating the scope of attacks.
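For reference, the padding in question is the hash's Merkle-Damgård glue. A sketch of what SHA-1 implicitly appends to its input (computed from the total length hashed, i.e. secret plus message), which is why any extension necessarily begins with these bytes:

```python
def sha1_glue_padding(message_len: int) -> bytes:
    """SHA-1 padding for an input of message_len bytes: a 0x80 byte, zero
    bytes until the length is 56 mod 64, then the bit length as a 64-bit
    big-endian integer, bringing the total to a multiple of 64 bytes."""
    padding = b"\x80"
    padding += b"\x00" * ((56 - (message_len + 1)) % 64)
    padding += (message_len * 8).to_bytes(8, "big")
    return padding
```

Note the run of NUL bytes, which is what the sibling comments about allowing or disallowing NULs are arguing over.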


You mean useful values of "anything" not including URLs, HTTP cookies, unicode comma-separated lists, and unicode key=value ASCII pairs, where a small amount of padding isn't going to break anything.


I don't know about you, but I generally don't allow NUL bytes in my URLs, HTTP cookies, unicode comma-separated lists, or unicode key=value ASCII pairs.


That's because you're a C programmer. Tarsnap may not have this problem, but Flickr definitely did. I don't know why you think it's a good idea to talk this problem down; it has already been catastrophically bad for some applications.


I'm not trying to talk the issue down. I'm just saying that it's bad enough already without exaggerating it further.


Well I generally don't disallow NUL bytes.


This is what you really want if you care a lot about avoiding the small domain problem (brute forcing H(myguess) so that I finally find H(myguess) == password_hash):

Just use an algorithm that is acceptably slow when used to authenticate users, but unacceptably slow when used for brute forcing.

For instance

HMAC(1,HMAC(2,HMAC(3,HMAC(4,HMAC(..N,HMAC("yourPublicZZaaalt","yourpassword"))))))

Use N big enough so that you need a few milliseconds to run this code as optimized C.

If you are smart, design it so that CUDA won't help.

Now you are done. Brute forcing will simply take too long unless the password is obvious. It may still be possible to test 10,000 hashes in a reasonable amount of time, but forget about running John the Ripper against it.
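A sketch of the idea in Python (the iteration count and salt handling are illustrative only; the vetted forms of this construction are PBKDF2, bcrypt, and scrypt):

```python
import hashlib
import hmac

def stretch(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """Iterate HMAC-SHA256 so that each password guess costs the attacker
    `iterations` HMAC computations instead of one hash."""
    digest = password.encode()
    for i in range(iterations):
        # Fold the iteration counter into the key, as in the comment's
        # HMAC(1, HMAC(2, ...)) nesting.
        digest = hmac.new(salt + i.to_bytes(4, "big"), digest, hashlib.sha256).digest()
    return digest
```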


Don't use HMAC, but, sure: iterate SHA1 a couple thousand times. This is the approach taken by PBKDF.

A better approach is to use bcrypt or scrypt. It's not a security objective of SHA1 or HMAC-SHA1 to take a long time to run --- in fact, they have the opposite design goal. Bcrypt and scrypt have both been designed to be "cryptographically slow": speeding them up in the general case would (forgive an oversimplification) require a new result in cryptography that would be more significant than "optimizing brute force cracking".


I proposed HMAC just to guard against some strange property of H(H(H(H(H(...))))), which is probably not an issue for SHA1 and many other believed-secure hash functions and block ciphers.

By the way, another slow thing that can be used is Blowfish, as it has a slow key-scheduling stage, but I don't know if it's slow enough.

Another approach can be to take any Feistel-network-based block cipher and turn it from M rounds into M*N rounds.

There are many ways to turn a secure cryptographic primitive into a monkey-ass slow cryptographic primitive that seriously limits the effectiveness of brute force.


Bcrypt is a pessimized version of Blowfish. If you're working at this level of granularity, it should be for a research paper, not something people actually use.


Ah, I was not familiar with bcrypt; thank you very much for the info. I'll look into it a bit more.


The attack isn't using brute force. The defense is against rainbow tables, not brute force.


I feel like this should be titled "Don't hash secrets, unless you actually know what you are doing."


Actually, he suggests using something whose workings he has no clue about. :)


Hard to make it past 3 sentences of the arrogant writing style. Such arrogance usually indicates a lack of expertise. The substance of the article confirms that.

Yes, it's like "Don't Hash Unless You Really Understand That I'm the Only One Who Really Understands"


Another post on this topic, with links to research papers: http://rdist.root.org/2009/10/29/stop-using-unsafe-keyed-has...


Good god, no. Here is much better advice, in much shorter form:

If you're storing a password, use bcrypt or I'll bite your thumbs off.

If you're making sure something hasn't been modified by someone and you absolutely can't use a signed GPG message, use HMAC-SHA256 and a constant-time comparison algorithm or I'll bite your thumbs off.
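Both rules map onto a few standard library calls; a sketch in Python, with PBKDF2 standing in for bcrypt since it ships in the standard library (the iteration count and key sizes are illustrative):

```python
import hashlib
import hmac
import os

# Password storage: salted, stretched hash (PBKDF2 here; bcrypt/scrypt preferred).
salt = os.urandom(16)
stored = hashlib.pbkdf2_hmac("sha256", b"correct horse", salt, 200_000)

def verify(password: bytes) -> bool:
    """Recompute the stretched hash and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password, salt, 200_000)
    return hmac.compare_digest(candidate, stored)

# Message authentication: HMAC-SHA256, compared in constant time.
key = os.urandom(32)
tag = hmac.new(key, b"the message", hashlib.sha256).digest()
ok = hmac.compare_digest(tag, hmac.new(key, b"the message", hashlib.sha256).digest())
```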

All the business about hash functions is like advice on how to build your own car brakes using a backyard smelter and a sand cast. You might get something of roughly the right shape and weight but someone's going to get hurt down the road.


Such articles leave me with the feeling that there are probably a couple hundred people on earth who truly understand cryptography. The rest of us either blindly follow them (a good thing) or try to understand and apply the algorithms ourselves (a very bad thing). I'd love to see some sort of crypto-recipes.org site: a listing of use cases, with recipes underneath and implementations in different languages. I'm sure crypto experts would look down on this, but I believe it would greatly benefit the community.


Somehow I disagree with that conception. Can someone explain? http://securecoding.ch/?p=201001290042267


The article says not to hash shared secrets, it does not argue against hashing passwords (with salts).


Oh. It seemed to me the use cases pretty much talked about password hashing.

I guess I should re-read :)



