One way to fix your rubbish password database

healsdata · on June 8, 2012

Is there a security disadvantage to taking the MD5 hashes you already have and running those through bcrypt? It seems like that would let you get to a salted bcrypt implementation in one day as opposed to waiting for all your users to log in. Perhaps you could do the mix (md5 + bcrypt) until the user logs in and then switch them solely to bcrypt?

tptacek · on June 8, 2012

No. I don't believe there's any disadvantage to this. An MD5 hash is a 128 bit random number; it's 16 fully random characters, better than almost any human password.

tedunangst · on June 8, 2012

Pedantically, it's 16 random bytes. You aren't going to lose much entropy from an upper/lower/number password unless it's at least 20 characters, or even longer if it's a pass phrase.
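
The length threshold mentioned above can be checked with a little arithmetic. This is a rough sketch that assumes every character is drawn uniformly at random from upper/lower/number characters, which real passwords are not; the function names are made up for illustration.

```python
import math

MD5_BITS = 128  # an MD5 digest is 16 bytes = 128 bits


def password_entropy_bits(length: int, alphabet_size: int) -> float:
    """Entropy of a uniformly random password of the given length."""
    return length * math.log2(alphabet_size)


def min_length_exceeding(bits: int, alphabet_size: int) -> int:
    """Smallest length whose entropy is strictly greater than `bits`."""
    return math.floor(bits / math.log2(alphabet_size)) + 1


ALNUM = 26 + 26 + 10  # upper + lower + digits

for n in (8, 12, 20, 22):
    print(n, "chars:", round(password_entropy_bits(n, ALNUM), 1), "bits")

print("first length above 128 bits:", min_length_exceeding(MD5_BITS, ALNUM))
```

Below that length a random alphanumeric password has fewer than 128 bits to begin with, so squeezing it through MD5 cannot truncate much.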

tptacek · on June 8, 2012

Sorry; C programmer.

stavros · on June 8, 2012

What's your opinion on SuperGenPass and the like? I'm sure it would still get cracked, but I feel that it would take a lot of effort to crack the provider's hash, even if it were md5, and then less effort to crack the SuperGenPass hash, so hopefully nobody would recognise it or bother...

simonbrown · on June 8, 2012

Not entirely on-topic, but SuperGenPass and probably similar bookmarklets have a security problem:

http://akibjorklund.com/2009/supergenpass-is-not-that-secure

stavros · on June 8, 2012

Ah, I hadn't considered that... That's a shame...

icebraining · on June 8, 2012

Just wanted to point out that any implementation as a browser extension (as opposed to bookmarklet) is safe from DOM manipulation; searching on Google for "supergenpass extension" returns results for at least Chrome, Firefox and Opera.

stavros · on June 10, 2012

I'm not so sure, the extension still appends DOM elements, I'm sure those are just as susceptible to sniffing...

simonbrown · on June 10, 2012

I haven't looked at the code or even used it that much, but it seems like it only uses content scripts to insert the password into the field, and everything else is dealt with by the popup/background page, which websites don't have access to.

lukeschlather · on June 8, 2012

An MD5 hash is not a random number; it is generated from some text string. It's possible that bcrypt(salt + MD5(text)) opens you up to collision attacks that are not possible with bcrypt(salt + text). It seems unlikely that it would open you up to attacks that are not possible with md5(text) but MD5 is not a random number so I'm not too sure.

harshreality · on June 8, 2012

The best means of colliding MD5 seems to require one collision block plus some extra "birthday" bits, all of which are controllable by the attacker. [1]

The idea is that you have two messages, m1 and m2, or m1 and m1' if you prefer, and you vary bits in both until you get a collision. You need some area of m1 and m2 that doesn't matter for the application, so that you can change those bits and find a collision. Since all bits of m1 are supplied when entering the password, you have no ability to modify it without getting the user to change his/her password.

If you could collide any arbitrary m1 as it's given to you, then attacks like fake certs with signed MD5 hashes could create the fake cert after submitting it to the CA and getting the signed cert back, rather than before.

Also, the collision process requires knowledge of m1 so you can see the intermediate hash states. If you know m1, the password/passphrase, why are you trying to find a new m1' that hashes to the same value rather than using the pass you already know?

An attack of concern for using MD5 as a password hashing step would be a first preimage attack. [2]

[1] https://www.google.com/search?q=md5+collision+block+birthday... (first link at present is http://www.win.tue.nl/hashclash/SingleBlock/ )

[2] http://en.wikipedia.org/wiki/Preimage_attack

joshuahedlund · on June 8, 2012

I'm still trying to learn this stuff, but I do not understand fundamentally how bcrypt(salt + MD5(text)) could be worse than bcrypt(salt + text). What if everyone's plaintext password was already a string of characters identical to some MD5(text)? If bcrypt(salt + MD5(text)) could be bad, then doesn't that mean bcrypt(salt + text) could be bad too?

lmkg · on June 8, 2012

If you compose hash functions, you get the union of possible collisions.

Let's say that "foo" and "bar" are two distinct passwords that have the same MD5 hash. Then bcrypt(md5("foo")) == bcrypt(md5("bar")), regardless of how bcrypt("foo") compares to bcrypt("bar"). By pre-hashing with MD5, you have added possible collisions that weren't there previously, and those collisions remain regardless of how many more hashes you pile on top.
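
The point about composed hashes can be seen with a toy model. This sketch deliberately weakens the inner hash to a single byte of MD5 so that a collision is easy to find (no such short collision is known for full MD5), and it uses scrypt from the standard library as a stand-in for bcrypt. All names and parameters are assumptions for illustration only.

```python
import hashlib
from itertools import count
from typing import Dict, Tuple


def toy_inner(text: str) -> bytes:
    """Deliberately weak stand-in for MD5: only the first byte of the digest."""
    return hashlib.md5(text.encode()).digest()[:1]


def outer(data: bytes, salt: bytes = b"fixed-demo-salt") -> bytes:
    """Stand-in for bcrypt: scrypt with small, demo-only parameters."""
    return hashlib.scrypt(data, salt=salt, n=2**10, r=8, p=1, dklen=32)


def find_inner_collision() -> Tuple[str, str]:
    """Return two distinct strings with the same toy_inner value."""
    seen: Dict[bytes, str] = {}
    for i in count():
        candidate = f"pw{i}"
        digest = toy_inner(candidate)
        if digest in seen:
            return seen[digest], candidate
        seen[digest] = candidate


a, b = find_inner_collision()
# The inner collision survives the outer hash...
print(outer(toy_inner(a)) == outer(toy_inner(b)))
# ...even though the outer hash alone tells the two inputs apart.
print(outer(a.encode()) == outer(b.encode()))
```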

chc · on June 8, 2012

We're not pre-hashing with MD5. The MD5 was already there. It's the only source text we have. The proper comparison here isn't MD5+bcrypt vs. just bcrypt — it's MD5+bcrypt vs. just MD5. So any collisions that MD5 causes are immaterial — they'd be there either way.

It seems to me that the most obvious problem is that you get two chances at colliding — once with MD5 and once with bcrypt. But bcrypt is not known to be especially vulnerable to collision attacks, so this setup is probably not noticeably worse than MD5 alone. But that's just looking at probabilities — I ain't no fancy crypto expert or nothin', so there might be much more subtle vulnerabilities than the added chance of collision.

lmkg · on June 8, 2012

> It seems to me that the most obvious problem is that you get two chances at colliding

Yeah, that's all I'm saying. I was answering a question about being "fundamentally worse," and fundamentally, there are now two sources of potential collisions instead of one. In theory, that's twice as insecure! However, the practical effect is unlikely to rise above absolute nil anytime soon.

tedunangst · on June 8, 2012

> Let's say that "foo" and "bar" are two distinct passwords that have the same MD5 hash.

As a practical matter, we can basically say that never happens. Certainly not for passwords that are user selected and not designed to collide. And since the hash itself is hidden by bcrypt, the attacker won't know md5("foo") even if they were inclined to find a "bar" with the same hash.

tptacek · on June 8, 2012

Can you cite a single example of a pair of password-length strings that could be entered on a standard keyboard that collide in MD5?

Cryptographic pseudorandom number generators also collide and produce cycles. MD5 hashes of arbitrary ASCII strings are reasonably modeled as random numbers, and the concern you cite is unmeaningful.

pbhjpbhj · on June 9, 2012

That seems like a big ask, these are the closest I could find - http://www2.mat.dtu.dk/people/S.Thomsen/wangmd5/samples.html.

An upper limit on string length suddenly makes some sense.

tptacek · on June 10, 2012

Huh? Because some unsuspecting user might choose a password so long and so random that it turns out to be collidable with one other string?

pbhjpbhj · on June 10, 2012

I did say "some" sense. I was thinking of someone choosing a string that collides, to enter an account without it being apparent that they got in with anything other than the correct password. All right, the chances of someone wanting to do that seem slim, but I can see someone saying in a meeting, "if we leave the passphrase open at the max length side then people could enter a string with a matching hash".

abscondment · on June 8, 2012

I'm very open to correction, but won't the increased time complexity of bcrypt also mitigate this kind of attack?

pilsetnieks · on June 8, 2012

To nitpick a bit, it's 16 bytes, usually represented by 32 hexadecimal numbers.

tptacek · on June 8, 2012

The hexadecimal is just UI; ignore it.

16s · on June 8, 2012

Amen to that. It could be b64 encoded or whatever. Focus on the raw bytes. They are truth.

cheatercheater · on June 10, 2012

Red light coming up in the back of my head. Wouldn't the MD5->scrypt pipeline expose new attacks that scrypt doesn't have? Maybe there's a higher collision probability or some known-text attack, but I'm really shooting in the dark here.

tptacek · on June 10, 2012

cheatercheater · on June 11, 2012

That's one person whose "no" I'll accept without further explanations. Thanks for clarifying.

__alexs · on June 8, 2012

If you were on MD5, your passwords all have at most 128 bits of entropy. A user with a password of, say, 30 random bytes from [a-zA-Z0-9] will be getting some entropy truncated in the hashing process. If you now move to, say, bcrypt, your hashes are 448 bits long. So you are throwing away a lot of the keyspace by using a shorter hash function first. So yes, it's weaker in one sense; it's just massively more difficult to attack already, and you can still upgrade to pure bcrypt as users log in in future.

RyanMcGreal · on June 8, 2012

Thanks to reading lots of articles on HN about password security, I upgraded my site's passwords from salted MD4 to bcrypt about a year and a half ago (a bit late to the party, but still). Here's what I did:

I added a second password column to the users table, then ran a script that queried the table for the existing hash, generated a bcrypt hash from that value and wrote it into the new password column. Then I removed the old password column. No need to wait for the user to log in.

When people log in today, the code takes their password, runs it through the old hash routine, runs the output through the new hash routine, and compares that to the password on file.
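
The migration described above might look roughly like this. It is a minimal sketch, not the commenter's actual code: salted MD5 stands in for the old routine, scrypt from the standard library stands in for bcrypt, and the row layout and field names are invented for illustration.

```python
import hashlib
import hmac
import os


def old_hash(password: str, old_salt: bytes) -> bytes:
    """The legacy routine; salted MD5 is an assumption for this sketch."""
    return hashlib.md5(old_salt + password.encode()).digest()


def new_hash(data: bytes, new_salt: bytes) -> bytes:
    """Slow hash; scrypt stands in for bcrypt, with demo-sized parameters."""
    return hashlib.scrypt(data, salt=new_salt, n=2**12, r=8, p=1, dklen=32)


def migrate_row(row: dict) -> dict:
    """One-off migration: wrap the stored legacy hash, then drop it."""
    new_salt = os.urandom(16)
    return {
        "old_salt": row["old_salt"],  # still needed at every login
        "new_salt": new_salt,
        "password": new_hash(row["password"], new_salt),
    }


def check_login(row: dict, password: str) -> bool:
    """Old routine first, new routine on its output, then compare."""
    inner = old_hash(password, row["old_salt"])
    candidate = new_hash(inner, row["new_salt"])
    return hmac.compare_digest(candidate, row["password"])


salt = os.urandom(8)
legacy_row = {"old_salt": salt, "password": old_hash("hunter2", salt)}
migrated = migrate_row(legacy_row)
print(check_login(migrated, "hunter2"), check_login(migrated, "hunter3"))
```

The migration never needs the plaintext, which is why it can run over the whole table without waiting for anyone to log in.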

Yeroc · on June 10, 2012

Unless I missed something you're describing exactly what the linked article suggested to do.

stavros · on June 8, 2012

That's what the article says.

healsdata · on June 8, 2012

Thanks for pointing that out. For some reason, when I read it earlier, I missed the bullet point about the conversion process.

stavros · on June 8, 2012

No problem, glad to have helped!

bmelton · on June 8, 2012

Well, I mean there's always the possibility that you'll have users that won't log in any time soon, and you don't want them to be vulnerable in the meantime, right?

If it were me doing it, I'd take the article's approach as the first whack, and then as each user logs in, validate their password (as per the article), and then also set their password to scrypt('salt', 'password') vs. scrypt('salt', md5('password')), so that they were current. Then just set a flag on the user record like "new_password=True" or something.

That gets you the stopgap without having to muck around with scrypted hashes forever. Send out a few emails to your user population with a note that you've upgraded your password strategy and that they should log in. You're still not going to get 100% coverage, but at least for whoever you don't get to log in, their passwords aren't 'in the wind', so to speak.

Edit: I'm not sure if you edited yours, or if I just read it poorly, but I think we're saying the same thing. Note, the article's strategy is basically your first scenario -- just bcrypting (or scrypting in the article) the existing md5 hash.
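
The upgrade-on-login idea with a flag could be sketched as follows. This assumes the transitional value is scrypt(salt, md5(password)) with unsalted MD5, uses the standard library's scrypt with demo-sized parameters, and borrows the "new_password" flag name from the comment above; everything else is hypothetical.

```python
import hashlib
import hmac
import os


def slow(data: bytes, salt: bytes) -> bytes:
    """scrypt with small demo parameters; tune for real use."""
    return hashlib.scrypt(data, salt=salt, n=2**12, r=8, p=1, dklen=32)


def login(user: dict, password: str) -> bool:
    pw = password.encode()
    if user["new_password"]:
        # Already upgraded: stored value is scrypt(salt, password).
        return hmac.compare_digest(slow(pw, user["salt"]), user["hash"])

    # Transitional record: stored value is scrypt(salt, md5(password)).
    inner = hashlib.md5(pw).digest()
    if not hmac.compare_digest(slow(inner, user["salt"]), user["hash"]):
        return False

    # The correct plaintext is in hand, so re-hash without the MD5 step.
    user["salt"] = os.urandom(16)
    user["hash"] = slow(pw, user["salt"])
    user["new_password"] = True
    return True


salt = os.urandom(16)
user = {
    "salt": salt,
    "hash": slow(hashlib.md5(b"s3cret").digest(), salt),
    "new_password": False,
}
print(login(user, "s3cret"), user["new_password"])
```

A failed attempt leaves the record untouched, so only a verified password ever triggers the rewrite.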

jcromartie · on June 8, 2012

This is a good step, but unfortunately all of the actual passwords are still out there, so they need to be changed.

I think a better idea would be to establish an easily implemented pattern for "password bankruptcy" that companies could follow in the case of a leak.

tedunangst · on June 8, 2012

The idea is you do this before the password database is stolen. It's too late for LinkedIn, but not too late for you.

ams6110 · on June 8, 2012

As far as you know.... but how do you know?

tedunangst · on June 8, 2012

Know what? Maybe the attacker is logging all the passwords that are entered. Maybe they installed a passwordless backdoor. Maybe they installed spyware on all your users' machines. There's very little point discussing all the imaginary attacks which may have already happened that you don't know about, that could be anything.

donpdonp · on June 8, 2012

What would a password bankruptcy pattern look like?

One thought is to invalidate all passwords and fall back on email password recovery when a login is attempted.

This leads me to an idea I've tried once - if access to the inbox is equivalent to password credentials, why not use an email to login? By this I mean the web site login is a single field - email address. The system emails a one-click-login URL to the user that can be re-used (possibly with a month expiration time). The user can look up the URL in their inbox when they want to login again, or use a long-lived cookie.
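
A minimal sketch of such an email login link, assuming a random token that stays valid for a month and can be reused until it expires. The URL, the in-memory store, and the function names are all hypothetical; a real system would use a database table and actually send the email.

```python
import hashlib
import secrets
import time
from typing import Dict, Optional, Tuple

LINK_TTL_SECONDS = 30 * 24 * 3600  # "a month", as suggested above

# sha256(token) -> (email, expiry time); stand-in for a database table.
_tokens: Dict[str, Tuple[str, float]] = {}


def issue_login_url(email: str, now: Optional[float] = None) -> str:
    """Create a reusable login link for this address."""
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)
    # Store only a hash, so a leaked table does not contain usable links.
    key = hashlib.sha256(token.encode()).hexdigest()
    _tokens[key] = (email, now + LINK_TTL_SECONDS)
    return f"https://example.com/login?token={token}"  # hypothetical URL


def redeem(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the email the token belongs to, or None if unknown or expired."""
    now = time.time() if now is None else now
    entry = _tokens.get(hashlib.sha256(token.encode()).hexdigest())
    if entry is None or now > entry[1]:
        return None
    return entry[0]
```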

desas · on June 8, 2012

Emailing a link to log in was one of two supported login methods for Red Hat's Mugshot social network. The other was sending the link via XMPP.

In practice I end up doing this for little-used sites because I use my phone, a tablet, and two laptops for browsing the internet.

It's annoying if you work somewhere that doesn't allow access to personal email accounts and you want to log-in to something.

tedunangst · on June 8, 2012

I have lots of logins tied to email addresses no longer in use. As a real world example, people sign up for services with work emails. The day they get fired, they suddenly lose access to that email and all of the email login services tied to it. Not good.

rb2k_ · on June 8, 2012

Even if you run a bad codebase that just uses unsalted MD5 and you don't want to add a new crypto algorithm:

Couldn't you just run your whole database through X more rounds of MD5 and do the same in your authentication function?

That way, script kiddies couldn't use precomputed rainbow tables they downloaded somewhere off Bittorrent.

Each additional round will also reduce the speed of a brute force attack while keeping the changes to the codebase pretty small.

Unless there are rainbow tables for a certain number of MD5 iterations, it would be a start...
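
For concreteness, the proposal amounts to something like the sketch below. It assumes the database stores raw 16-byte digests (many systems store hex instead) and picks an arbitrary round count; it illustrates the idea as stated, not a recommended scheme, since the result is still unsalted and still built on a fast hash.

```python
import hashlib
import hmac

ROUNDS = 1000  # hypothetical value for "X more rounds"


def md5_rounds(data: bytes, rounds: int) -> bytes:
    """Apply MD5 repeatedly, feeding each digest back in."""
    for _ in range(rounds):
        data = hashlib.md5(data).digest()
    return data


def upgrade_stored(stored_md5: bytes) -> bytes:
    """Run once per row: the stored value already has one round applied."""
    return md5_rounds(stored_md5, ROUNDS)


def check(password: str, stored: bytes) -> bool:
    """Authentication must apply the original round plus the extra ones."""
    return hmac.compare_digest(md5_rounds(password.encode(), ROUNDS + 1), stored)
```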

IgorPartola · on June 8, 2012

I think the conventional wisdom is that you should not re-invent security. I am slowly learning this, but the give-away seems to be questions that start with "Couldn't you just..."

rb2k_ · on June 8, 2012

It also was conventional wisdom that banks were too big to fail ;)

Are there any actual arguments against using this as an 'easy' fix to the precomputed rainbow tables scenario? Multiple rounds of a cipher seem to be a relatively common operation in crypto and have helped other old ciphers. One of the more prominent ones would probably be the move from DES to triple DES.

I guess dictionary attacks on GPUs would still be easy enough, even with more iterations, but anything that isn't directly in a dictionary might benefit quite a bit from multiple iterations.

It's not as good as actually using proper crypto rather than hashing algorithms that were designed to be fast, but it seems like an easy to implement low-risk solution.

IgorPartola · on June 9, 2012

I am no security expert. I can't tell bcrypt from a hole in the ground. All I know is that it's all fun and games until there is a problem with this home-brew implementation and then it's too late. That is why I think it's best to avoid anything like this and instead go with bcrypt/scrypt, etc. and re-evaluate periodically based on latest industry standards. Perhaps an actual security expert on here can evaluate your idea. I seem to remember it being raised many times on here, so there may be an answer to this on one of the discussions of bcrypt vs scrypt or some such.

pbhjpbhj · on June 9, 2012

One would just check against a top10[000…] list of passwords to which various multiple-hash combinations had been applied; md5(md5(md5(md5('password')))) is going to be easy to reveal.

Your system would seem to be practical if you know there are no weak passwords or if you don't care if only some of the accounts are compromised.

You've also got to watch you don't DoS yourself.

heretohelp · on June 9, 2012

Pretty easily detected. This is a bad idea. Amateurs need to stop attempting their own patchwork solutions to their already bad fuck-ups.

rb2k_ · on June 9, 2012

Is the "easily detected" part really a problem? The idea isn't to try security by obscurity.

The main advantage is that it would still keep people from using precomputed rainbow tables and slow down brute force attacks with a minimum of additional code, wouldn't it? (similar to the switch from DES to triple DES back in the day)

frisco · on June 9, 2012

(Sorry rb2k_, I didn't mean to downvote you.)

heretohelp · on June 9, 2012

Rainbow tables are the least and most trivial of your problems to solve.

Using a fast hash algorithm for storing passwords is fucking braindead and a DOA decision to make about security.

Your "solution" doesn't solve anything.

ernesth · on June 8, 2012

Isn't the fact that s/bcrypt is by design costly preventing this idea from being executed?

tedunangst · on June 8, 2012

Assuming you use 0.01s per hash settings, you can upgrade somewhere around a million accounts per hour.
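
A back-of-the-envelope check of that figure: one worker at 0.01 s per hash manages about 360,000 hashes per hour, so "around a million" corresponds to roughly three workers running in parallel. The worker count is an assumption here, not something stated in the comment.

```python
SECONDS_PER_HOUR = 3600


def accounts_per_hour(seconds_per_hash: float, workers: int = 1) -> float:
    """Accounts that can be re-hashed per hour at a given cost per hash."""
    return workers * SECONDS_PER_HOUR / seconds_per_hash


print(round(accounts_per_hour(0.01)))             # one worker
print(round(accounts_per_hour(0.01, workers=3)))  # three workers in parallel
```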

joshrice · on June 8, 2012

Sorta, but chances are you'll be generating way fewer than an attacker would need to. More importantly, you owe it to your users to keep their passwords secure.

ww520 · on June 8, 2012

It's unlikely that you have more users than the enumeration of the 128-bit key space (if MD5 was used). The slowness of bcrypt prevents the brute force attack through the key space for EACH user. That's N x 2^128 for N users, whereas to upgrade, you only have to do it N times.

mjschultz · on June 8, 2012

I'll admit, I must be the only one that doesn't quite get the jump from step 4 to step 5.

In step 4, we make the assumption that their API is out in the wild, in use, and sends the md5(s, p) in the request. I get that we take that value, run it through scrypt and match against our stored value to authenticate. So the database has:

    scrypt(s', md5(s, p))

No problem authenticating the API requests with that.

Step 5 says once the user logs in with their actual password, we update entirely to the new scheme of scrypt(s'', p) and store just that. Now the database only has:

    scrypt(s'', p)

But the API user still sends md5(s, p) to authenticate, right?

So then what happens when that same user goes back to the API-using app? It still uses the API, so it'll send the MD5(s, p) and fail, since we've discarded the transitional scrypt value when they logged in via the web interface.

Is there a deprecation period that supports both types while API-using apps update to a new API for the new scheme?

pbreit · on June 8, 2012

I had trouble following along as well. I think the gist is, 1) immediately re-compute strong hashes for all of your existing weak hashes. 2) when someone attempts to log in, try both hash computations. 3) if you used the old weak computation, re-compute a new strong hash.

jgrahamc · on June 8, 2012

Should have made clear that you can't do 5 if you need 4.

mjschultz · on June 8, 2012

Ah, okay. So step 5 is the else condition from the "if" that begins step 4. At least, until the API is upgraded to the new improved edition and most/all API apps are using the new version.
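
The conflict between steps 4 and 5 can be made concrete with a small sketch. It assumes md5(s, p) means MD5 over the salt followed by the password, uses the standard library's scrypt with demo-sized parameters, and tags each record with a made-up "scheme" field; none of these details come from the article.

```python
import hashlib
import hmac
import os


def slow(data: bytes, salt: bytes) -> bytes:
    """scrypt with small demo parameters."""
    return hashlib.scrypt(data, salt=salt, n=2**12, r=8, p=1, dklen=32)


def api_token(password: str, s: bytes) -> bytes:
    """What a legacy API client sends: md5(s, p)."""
    return hashlib.md5(s + password.encode()).digest()


def transitional_record(token: bytes, s: bytes) -> dict:
    """Step 4: store scrypt(s', md5(s, p)) and keep s around."""
    s_prime = os.urandom(16)
    return {"scheme": "md5+scrypt", "s": s, "salt": s_prime,
            "hash": slow(token, s_prime)}


def final_record(password: str) -> dict:
    """Step 5: store scrypt(s'', p) only."""
    s2 = os.urandom(16)
    return {"scheme": "scrypt", "salt": s2,
            "hash": slow(password.encode(), s2)}


def verify_api(record: dict, token: bytes) -> bool:
    if record["scheme"] != "md5+scrypt":
        # md5(s, p) cannot be checked against scrypt(s'', p).
        return False
    return hmac.compare_digest(slow(token, record["salt"]), record["hash"])


s = os.urandom(8)
tok = api_token("pw", s)
print(verify_api(transitional_record(tok, s), tok))  # step 4 record
print(verify_api(final_record("pw"), tok))           # step 5 record
```

Once a record has moved to the final scheme there is nothing left to compare the client's MD5 value against, which is why step 5 has to wait until API clients stop sending it.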

nateabele · on June 8, 2012

Okay, I must really be missing something here.

If your original database contains a bunch of unsalted SHA1 (or worse, MD5) hashes, what good does securing the hashes themselves do if the means to generate the corresponding plaintext has already been released into the wild?

Someone please tell me I'm missing something obvious.

joshrice · on June 8, 2012

It's how to fix your rubbish password storage, not LinkedIn's or the others who've been compromised. That's the difference.

notmyname · on June 8, 2012

On step 3, where you say "scrypt(s'i, md5(si, password))", don't you actually need "scrypt(s'i, md5(s0, password))", where s0 is the original salt? In other words, you still need to know the original salt you were using to successfully migrate.

Therefore, if you are storing the per-user salt as the first bytes in the hashed password field, then you have to be careful when you "throw away the old weak hash hi and forget it ever existed."

jgrahamc · on June 8, 2012

The original salt is s_i which you do need to keep around. The new salt is s^'_i.

notmyname · on June 8, 2012

Ah. my mistake. I completely missed the "'" as I was reading it.

yathern · on June 9, 2012

The article states that LinkedIn was using salted SHA-1 hashes, but I thought that wasn't the case. Either way, aren't salted hashes essentially uncrackable by all means except full out brute force?

If my password is "password", and I change it to "#b1@password%3dy", and then hash it, isn't it secure from basic dictionary/rainbow table attacks?

I'm a bit new to cryptography, so please forgive me if I'm not understanding some of this correctly.

IgorPartola · on June 9, 2012

Brute force is sometimes all you need. The problem is that using GPUs you can compute so many hashes a second that a short password simply cannot withstand such an attack for long. The salt helps a bit, but if someone is brute-forcing the hashes, all it means is that once they have your password they don't also have the password of another person who happens to use the same one.

yathern · on June 9, 2012

I see, but isn't brute-forcing "aecd8c83718c381cpassworda3802..." going to take far, far longer? Even on some huge botnet clusters, I still don't imagine how it could be possible to crack that very quickly.

IgorPartola · on June 9, 2012

Oh, of course. But it will still take less time than you think. After trying a common dictionary the attacker just starts brute-forcing every single combination, and since md5 is so quick and works so well on the GPU, it may take mere hours to find the answer. I've personally had what I considered a secure password cracked out of a sha1 + salt setup. Now I use LastPass and generate random, different 32-character passwords for every service I use. The LinkedIn leak does not affect me: 32 chars is enough to give me a day or two to change my password, and none of my other accounts are compromised even if the attacker gets my LinkedIn password.
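
A rough way to see the scale involved: the salt is stored next to the hash, so the attacker only has to search the password itself, not the salted string. The guess rate below is a purely hypothetical round number, not a benchmark of any hardware, and the function names are made up for illustration.

```python
def keyspace(alphabet_size: int, length: int) -> int:
    """Number of candidate passwords of exactly this length."""
    return alphabet_size ** length


def seconds_to_exhaust(alphabet_size: int, length: int,
                       guesses_per_second: float) -> float:
    """Worst-case time to try every candidate at a fixed guess rate."""
    return keyspace(alphabet_size, length) / guesses_per_second


ASSUMED_RATE = 1e9  # hypothetical guesses per second against a fast hash

for length in (6, 8, 10, 12):
    hours = seconds_to_exhaust(62, length, ASSUMED_RATE) / 3600
    print(f"{length} alphanumeric chars: {hours:.3g} hours")
```

Each extra character multiplies the work by the alphabet size, which is why length helps so much more than a known salt does.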

yathern · on June 9, 2012

Okay, thanks for the answer. I was under the impression that brute forcing takes a long time.

mistercow · on June 8, 2012

If I'm not mistaken, the first paragraph is wrong. All signs point to LinkedIn having used unsalted hashes.

aidos · on June 8, 2012

According to their statement, they started salting some time recently. I guess you got a salt when you next logged in?

danskil · on June 8, 2012

I was just doing a write up on swapping auth back ends to gain more security http://schneems.com/post/24678036532/zomg-my-passwords-are-i....

PaulHoule · on June 8, 2012

I used this strategy years ago (2002) to migrate plaintext passwords in a site with 50k+ users. In fact, I built this into the system so I could do arbitrary migrations between password encodings whenever I felt it was necessary.

It works well.

IgorPartola · on June 8, 2012

Ruby and Django could do well to have this type of strategy baked in. This way you update your configuration and the passwords are immediately upgraded.

gioele · on June 8, 2012

> 4. If, like last.fm, you were also allowing third-parties to authorize users...

... then you should stop doing that and you should start using OAuth, so the client application never sees your user's password.

alexmuller · on June 8, 2012

Theoretically, sure. But I can't think of a nice way to authorise users on something like [1]. They'd then need a computer with the radio to provide some kind of access code, I guess?

[1] http://www.robertsradio.co.uk/Products/Internet_radios/STREA...

ErrantX · on June 8, 2012

One time passwords (feed the radio your generated password & let it use that to negotiate authorisation/api keys).

dfc · on June 8, 2012

Why do you write/footnote like that? Is there a geographical disparity in how to footnote?

esbwhat · on June 10, 2012

I always wondered why people don't just use rainbow tables to get all the raw passwords, and then hash them with the better algorithm. The ones that are left, you just change upon login.

stavros · on June 8, 2012

Did you get that from here?: http://news.ycombinator.com/item?id=4078751

jgrahamc · on June 8, 2012

No, I asked a question yesterday about this (http://news.ycombinator.com/item?id=4080823) and spent a long time thinking about it. Great minds...

stavros · on June 8, 2012

Yeah, I guess it's not that amazing a coincidence... Most people would arrive at that.

jgrahamc · on June 8, 2012

A more fun instance of this sort of thing on HN is when I suggested that HN might be attackable because of a flaw in random number generation and then someone else who hadn't seen my suggestion went ahead and did it.

Me mentioning it: http://news.ycombinator.com/item?id=596126

The attack: http://news.ycombinator.com/item?id=639976

mixmax · on June 8, 2012

That was an absolutely amazing hack

peteretep · on June 8, 2012

Here's a worked example of a similar technique I wrote up ages ago:

https://gist.github.com/1051238

pdenya · on June 8, 2012

The variable names in this article are throwing me off. Is there a special significance to the subscripts and superscripts in the variables?

kbanman · on June 8, 2012

The subscript i denotes that the variable belongs to a single user i. The tick at the top is pronounced 'prime' and is used to differentiate between versions or iterations.

Ineffable · on June 8, 2012

Is that called "prime" by most people? I've always heard it just pronounced "dash", as in "s-dash" or "f-dash".

drivebyacct2 · on June 9, 2012

>Is that called "prime" by most people?

yes.