Poul-Henning Kamp: LinkedIn Password Leak? Salt Their Hide (acm.org)
68 points by CowboyRobot on June 8, 2012 | hide | past | favorite | 57 comments



Would it be possible to come up with a simple little icon that can be put on sign-up pages to indicate that the service is using PBKDF2 or bcrypt or the like?

Then, it would need to become popular enough for users to start to recognise it and look out for it when signing up. Even if most users don't have any idea what it's about, plenty of the more technically inclined users would, and they tend to be the early adopters anyway...

The idea is to add a bit of pressure to services to store passwords correctly (similar to how users look for the green SSL bar when doing important stuff online), and providing some transparency to the users who care about this.


This is a bad idea, and here's why:

Honestly, I see it as almost self-evident that users would never ever learn this.

But more importantly, what would stop anyone from putting up these icons? Who would check that they actually implemented it?

Even if that was solved, people would just implement this one thing because it looked good. But there are plenty of other ways to ruin your password security, so you couldn't really trust them more than you could in the first place. (IMHO, this is a core issue with security standardization)


I think you could implement verification of the implementation by allowing a user to retrieve their own password hash. Most would have no idea what to do with it, but a few people who know what's what could use that to verify that they're using the algorithm.

This does not detract from the rest of what you said, of course, and I agree that this wouldn't really be useful.


Yes, most users wouldn't understand what this is. However, if some did, and they expected it to be there, that might be enough.

You're quite right that there'd be nothing stopping people from using this dishonestly, except their consciences and the fact they may have some explaining to do if a dump of MD5s of their passwords was released. That may or may not be enough.

In any case, I'm sure that this industry can do a bit better than it is at the moment. With big breaches of LinkedIn, Last.fm and eHarmony in the last 48 hours, surely something can be done.


The problem is that the people who don't use a KDF don't know any better. Aren't these the same sort of people who will implement the same logo as other sites use without understanding what it means?

> except their consciences

I would add ignorance to that list.


Like what? The other commenter was being polite, but badges are flatly stupid. Unless you want to force websites to admit security inspectors, you have to either assume they're doing the right thing or take their word for it when they tell you "we've got it".


It is all very well suggesting running the hash millions of times, but sites with many users might not want such a performance hit.

This kind of escalating competition based purely on computing power indicates to me that the very concept of passwords has probably had its day and we should seriously think of better alternatives.

Passwords are no fun to remember and to keep secure for the users either. Anyone with a reasonably active 'online life' suffers from this.

Maybe this is the real reason why Facebook is doing so well? Only one password to remember.


In order to reduce the computational overhead on the server, perhaps one option is to run (at least part of) the hash on the client (eg in Javascript). Does anyone have any idea of how that would perform and if it would be feasible?
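
One way the split could work is the client doing the expensive stretching and the server applying only a cheap final salted hash. This is a minimal sketch of that idea, not any particular site's scheme; the site/user-derived client salt and the iteration count are assumptions:

```python
import hashlib

def client_prehash(password: str, username: str) -> str:
    # The expensive stretching runs on the client; the salt must be
    # derivable client-side, so a per-site + per-user string is used.
    salt = ("example.com:" + username).encode()
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000).hex()

def server_store(prehash: str, server_salt: bytes) -> str:
    # The server still applies a final cheap salted hash, so the stored
    # value is not itself the credential the client sends over the wire.
    return hashlib.sha256(server_salt + prehash.encode()).hexdigest()

stored = server_store(client_prehash("hunter2", "alice"), b"server-salt")
```

Note the caveat raised further down the thread: the prehash effectively becomes the password, so the server-side step is still essential.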


Looking at the downvotes on the parent post, I am bothered by the fact that participants in security discussions like this so easily downvote or lash out against any questions or ideas that diverge from their favorite mainstream. It seems like a common trend even outside of HN.

It's a perfectly valid post. No one says "use this, this is awesome and secure". If you think the idea is bad, then answer with an explanation. The -1 is simply not useful here. (Neither are one-liners that boil down to -1.)


In that case you just change the password. It's now whatever the client (pre)computes, the user input is just used to derive the 'real password'.


That's not entirely correct. The scheme that chris_j proposed can help prevent weak passwords from being cracked, since an attacker now needs to do one of two things to crack passwords:

1. Try lots of weak passwords - hash each one and compare to the list. This is slow, because the hash is slow.

2. Try breaking passwords with the partial hash - in this case the attacker either needs to try very difficult passwords (since these are passwords after a partial hash - what you called the 'real password'), or get the partial hashes from the users, which requires more effort.


Not quite; attackers aren't restricted to using a web interface, so they just send the partial hash directly to the remote site. As far as an MD5 brute-force attack is concerned, "very difficult" is exactly as hard as a user-entered password. Keep in mind the purpose of hashing is to mitigate the damage caused by database leaks. Granted, an initial client-side hash adds one more step the attacker must take to gain a password that's usable on other sites, but that doesn't protect the original site; unless implemented correctly it will be crackable with effectively zero effort, and it is unlikely to be implemented correctly.


I might totally be wrong, but for me the consequence of that approach are:

1. The attacker doesn't need the text the user entered anymore, just the precomputed hash

2. Probably the length and alphabet is fixed now, which might obfuscate/protect 'password' or 'test', but reduces the value of a strong password. Granted, this last part is a gut feeling.


Re the gut feeling – this is only a problem if the password has significantly more entropy than the hash. So, worst case MD5 (128 bits), this is potentially bad for people with printable-ASCII (95 chars) passwords longer than 19 characters.

But it's still not a real problem, since 128 bits of entropy is unguessable in the lifetime of the universe (checking 2^64 hashes a second, which is obscenely many – perhaps every processor on the planet dedicated to the task would be enough – covers 5% of the search space in about 29 billion years.)
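
That arithmetic is easy to check; the 2^64 hashes/second rate is the assumption from the post above:

```python
# Time to cover 5% of a 128-bit search space at 2**64 hashes per second.
search_space = 2 ** 128
rate = 2 ** 64                        # hashes per second (assumed rate)
seconds = 0.05 * search_space / rate
years = seconds / (365.25 * 24 * 3600)
print(f"{years:.1e}")                 # ~2.9e10, i.e. tens of billions of years
```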


Thanks for the answer. I can safely say that I'm far from an expert on the subject. If you'd be willing to educate me a tiny bit more though :

Is the first case (guessing normal passwords) factoring in that a password varies in length? I mean, stupid thought again: you need to test all one-character passwords, all two-character passwords, and so on, whereas a hash is fixed in its length?

And I wouldn't want to find the original input, I'd want to get in. For that my totally fallible gut says that I'd need to create a 'word list' of hexadecimal character permutations of length x. Is this really an impossible task?


Sorry, I missed this, but hopefully you will see it.

A hash is fixed in length, but at a long length. Now, because of geometric growth, the shorter lengths are basically irrelevant (since there are tens of times more 19-character passwords than 18-character ones, hundreds or thousands of times more than 17-character ones, and so on).

On the second point, yes, exactly, you need a word list of all hexadecimal strings of length x. Again, in the case of MD5 (128 bits), this is all the 32 character hexadecimal strings (since 32 characters * 4 bits per hexadecimal character is 128 bits). Such a list has a length of 2 to the power of 128 by definition - 340282366920938463463374607431768211456 items (about 10^38).

Making a list 10^38 items long is not impossible since that's well below the number of atoms in the earth (about 10^50). It is probably impractical however. Suppose you could store the numbers in iron (the most abundant element), you'd need to store each item of the list in about 0.01 nanograms.
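
The figures quoted above are easy to reproduce:

```python
import math

# Every possible 128-bit MD5 output.
items = 2 ** 128
print(items)              # 340282366920938463463374607431768211456
print(math.log10(items))  # ~38.5, i.e. about 10**38 items
```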


One thing that I always wondered about this approach:

    for (i = 0; i < 1000; i++) 
        scrambled_password = HASH(scrambled_password)
Aren't we weakening the hash function? Presumably the hash function is not one-to-one, so if you iterate this for many iterations there is a danger that you could end up with a function that has a much higher probability of collisions?
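
For concreteness, here is that loop fleshed out in Python. Seeding the state with the salted password is an assumption on my part, since the snippet omits the initial value:

```python
import hashlib

def stretch(password: str, salt: bytes, rounds: int = 1000) -> bytes:
    # Seed with the salted password, then iterate the hash. Real key
    # stretching (e.g. PBKDF2) also mixes the salt into every round.
    h = hashlib.sha256(salt + password.encode()).digest()
    for _ in range(rounds):
        h = hashlib.sha256(h).digest()
    return h
```

In practice hashlib.pbkdf2_hmac does this (and more) for you, so you would rarely hand-roll the loop.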



... in a theoretical sense; it is comically unlikely to be a real-world concern.


Why would you assume that?

Presumably there should be no real reason why HASH(8_char_password) = 160_bit_hash should be less strong than HASH(160_bit_hash).

Not only that, but most hashing algorithms already do several iterations before returning the hash.


I think the concern is that the hash function may converge.


The space of the output of the hash is at most as big as the space of the input.


A full second? Facebook has 900,000,000 active users. They would need over 10,000 CPUs running for 24 hours just to log them in.


..amortized over several weeks. People don't log in every day.


If you share a computer, people log in several times a day. And I bet the distribution over time is lumpy. Even assuming that demand is completely flat, that's over 500 CPUs running 24/7 for three weeks just for the hashes to log people in.
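
The back-of-the-envelope numbers in this subthread check out, assuming 900M users and one 1-second hash per login:

```python
users = 900_000_000                    # active users, 1 CPU-second per login
day = 24 * 3600
cpus_one_day = users / day             # everyone logs in within 24 hours
cpus_three_weeks = users / (21 * day)  # logins spread over three weeks
print(round(cpus_one_day), round(cpus_three_weeks))  # ~10417, ~496
```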


Why can't I be allowed to choose the authentication method I use to access MY data (and be responsible for the consequences if it's misused)? Is my data in LinkedIn really my data after all?


It wasn't the method of authentication that was the problem, it was that the stored credentials were inadequately protected against brute force attacks.


Who told you it was your data? It's a networking site, not EC2.


Is there a reason that one doesn't use a public-key encryption function with a unique, random public key per password to store the scrambled passwords? One would then store the public key and the encrypted password as md5crypt stores the salt and the hashed password.

This is of course not run-time configurable to increase the computational complexity of the password scrambling, but besides that, what are the problems? (I assume that there must be some, since I haven't ever heard of anybody handling passwords this way.)


Does this imply the client doing the encryption? I.e. the client creates a key pair and sends the public key to the server?

It sounds good but the challenge, as always, is the infrastructure. I think it would be great if I had a single personal private key from which I could issue chained keys for each domain where I have an account. But imagine managing this across desktops, browsers, phones, game systems, etc. ...


My intent was that the server should still be responsible for scrambling the password as usual. - My question is only about changing the algorithm server-side.


In last.fm's defense, they argued that there were some hardware devices (radios) that had last.fm clients, so they couldn't update their password system.

I have a question though, how would a strong password that takes around 1 second to hash affect the scalability of these systems; would it impact the login times of users a lot? Imagine thousands of people trying to login at once. Might it be the reason linkedin didn't hash and salt properly?


They still could've STORED them in a different way in the backend (e.g. use the md5 hash as 'password' and then use pbkdf2) then the leak would not have been as much of a problem as it was.
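
That migration (treating the legacy hash as the "password" fed to a KDF) can be sketched as follows; MD5 as the inner hash and the PBKDF2 parameters are illustrative, not anyone's actual scheme:

```python
import hashlib, os

def upgrade(legacy_md5_hex: str, iterations: int = 100_000):
    # Wrap an existing unsalted MD5 hash in PBKDF2 without needing the
    # cleartext password; store (salt, derived key) per user.
    salt = os.urandom(16)
    dk = hashlib.pbkdf2_hmac("sha256", legacy_md5_hex.encode(), salt, iterations)
    return salt, dk

def verify(password: str, salt: bytes, dk: bytes, iterations: int = 100_000) -> bool:
    # At login, recompute the inner legacy hash, then the outer KDF.
    inner = hashlib.md5(password.encode()).hexdigest()
    return hashlib.pbkdf2_hmac("sha256", inner.encode(), salt, iterations) == dk
```

The key point is that the upgrade runs over the stored hashes in one pass, with no user interaction required.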


Dear tech journalists, please stop saying stuff like "But we have yet to find out why nobody objected to them protecting 150+ million user passwords with 1970s methods." We do know why people use SHA1(unsalted password), and it's because the dev stack still doesn't support something like SHA-256 or better yet bcrypt/PBKDF2 at all levels.

So, right, I was a web developer pushing my PHP-based company to have a more robust-against-db-compromise password hashing strategy. You know what the huge problem was? The huge problem was, MySQL (and hence phpMyAdmin) didn't have a SHA2() function until mid-2010. Not only is SHA2() 'not enough', i.e. it's too fast and you want to do key stretching -- but even then, they didn't even have that.

So suppose you are developing an agile product, someone loses access to their account and asks for a new password, you type `head -c 9 /dev/urandom | base64` into your shell and get back `pYG3fvp9c06m`. If you don't have anything better built yet, you're going to go into the database and write the one-off query `UPDATE users SET pw_hash=SHA1('pYG3fvp9c06m') WHERE username = 'bob.bobertson'`, or, at best, `SET salt='tyDvBBHioUNS', pw_hash=SHA1('tyDvBBHioUNSpYG3fvp9c06m')`.

If you could get an interoperable PBKDF2 working in MySQL/Postgres, PHP, et cetera, devs would use that. It's precisely because it's not easy that it's not adopted.
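
Even without stack support, the one-off reset described above can produce a stretched hash by generating the values outside the database and pasting them into the UPDATE; a sketch, with arbitrary parameter choices:

```python
import base64, hashlib, os

# Produce a temporary password plus a PBKDF2 (salt, hash) pair that can
# be pasted into a one-off UPDATE, instead of computing SHA1() in SQL.
temp_password = base64.b64encode(os.urandom(9)).decode()   # 12 chars
salt = base64.b64encode(os.urandom(12)).decode()           # 16 chars
pw_hash = hashlib.pbkdf2_hmac("sha256", temp_password.encode(),
                              salt.encode(), 100_000).hex()
print(temp_password, salt, pw_hash)
```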

EDIT: My apologies to Poul-Henning Kamp for implying that he was a journalist. I thought that would be a sort of compliment but I can see now that it's more of a sort of category error. (But I still think that the problem is precisely that the whole dev stack doesn't support any standard.)


I just googled "PBKDF2 PHP" and the first page was full of free implementations. But maybe it's cheating, since I know what "PBKDF2" is. I tried to simulate what a totally ignorant person would do, and googled "PHP password." The second result was the PHP manual page on passwords, where it explains in eleven different languages, using simple words, exactly what the deal is with password hashing, and refers people to two built-in functions (crypt() and hash()) that handle both bcrypt and PBKDF2.

Exactly how much easier does it need to get? Shall we print out the manual page and put it under people's doorsteps?

It would take like a maximum of twenty minutes for anyone at all, armed with Google and Stack Overflow, to go from "I know nothing at all about password hashing" to "I am securely hashing my passwords" in PHP or any other language. I think it's fair to wonder what the fuck is wrong when, in companies full of tens or hundreds of presumed-competent programmers, nobody does that, ever.


LinkedIn was launched, in what, 2003? If you googled the general advice back then, it was pretty much just use MD5 or, if you were really cutting edge, SHA1. Salting wasn't common at all. Salting eventually started becoming common and now you're silly if you don't use bcrypt.


This is more a reflection on where you got your advice in 2003 than on what was considered best practice.

Salting became best practice in the 1980s, but the "lost generation" of dot-com wizards never bothered reading "all that old stuff", so they are doomed to repeat the mistakes.


I remember reading about salting back in the '90s when I got my first copy of FreeBSD. It definitely is an old concept.


I wasn't doing web development in 2003, so I can't really argue. But it's been 9 years since 2003, and there's been a tremendous amount of light and noise about the dangers of weak hashing strategies during that time. I'm sure that LinkedIn has a zillion programmers who follow programming blogs, read HN, and so on, so I can't understand why none of them have just sat down and fixed it. Even if it takes half a day once you add in documentation, QA, deployment, and so on, this seems like a completely obviously worthwhile half-day.


Dear tech journalists, please stop saying stuff like "But we have yet to find out why nobody objected to them protecting 150+ million user passwords with 1970s methods."

Poul-Henning Kamp (http://en.wikipedia.org/wiki/Poul-Henning_Kamp) is not a "tech journalist."


Poul-Henning Kamp is many things, but journalist?

He is allowed to say stuff like "But we have yet to find out why nobody objected to them protecting 150+ million user passwords with 1970s methods."

And this is Linkedin. They should know and do better.

I actually imagine that their very gifted developers are running around wondering how they themselves didn't audit this.


> I actually imagine that their very gifted developers are running around wondering how they themselves didn't audit this.

Or perhaps it's that some 3rd party can authenticate users using SHA1 passwords, i.e. that internally LinkedIn passwords are scrypted or something, but this dump came from a MitM between a 3rd-party plugin and LinkedIn?


I can't imagine that the person responsible for the database can look his colleagues in the eye. He must have called in sick the day after the leak and is not coming back to the office.

You can only imagine how many times someone noticed that passwords weren't salted (by comparing stored passwords to a leaked set of hashes or rainbow tables after another announcement of some company being hacked) and complained, and got brushed off.


I think you're setting the bar too high for tech journalists; let's aim for them knowing the difference between "md5" and "md5crypt" first.

But no, I don't think it is at all obvious why LinkedIn used unsalted SHA1.

LinkedIn went through an IPO, which implies that a number of companies have audited them from head to tail several times along the way.

If the commodity you buy is millions of user accounts, shouldn't you, as investor, at least check that there was a lock on the door to the warehouse ?


> So suppose you are developing an agile product, someone loses access to their account and asks for a new password, you type `head -c 9 /dev/urandom | base64`....`UPDATE users`

I don't think I ever want to be _that_ agile. My agile projects usually have a set of application functions exposed as scripts immediately. And yes, proper password change is one of them. (besides, how about just using `pwgen 16` and not some trickery with head and random?)

Second goal: establish a process that gets everyone flagged that tries to change things using phpMyAdmin that have proper equivalents in your scripting toolkit. Agility is no excuse for sloppiness. If the agile crowd still insists to be agile to death, call the whole thing MVT (Minimum viable toolkit).

Using a framework where all this can be done from a REPL also helps a lot.


... and since you are not Danish, you don't realize that base64 can emit the Danish word "badeanstalts", and it therefore falls to even the most trivial dictionary attack.


9 random bytes encoded to 12 base64 characters is still 72 bits of random data (2^72 possibilities). You'll be hit by a meteorite much sooner than you randomly generate "badeanstalts".


I can see why people don't use bcrypt/PBKDF2: they don't know, or it's not a priority. Your reason, however, doesn't strike me as a particularly good one: you could just write a quick password reset tool in PHP, or even better, write a quick shell script that spits out the password reset query.

And I think LinkedIn really has no excuse.


I guess I'm a little hesitant to follow up on these because the original post is getting strongly downvoted, but I tend to agree somewhat. This is the same as I was telling people at the company -- "just use the PHP API we've developed!"

It turns out that this is a bit complicated, as my colleagues readily pointed out to me. For example, you are basically saying "write a .php file and execute it locally," which is perfectly fine as long as the request comes from your boss or one of your testers -- it is risky when it occurs on a production server (because the script you're generating is insecure). On your production server you really do want to execute the action from within a MySQL prompt if possible, and so it becomes a two-sided game of "I'm going to reset the password over here and then update their (salt, password) with the result of my local PHP queries," and that's a bit weird as a process.

The other tender point is that once you've made a choice, it's very hard to change it. So, "all of our existing passwords use the old system, we're not changing!" was a very strong argument and I did have to spend a bunch of time creating a fall-back for legacy passwords.

I would agree, however with this: in general there is a reasonable expectation of, "if we're doing this so much that it bogs us down, then the app is mature enough for a proper email-sending password-reset tool; and as long as it doesn't bog us down we'll do it the hard way." But convincing people to make the hard way even harder is a tricky proposal even on a good day. It's like telling people, "no, leave that code, I know it does 2^n operations but n is always small and it's not actually the part that slows down our system and it's more readable this way." The intangible -- security/readability -- is being negotiated for the tangible -- dev-ease/speed. I had trouble selling it to the other devs.


Why don't you write a small program in C and call it as an external process from your PHP code to perform the resource-intensive computations? The database should have nothing to do with user password hashes at all.


1. Wouldn't it be better if they used openid like stackoverflow?

2. Any advice on encrypting passwords? We store passwords for some 3rd party services for our users.


I had a long discussion with my colleague Commander Adams today about improving password management policies for Krell Power Systems client logins.

This is a change that would have to be added to the current project backlog, specified and designed, developed, and implemented. Selling this means making a compelling case that salting and changing our hashes would actually solve a problem for us and our clients. My sense is that this is the case, but articulating the case in an unassailable way is still something that needs work.

The most compelling case would be for our clients to demand this as part of their security requirements for our systems. This sword cuts two ways, and a number of our existing password policies are clearly based on well-intended but somewhat misguided client-based requirements. The sane thing is to get good requirements.

Absent that, the question becomes: what is the threat, what is the risk model, what is the mitigation, and what benefits does that mitigation buy us and our clients.

The risk as I see it is disclosure of our user authentication hashes (thank Krell we're not storing cleartext passwords ... at least not there).

Leaking unsalted hashes means that both rainbow tables can be applied against the known hashes, and that duplicate hash instances (hence: duplicate passwords) can be determined and targeted for rainbow/brute force attacks.

Leaking non-bcrypt hashes means that brute-forcing is cheap. At some estimates, 3.3 billion keys per second on $1000 of hardware for MD5, roughly half that for SHA1 (http://www.extremetech.com/computing/84314-how-to-secure-you...).

A successful attack would gain user access, and might gain access to user information (of varying but largely low sensitivity) and be able to impersonate the user for communications purposes. Some collections of user data might be valuable for contact/communications/social-engineering purposes.

The biggest risk would be for users sharing keys among several services. As a fair number of our clients are corporate, and it's fairly well known that corporate password policies are often even more grossly weak than individuals', the likelihood of compromised passwords being used to access other user accounts in some instances is fairly high.

The question remains: how large are any of these risks?

What does salting and bcrypting buy in way of protection?

My read is that, on the technical side:

- salting hides common passwords within our userbase, and renders rainbow tables useless. Weak passwords are somewhat better protected.

- bcrypt makes the cost of brute-forcing passwords markedly higher. Very, very weak passwords could still be cracked, but we're talking on the order of searching through perhaps a few million keys -- 4-character alphanumeric mixed-case passwords would be at risk.

- checking proposed (or entered) passwords against a known set of common passwords -- even just a few tens of thousands of the most common ones -- would further reduce low-hanging fruit. Ideally I'd like to see a publicly available corpus of all known passwords, to be used to exclude duplicates.
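
The check in that last point is cheap to implement; the inline set here stands in for a real corpus such as a leaked-password list:

```python
# Reject candidate passwords that appear in a set of known common
# passwords; real deployments load a large corpus from a file.
COMMON_PASSWORDS = {"123456", "password", "linkedin", "qwerty", "letmein"}

def acceptable(candidate: str) -> bool:
    return candidate.lower() not in COMMON_PASSWORDS
```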

But again, the question becomes, what demonstrable benefits does this present to us and our clients? How do I make the case?


You can integrate common passwords into your password choice UI.

These guys did a nice example: http://howsecureismypassword.net/

Lists are available from various sources. Here's a good page http://www.skullsecurity.org/wiki/index.php/Passwords


"What does salting and bcrypting buy in way of protection?"

Information leaks are common: a backup tape gets FedExed to the wrong address, file sharing gets accidentally turned on, a Russian hacker finds a security hole in your machine while scanning millions of machines, some idiot puts the password database on a laptop and loses it. These sorts of problems are constantly making the headlines.

If you have bcrypt-style password encryption, such leaks are a nuisance and embarrassment.

If you do not have password encryption, the leak recipient can easily impersonate any and all users. They can control your system, create false communication, cause industrial equipment to destroy itself, send harassing messages, conduct financial fraud, and so forth.

The cost to use password encryption is a little engineering labor, the return on investment is a substantial reduction in risk.


I've got no quarrel with your first point -- leaks happen. Elements of our hosting environment, regardless of what that environment is, mean we have lapses in control, whether it's on-site office cabinets, hosted colo, or a cloud provider.

Our backups management is pretty solid, with backups encrypted, and even DB systems using on-disk at-rest encryption via an ecryptfs tool.

You did raise the valid point of sensitivity of identity data among some of our clients. While the general case is that PII (personally identifying information) disclosures would largely be embarrassing but not harmful, there are cases in which harm, or even life-threatening risks could arise.

I'm leaning to your conclusion but I'm looking to be able to quantify that more robustly.

And as I noted in my original question: if we were getting pressure from our clients on this, the case would be far easier to make. Market rules.


The trouble with defense in depth is that you have to admit your existing defenses may be inadequate. I can see how that could be politically difficult in a large organization.


Even for a less-than-large organization, there are issues.

One is the perceived fear of looking incompetent in front of your users/clients. For which I feel the appropriate response is "we'll look a lot more competent if we mitigate the risks of such an event than if we don't, regardless of whether or not it happens".

But really, the big one is simply: can you justify the engineering/product cost of this change on the basis of a material business benefit to us and our clients?



