Hacker News new | past | comments | ask | show | jobs | submit login

To play devil's advocate here, isn't this actually more secure than something like MD5 or SHA1 without stretch factors or multiple invocations, assuming that the key was not also stolen?

My reasoning is, that in order for an attacker to get the passwords out of this dump, they have to break the 3DES encryption. Brute forcing the key is, as I understand it, still very difficult, and without it they can't get any of the passwords. If someone did find the key however, they'd have instant access to all of the passwords no matter how complex.

On the other hand, if the passwords had been protected using an unsuitable hash algorithm, the highly efficient GPU-based crackers would be able to find millions of people's passwords very quickly, using the sophisticated dictionaries and mangling techniques that are around now. Even quite complex passwords can often be found in this way, since the GPU crackers have got so fast they can try billions of combinations - e.g. even things like "!)@(#*$&%^Test123" can be cracked. [1] [2]. Although, extremely long and complex passwords should be safe.

Obviously, I'm not advocating we all switch to 3DES for our password storage, and the huge risk here is that the key was also stolen - but I'm wondering if my reasoning is actually right here, and that people without extremely strong passwords are better off with this leak than if it'd been MD5.

[1] - http://arstechnica.com/security/2013/05/how-crackers-make-mi...

[2] - http://arstechnica.com/security/2013/10/how-the-bible-and-yo...




You're right --- had they used something other than ECB mode. For example, if they used CBC mode with a proper IV, assuming the key is not stolen or compromised, the passwords would be quite secure.

The issue becomes verifying the passwords, then: supposing you have a CBC mode oracle, like a HSM, how can you verify two passwords are the same? (This is probably the reason they chose ECB mode in the first place.) In fact, if you allow the user to test if two ciphertexts represent the same plaintext --- and nothing else --- you still break the very definition of secure encryption, namely that a secure encryption scheme should have, as one popular definition, indistinguishable ciphertexts (usually under a chosen-plaintext attack).

So you have to develop some new way to measure security for the scheme, or perhaps somehow measure the damage that an equality oracle can inflict upon an IND-CPA secure scheme. The notion of indistinguishable ciphertexts roughly reflects the inability of an attacker to reliably learn any function of the plaintext from the ciphertext. Throwing that idea out the window seems unwise, since it's such an elegant idea. So, anyway, you're off in uncharted waters, not a good place to be if you're securing users' passwords.

All of that is relatively complex, though, so just let me know if I need to elaborate on some idea more. (I am never sure how in-depth to go in these posts.)


For anyone wondering about the acronyms above, ECB is Electronic Codebook, CBC is Cipher-block Chaining, and IV is an Initialization Vector used in CBC. IND-CPA is Indistinguishability under chosen-plaintext, a method of attack.

These are explained at [1] and [2]

[1] http://en.wikipedia.org/wiki/Block_cipher_mode_of_operation

[2] http://en.wikipedia.org/wiki/Ciphertext_indistinguishability


> The issue becomes verifying the passwords, then: supposing you have a CBC mode oracle, like a HSM, how can you verify two passwords are the same? (This is probably the reason they chose ECB mode in the first place.)

Why is that? What's wrong with verifying a password in CBC mode with different IV?


I am assuming that the DB server cannot decrypt. If the DB server has decryption permissions, and it is compromised, then there's little need in encrypting the passwords anyway as an attacker will simply decrypt them by hand. (They may not get all of them before being detected, but they'd still get some or most --- and they might look for high-value targets!)

Unlike ECB mode, you can't just encrypt the same password again and check to see if the ciphertexts are the same --- the IVs will be different (hopefully!). So, can you somehow check two ciphertexts with different IVs to see if they represent the same plaintext? Nope, probably not: if you could, then the scheme would no longer have indistinguishable encryptions under a chosen-plaintext attack [1]. Briefly, this ability to test if two ciphertexts represent the same plaintext would allow the adversary in the IND-CPA game to simply encrypt both of the plaintexts and test them against the challenge ciphertext. (See the [1] reference for more info on this.)

In fact, this is exactly the property you don't want. You don't want to be able to determine if two users have the same password. If the DB server were able to test ciphertext equality, then it could pairwise-test each pair of users (or again only hunt for high-value targets).

If you were really adamant on encrypting passwords, I'd also suggest that we'd need to pad the password out to the max password length to prevent the revelation of password length. But of course, I very strongly suggest against password encryption. Still: that's another thing to consider in this hypothetical scenario.

[1] http://en.wikipedia.org/wiki/Ciphertext_indistinguishability


> I am assuming that the DB server cannot decrypt. If the DB server has decryption permissions, and it is compromised, then there's little need in encrypting the passwords anyway as an attacker will simply decrypt them by hand.

I am not an expert in security/cryptography, but to decrypt you need a secret key, don't you? In this case Adobe believes the cracker doesn't have the secret key which means if Adobe engineers did use CBC and random IV for each password, the cracker can't learn much even provided with hints in the database.

So if you store the IV and the ciphertext in DB, and when you want to authenticate, you encrypt the plaintext with that IV with your secret key. I don't see why that can't be done. Encryption is really only as strong as your secret key security and the IV in this case looks like a salt in hashing scheme. The difference encryption is bi-directional and hashing is one-way.

So I don't understand why they have to use EBC mode. I am not getting your point.


Why go through all that when you could use the username or email together with the password to encrypt it with 3DES ECB mode? Those would be unique and users who have same passwords would still have different ciphertexts.


Sorry for my late response.

3DES has a block size of 64 bits, or 8 bytes. Unlike a hash function where the whole input affects the whole output (the strong avalanche criterion and whatnot), in ECB mode encryption, data is only changed on 8-byte block boundaries.

So, for example, suppose the user's username was 8 characters, their email was 16 characters, and their password was some more characters. Then if you use

  3DES-ECB(uname || email || passwd)
where || denotes concatenation, then uname and email will take up 3 blocks and the password will be the rest... this is essentially the same scenario as the Adobe leak, except that the attacker now has to worry about how the password falls across the block boundaries. In this contrived scenario, the password starts a new block of its own, so the addition of the uname/email adds no security here.

Since the uname/email lengths are public, you still might be able to cross-reference sections of identical passwords with other users, depending on how the blocks line up. In any case, this scheme still doesn't offer the unconditional security level you'd like.

I'd recommend the Matasano crypto challenge [1]. It covers some material similar to this pretty early on, so you get your feet wet here.

[1] http://www.matasano.com/articles/crypto-challenges/


Wow thanks that was interesting, hadnt thought of that.


AFAIK you need the same IV for both encryption and decryption.

Some calculate an IV using existing components (such as has of email or name or such), some always use 0x0 as IV. But safest method is to have a random IV (preferably also stored in a HSM along with the key) per encrypted account.


" … assuming that the key was not also stolen?"

Is that a sensible/defensible assumption though?

from: http://www.csoonline.com/article/742228/stolen-adobe-account...

"In an update on the data breach disclosed earlier this month, Adobe has said that source code for Photoshop was stolen."

I might be being overly paranoid, but I've shut down Adobe's Air/Acrobat/Flash updaters at the firewall until I hear plausible sounding assurances that Adobe didn't lose _everything_ in this breach, including software signing keys, update servers, DNS SOAs – the whole lot. _Maybe_ some of that stuff was better secured than the Photoshop source code… But would you bet every machine on your network that they "only" lost ~130million account credentials and the Photoshop source code, but nothing else?


Oh you mean like having the code signing service being compromised and used to signed malware?

http://www.zdnet.com/adobe-code-signing-infrastructure-hacke...


Some people are claiming credit cards used to buy CS6 and whatnot are compromised too.


Roughly, yes. You can't just run through the database and crack every account that used one of the top 1000 passwords (password, secret, sex, ...). But since you can see all the accounts that have the same password, that lets you:

1. identify all the people who used a popular password

2. identify anyone who happened to use the same password as one you already know, such as your own.

1 + 2 = 3. if you mount an online attack against the people from 1), once one account is cracked you instantly know all the others as well, and can probably dodge any security of the three fails and you're in timeout variety.

The security afforded by the 3DES key (assuming it's a secret) should be greater than the security of a password like "adob3" even with a good hashing technique.


And when someone else happened to use the same password as you, and enter an obvious "password hint" - too bad for the others who tried to be cautious with their password hints...


There's an essential note which is not quite clear from the article too: while encryption algorithms are reversible, most hashes are not -- sometimes we forget, but they're soft of by definition many-to-one (injective) functions -- while encryption is fully reversible (albeit mathematically "hard"). Of course, this won't matter for websites that use the same hashing algorithm, but it's an essential part of why it doesn't make sense to encrypt rather than hash passwords apart from key compromising.


Yeah, secure hashes are (hopefully) not mathematically reversible - but when you can try 350 billion combinations per second on a GPU cracking setup, that stops being such a great defense. This is why you have to use a slow hashing function.


But how do you keep the key from getting stolen, when you couldn't keep the password database from being stolen?


So do multiple things to make it harder for the naughty people, i.e. hash and then encrypt the result.

Hashing or Encrypting?

1) If you have to choose one or the other then obviously hashing (with a random salt and a relatively expensive algorithm, e.g. bcrypt et al with suitable work factor) is generally the way to go.

But it is still possible to brute force many of the easy passwords from a DB leak of bcrypt() hashes, it just takes a bit of time. From there you get to know that email address X uses password Y, which may open the door for hacking into other accounts where they've used the same passwords.

The people that get fucked over first from a DB leak of hashed passwords are those with weak passwords. A poor hashing algorithm (unsalted MD5, etc) may even expose seemingly "unguessable/random" passwords thanks to rainbow tables.

Even with bcrypt() hashed passwords you should be able to work through a huge portion of the top 100 passwords for all 130M accounts and come out with a huge number of email/password pairs.

Increasing the work factor of the hashing algorithm is a trade off, too high and you'll soon need extra servers just to cope with the extra CPU load of people logging in and having to check their passwords, too low and the hashes are easier to crack.

2) But, why not use bcrypt() to hash the passwords, and then some encryption algorithm to encrypt the resulting hashes?

Before anyone jumps in with it, doing this is not "security by obscurity" because you're not relying on the encryption key remaining secret alone to protect the password.

What it does protect you from is a basic leak/dump of the DB being open access for those who want to try cracking the bcrypt() hashes.

They're left with an initial problem of finding the encryption key before they can even start on the bcrypt() dictionary attack.

Sure it just takes someone to grab a copy of the login code (or wherever the encryption key is being stored) but you've protected yourself from a basic SQL injection attack that could be used to just dump the DB without access to the server to compromise the login code.

Also, hints:-

Why weren't the hints encrypted? (Including having a random n character 'salt' that is prefixed to them before encryption to prevent the same hints encrypting to the same string).

If the hints were encrypted you couldn't use them to help guess passwords.

Even if the passwords were hashed rather than encrypted, the unencrypted treasure trove of password hints would make the job of cracking the passwords much much easier.

Why were the hints even stored in the same table (or even on the same server) as the passwords? (Maybe they weren't and the hackers got both and combined the two datastores.)

Again, if the server was compromised enough that the source for the login code was obtained then the hints would effectively be in plaintext, but you've still protected the hint data from a simple DB dump hack.


I'm sorry, how can you check easy passwords with bcrypt? Hell, checking just one password per account, assuming bcrypt takes around half a second (what most reasonable implementations take) would take two years. If you want to check the 100 most common passwords for everyone, that's 200 years right there.


Half a second sounds pretty slow for authentication servers handling over 100 million accounts. Here's how I figure it: Assume about 10ms per attempt, since this is a high-throughput login system. Use a whopping 10 computers to do the cracking, and you're already under a month to try most of 100 passwords on every single account.

When using an awful password, bcrypt can only do so much. It can protect you from the ideal case of a single person with a single core that doesn't filter accounts in any way. Now consider how many people have access to this database...


Sure, someone on a single machine isn't going to get far, but that's not a great defence to rely upon.

The answer is one of two things:-

Botnets, which most hacking groups will have some access to. Being in control of a modest 10,000 machine botnet reduces that 200 years to about a week, call it 5 weeks if you limit utilisation to 20% of a single core. Expect more of this when bitcoin mining on botnets becomes less profitable.

Also, despite bcrypt() being designed not to be easy/fast to implement on GPUs because of the memory footprint required, GPUs are growing in size and reasonable implementations exist for FPGAs. ASICs would be even faster.

Upping the work factor to compensate for this makes more work for the CPUs at the site that is using bcrypt(). I know of one company that has more cores utilised in performing bcrypt() checks than they do running the HTTP and DB portions of the site.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: