I cannot see any obvious weaknesses in this scheme.
It seems to address a known pain point in bcrypt (maximum input length), implements a pepper in a secure way (which cannot inadvertently degrade security), and otherwise follows best practices (high work factor, per-user salt, etc.).
I know peppers remain controversial (some people claim they're pointless, and make a good argument). But ultimately nothing Dropbox is doing with peppers in this article makes your password easier to break, only harder.
Proof that they're not pointless: the Adobe password leak. Apart from the giant crossword puzzle[0] created by the password hints, combined with their choice of ECB mode, which let people infer blocks of passwords, I haven't been able to find any evidence that the encryption key was leaked or guessed. So most of the passwords were never discovered. I'm betting their key was a full 168-bit random value that was deleted as soon as the leak came to light, so it's likely that value will never exist again in this universe. Compare that to something like LinkedIn (SHA-1), where enthusiasts have cracked almost 97% of the passwords in that leak. How many more have blackhats cracked?
I certainly wouldn't rely on symmetric encryption alone to store passwords. If the key leaks, you expose all passwords in mere seconds. Plus you can see your users' plaintext passwords (since you have the key), which you should not be able to do. But as an extra measure, symmetric encryption has already proven itself to be useful.
It's a good system, especially compared with the current best practice of simply hashing passwords with bcrypt and calling it a day.
I can't recall it off the top of my head, but Facebook has a similarly impressive system with more secret sauce involved for performance at scale. I believe what they do is the following:
1. Hash the password with MD5(password).
2. Generate a 20-byte (160-bit) random salt (this is well over the 64 bits you'd need to defend against birthday attack collisions).
3. Hash with hmac_sha1(hash, salt).
4. Send this value to a separate server for further operations (mitigates offline brute-forcing).
5. Hash in a secret key with hmac_sha256(hash, secret). Note this operation is on a separate server. The secret key might be colloquially termed a "pepper".
6. Hash with scrypt(hash, salt) to make local computation slower.
7. Shrink the final value with hmac_sha256(hash, salt) for efficient database storage.
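As a rough illustration, the seven steps listed above can be sketched in Python with only the standard library. The remote step 5 is simulated by a local `remote_hmac_sha256` function (a hypothetical stand-in for the separate server), the secret is a placeholder, the HMACs are assumed to use the salt or secret as the key and the running hash as the message, and the scrypt parameters (n=2**14, r=8, p=1) are assumptions rather than Facebook's actual values.

```python
from __future__ import annotations

import hashlib
import hmac
import os

# Placeholder for the secret held only by the separate HMAC service (assumption).
_SERVICE_SECRET = b"hypothetical-pepper-held-on-another-server"


def remote_hmac_sha256(value: bytes) -> bytes:
    """Hypothetical stand-in for step 5; in the described design this runs remotely."""
    return hmac.new(_SERVICE_SECRET, value, hashlib.sha256).digest()


def onion_hash(password: str, salt: bytes) -> bytes:
    cur = hashlib.md5(password.encode("utf-8")).digest()                # 1. legacy MD5 layer
    cur = hmac.new(salt, cur, hashlib.sha1).digest()                    # 3. HMAC-SHA1 keyed with the salt
    cur = remote_hmac_sha256(cur)                                       # 5. pepper applied "remotely"
    cur = hashlib.scrypt(cur, salt=salt, n=2**14, r=8, p=1, dklen=64)   # 6. slow, memory-hard step
    return hmac.new(salt, cur, hashlib.sha256).digest()                 # 7. shrink to 32 bytes


def enroll(password: str) -> tuple[bytes, bytes]:
    salt = os.urandom(20)  # 2. 160-bit random salt
    return salt, onion_hash(password, salt)


def verify(password: str, salt: bytes, stored: bytes) -> bool:
    # Constant-time comparison of the recomputed onion against the stored value.
    return hmac.compare_digest(onion_hash(password, salt), stored)
```

Only step 5 needs the secret, so a stolen copy of the salts and final values cannot be attacked offline without also compromising the service holding `_SERVICE_SECRET`.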
If any Facebook engineers are around, please correct me if I've missed or misinterpreted any part of that.
The current best practice of simply hashing passwords with bcrypt is fine, and anything past that which doesn't involve (probably per-password) use of an HSM adds only marginal value.
I wouldn't want anyone to read this and get the impression that "secret peppers" and multiple hashing rounds and HMAC were important components of a password storage system. They are not: they're things that message board nerds come up with.
If you want to be a step better than just storing passwords with bcrypt, your next step is to create an authentication service that runs on separate hardware with an "is this password valid" and "enroll this password" API and nothing else. The stuff people do instead of this is basically cosmetic.
>>The current best practice of simply hashing passwords with bcrypt is fine, and anything past that which doesn't involve (probably per-password) use of an HSM adds only marginal value.
Yep, I mostly agree. As I said elsewhere in the thread, I wouldn't really recommend this sequence to any company unless it were very large and had a mature ops team to handle it. There's a lot of diminishing returns here.
Yep, Tom, that's the FB solution - or rather, halfway up the stack of hashing is a callout to a service where HMACs are involved. These bring their own challenges: https://video.adm.ntnu.no/pres/54b660049af94
No, according to your talk, Facebook's solution is that password validation is pushed out to the front-end servers, who use back-end KMS services to do some (but not all) of the crypto.
I think this is a suboptimal approach. Can you tell me what the benefit of your layered approach is over simply adding to your KMS servers the APIs for validate(user, password) and change(user, oldpassword, newpassword)?
If your KMS service did password validation directly, you wouldn't need any of the layers in this architecture. HMAC would add nothing (it would be tautological, since anyone who could directly attack the hashes must have also owned up the KMS service). I still don't totally understand the MD5 step. You could just use scrypt and nothing else, and you could probably ratchet the work factor up because you wouldn't be billing the front end servers for those cycles.
I generally like, have recommended, and have built a few times the "software HSM" KMS approach you're describing here --- but only for "seal/unseal" and "sign/verify" APIs.
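For comparison, here is a minimal in-process sketch of the alternative argued for here: a service whose only APIs are `validate(user, password)` and `change(user, oldpassword, newpassword)`, doing the entire hash itself with scrypt and nothing else. The class name, the `enroll` method for first-time registration, the in-memory dict standing in for storage, and the scrypt parameters are all assumptions; a real deployment would put this behind a network boundary on separate hardware.

```python
from __future__ import annotations

import hashlib
import hmac
import os


class PasswordService:
    """Hypothetical auth service: callers never see salts or hashes."""

    def __init__(self, n: int = 2**14, r: int = 8, p: int = 1) -> None:
        self._params = (n, r, p)
        self._store: dict[str, tuple[bytes, bytes]] = {}  # user -> (salt, hash)

    def _hash(self, password: str, salt: bytes) -> bytes:
        n, r, p = self._params
        return hashlib.scrypt(password.encode("utf-8"), salt=salt, n=n, r=r, p=p, dklen=32)

    def enroll(self, user: str, password: str) -> None:
        salt = os.urandom(16)
        self._store[user] = (salt, self._hash(password, salt))

    def validate(self, user: str, password: str) -> bool:
        record = self._store.get(user)
        if record is None:
            return False
        salt, stored = record
        return hmac.compare_digest(self._hash(password, salt), stored)

    def change(self, user: str, oldpassword: str, newpassword: str) -> bool:
        # Re-enrolling with a fresh salt only after the old password checks out.
        if not self.validate(user, oldpassword):
            return False
        self.enroll(user, newpassword)
        return True
```

Because the work factor lives in one place (`self._params`), raising it only touches this service, which is the operational argument being made against splitting the hash across front-end servers.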
>No, according to your talk, Facebook's solution is that password validation is pushed out to the front-end servers,
Yes, although I don't work for Facebook any more, that was and probably still is the case.
>who use back-end KMS services to do some (but not all) of the crypto.
In fact, the backend service does a tiny amount of the crypto.
>I think this is a suboptimal approach.
Of course you do, it's not like I've not argued with you before, Thomas. :-)
>Can you tell me what the benefit of your layered approach
...Facebook's layered approach...
>is over simply adding to your KMS servers the APIs for validate(user, password) and change(user, oldpassword, newpassword)?
If you want your hash to be ... hashed, rather than lodged in some questionable silicon, not putting a fat crypto load onto the backend avoids the "thundering herd" problem when some fraction of 1.7 billion people want to log in.
>If your KMS service did password validation directly, you wouldn't need any of the layers in this architecture.
Quite, and if everyone flew instead of drove, we wouldn't need cars with layers of bumpers, crumplezones, seatbelts and airbags; but it's merely shifting the problem for 1.7 billion people.
Have I mentioned "scale"? I should mention scale. Scale is a thing.
>HMAC would add nothing (it would be tautological, since anyone who could directly attack the hashes must have also owned up the KMS service).
Yes. Fucking huge KMS service. HUGE. Trumpiness levels of -HUGE- and eating lots of power and redundancy.
1.7 billion people. That's a lot. Like 0.01% of it is 170,000 people. All logging in together. All over the world.
>I still don't totally understand the MD5 step.
Yeah, but I'm not betting that Mark's/whoever's coding at the time was focused on the future of password authentication.
>You could just use scrypt and nothing else
Yes, but where's the fun in that?
>and you could probably ratchet the work factor up because you wouldn't be billing the front end servers for those cycles.
The frontend servers are approximately precisely where you want the cost/chokepoint. There's a metric fucktonne of them and they are closest to the request, so by definition they are scaled to the load.
>I generally like, have recommended, and have built a few times the "software HSM" KMS approach you're describing here --- but only for "seal/unseal" and "sign/verify" APIs.
Last question first: large, but not Facebook large (I spent 10 years consulting on this kind of thing).†
You comment early in the talk that the MD5 step is somehow helpful for password dumps. That was the bit I didn't follow. If it's there because that's how password hashing worked before your team got to it, that makes a lot more sense. But then: it's not a "layer" of the onion so much as a sheen of dirt that needs to be washed off the onion. :)
I get that your auth problem is huge. Yuge. So big you wouldn't believe it. I totally believe you. No, wait, I don't believe you, that's how big I know your authN problem to be: unbelievably huge.
But here's the thing: you're already scrypting passwords. We're not debating whether you can use expensive password hashes. You already use expensive password hashes. I'm saying: the model where the KMS does a small bit of the password hash step and defers the heavy lifting to front-end servers seems like a suboptimal way to structure this:
* You have to bill cycles from the front end to do it
* You can't change password hashing without updating all the front-end servers
* It's harder to track usage because it's spread across a zillion machines
* You're more constrained in how you scale it (for instance, if you wanted to double or triple the work factor) because whatever your new scheme is, it has to fit with the existing front-end resources.
I'm not saying "wow, it's dumb that you built it this way". I'm saying, if other people are reading this thread thinking about how to do it:
* DO split authentication out into its own service
* DON'T have that authentication service be "HMAC as a service" and then do scrypt on your front-end service
YOUR MOVE, ALEC MUFFETT. I keep going until you unfriend me on Facebook so I can't see you wincing about these posts.
† I've assessed Facebook-large variants of this, though.
>Last question first: large, but not Facebook large (I spent 10 years consulting on this kind of thing).
I just spent 3 years living it for 50h/week. Hence why I am taking a vacation.
>If it's there because that's how password hashing worked before your team got to it, that makes a lot more sense.
That. It wasn't even me; it was done before I arrived, but it was done by a team of geeks with a tremendous nose for making the best of the database they had available, without pulling the old password-migration trick of "log in with one password, parallel-encrypt with a new algorithm, and save the new hashes", because some of those billion people might not log in again for years. You would never stop migrating people.
I remember internal password algorithm migrations at Sun; at least there you could force the matter for 10,000 to 40,000 people.
But you can't force everyone to migrate at FB scale.
>But then: it's not a "layer" of the onion so much as a sheen of dirt that needs to be washed off the onion. :)
You can take that approach, but - again - when will you finish the task? Whereas wrapping one algorithm in the next is a finite task which is completable in a reasonable amount of time.
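The "wrap one algorithm in the next" idea can be sketched like this: an existing unsalted MD5 digest is upgraded offline, without the user logging in, by feeding it into a stronger salted hash, and at login the same chain is recomputed from the plaintext. This is a minimal illustration assuming a legacy table of raw MD5 digests and scrypt as the wrapper; the function names and parameters are hypothetical.

```python
from __future__ import annotations

import hashlib
import hmac
import os


def _wrap(md5_digest: bytes, salt: bytes) -> bytes:
    # The stronger outer layer treats the old hash as its "password".
    return hashlib.scrypt(md5_digest, salt=salt, n=2**14, r=8, p=1, dklen=32)


def upgrade_legacy(md5_digest: bytes) -> tuple[bytes, bytes]:
    """Run once over the whole table; no plaintext passwords are needed."""
    salt = os.urandom(16)
    return salt, _wrap(md5_digest, salt)


def verify(password: str, salt: bytes, stored: bytes) -> bool:
    inner = hashlib.md5(password.encode("utf-8")).digest()  # recompute the legacy layer first
    return hmac.compare_digest(_wrap(inner, salt), stored)
```

Since `upgrade_legacy` needs only the stored MD5 value, every row can be migrated in one pass; the price is that the MD5 layer stays in the chain permanently.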
>I get that your
...Facebook's...
>auth problem is huge. Yuge. So big you wouldn't believe it. I totally believe you. No, wait, I don't believe you, that's how big I know your authN problem to be: unbelievably huge.
Well channeled. :-)
>But here's the thing: you're already scrypting passwords. We're not debating whether you can use expensive password hashes. You already use expensive password hashes. I'm saying: the model where the KMS does a small bit of the password hash step and defers the heavy lifting to front-end servers seems like a suboptimal way to structure this:
>* You have to bill cycles from the front end to do it
Yes. 0.1% of frontend cycles. <blank expression> And?
>* You can't change password hashing without updating all the front-end servers
...which happens three times a day on weekdays, and is moving to even more often.
>* It's harder to track usage because it's spread across a zillion machines
>* You're more constrained in how you scale it (for instance, if you wanted to double or triple the work factor) because whatever your new scheme is, it has to fit with the existing front-end resources.
Yes. For a site with wildly heterogeneous architectures in front-end deployments, I can see how that might be a concern; but even AWS leads people to standardise on having approximately-the-same-kinds-of-hardware-doing-approximately-the-same-things.
>I'm not saying "wow, it's dumb that you
...Facebook...
>built it this way". I'm saying, if other people are reading this thread thinking about how to do it:
>* DO split authentication out into its own service
...or some component of it...
>* DON'T have that authentication service be "HMAC as a service" and then do scrypt on your front-end service
Why not?
>YOUR MOVE, ALEC MUFFETT. I keep going until you unfriend me on Facebook so I can't see you wincing about these posts.
Wince?
> I've assessed Facebook-large variants of this, though.
Sorry for the delay. A Jazzercise class suddenly appeared in the coworking space I work out of, so I fled, and then I had to give a talk about Starfighter.
Responses: (when I say "your" let's just stipulate I mean Facebook)
* Your password validation overhead is .1% of current front end resources, but could be ratcheted up, and would be easier to ratchet up if they weren't shared by other things.
* I totally understand why you keep the old MD5 cruft around --- but would add that it's cruft that would be even less obvious if it lived behind an authentication server.
* I think it's safer, simpler, cleaner, probably easier to scale, and definitely easier to change authentication if it lives in its own service rather than being implemented (in part) on a generic application server. As usage shifts from HTML front-end to all API, you might even be able to keep app servers from even seeing passwords.
* By "assessed", I mean, worked on other people's systems at this scale.
So I guess I'd wrap up with a question: if you had this to do over again, from scratch, the way you wanted to, would you have app servers do a password hash and then entangle it somehow with an HMAC operation from a crypto service, or would you have the whole password hash done on the crypto service directly?
Putting the authentication service into a nice tidy centralised box does not actually achieve much, and may have architectural downsides.
Not the least of which is: if it's wholly in a service, then you have to authenticate the service; that's not such a big step from "if the hashed passwords are stored in a directory, then you have to authenticate the directory" of course - but if we were to equate the two systems because of the need to authenticate the {directory, service} then the service-based solution still has the downside of being a CPU hotspot and a potential single point of failure.
We're much better at distributing directories of data which is self-protected / needs no special treatment, than we are at building humongous scalable "secure" services with an enormous TCB and a physically enormous attack surface / footprint.
Yes. If I was doing this, sure, I would do this again. Curiously I am a big fan of password hashing rather than all-singing, all-dancing authentication services.
> you also have to authenticate the HMAC-providing service
Yes.
But, to look at the Facebook approach, what is the risk surface presented by the HMAC service?
Done properly in the FB approach the password is irreversibly hashed before it arrives at the HMAC component, and cheaply HMAC'ed and returned, where the onion of hashing is completed.
It's good to bidirectionally authenticate access to the HMAC service, but in terms of protocol it strikes me as less critical than in your scenario.
Either the HMAC is done properly (in which case the eventual hashes will verify for legitimate users) or - if someone inserts a "fake" hashing service - the HMAC'ed results will not validate, and a bunch of legitimate users will experience login failure.
( edit: there's a risk of exfiltrating the input to the service, but it's meant to be a shitload of work to achieve any evil with that input anyway, which also can be shorn of user-metadata and other clues thereby making it a bit less valuable )
Maybe I have missed something but to my mind this threat scenario fails (by dint of fake services, exfiltration, etc) in a "safe" manner.
=== Now === consider your "authentication service" approach.
Plaintext goes into... what?
The real service?
A fake service that returns "true" in all circumstances?
A MITM that exfiltrates the plaintext?
Where do you put the root of the trust chain to this service? In an SSL Certificate? Pinned? From which CA?
Simply: I feel that in centralised password authentication services there are a lot more potential shenanigans to defend against.
>> It's a chain of hash because that's how passwords were migrated from being stored unsalted, to salted, to scrypted.
I kind of figured that, thanks for confirming :).
Personally I wouldn't consider Argon2 yet for production, but only because I'd like to see it run at scale for a few years. PHC or not, I'm hoping it becomes more battle-tested in production use.
That said, I'd fully respect any team for using Argon2 and have no personal qualms with it.
If this jury-rigged, duct-taped "password hashing" scheme impresses you, I've got some land in Florida you might be interested in.
Seriously, though, it's a complete fallacy to think that more complicated password hashing schemes with lots of fancy steps are better. I was at the talk where this scheme was presented, and Alec Muffett himself said the only reason it was so complicated is because they had to layer stronger hashes on top of existing ones instead of revoking all outstanding session cookies (forcing every Facebook user in the world to re-authenticate).
I'm aware of all that, and I'm not impressed by the numerous hashing steps. I'm impressed by Facebook's commitment to migrating and future-proofing their security practices as they become obsolete.
Specifically, this means "wrapping" insecure hashes in more secure hashes and the addition of an encryption key stored in an HSM on a separate server.
Thank you! This is exactly what I was referring to. There's a SlideShare of this talk too. I couldn't find it when I first wrote that comment, but I bet I'll find it with "onion" as a keyword.
As mentioned by someone else in the thread [0], Dropbox is using pepper with encryption. Many (stupid) people will use it with hashing instead, which means they cannot rotate the pepper without resetting every user's password.
Moreover, using pepper will make some (stupider) people do stuff like
hash( salt + hash( pepper + password ) )
which is very likely to increase the attack surface.
More broadly, since most secure hashing functions were not designed to be used with a pepper, it forces people to try to come up with their own ways of making it work. They should not: DO NOT roll your own crypto [1].
What's the problem with `H(salt||H(pepper||password))`? If `H` is a good PRF, it should work fine (you want `KDF(salt, H(pepper||password))` to protect against guessing attacks).
In any case, if your pepper is exposed and you don't have a good pepper rotation scheme, you are no worse off than if you had started with no pepper at all.
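One way to realize the `KDF(salt, H(pepper||password))` shape suggested here is sketched below, with the pepper applied as an HMAC key instead of by plain concatenation (which sidesteps the length-extension concern raised in reply). The pepper value, the choice of PBKDF2-HMAC-SHA256 as the KDF, and the iteration count are assumptions for illustration only.

```python
import hashlib
import hmac
import os

PEPPER = os.urandom(32)  # hypothetical; in practice loaded from outside the database
ITERATIONS = 200_000     # illustrative work factor, not a recommendation


def peppered_hash(password: str, salt: bytes, pepper: bytes = PEPPER) -> bytes:
    # Keyed pre-hash: HMAC(pepper, password) rather than H(pepper || password).
    pre = hmac.new(pepper, password.encode("utf-8"), hashlib.sha256).digest()
    # Slow, salted KDF over the peppered value.
    return hashlib.pbkdf2_hmac("sha256", pre, salt, ITERATIONS)
```

As with any hashed pepper, changing `PEPPER` invalidates every existing hash, so this does nothing for the rotation problem mentioned earlier in the thread.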
If the hash function used is based on the Merkle-Damgård construction, then it is vulnerable to a length extension attack [1]. That means if an attacker knows a hash of the form H(pepper || salt || password), the attacker can compute further hashes of the form H(pepper || salt || password || padding || password_extension), where padding is the hash function's internal padding of the original message, knowing only the combined length of pepper, salt and password but none of their values. That does not help with breaking the password, but you still have an attack point in your system. Maybe someone makes a really bad decision and decides to reuse the password hash for something it really should not be used for. Also note that with this construction you can trivially find collisions for different pepper, salt and password values; they just have to yield the same string when concatenated.
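The collision point is easy to demonstrate: plain concatenation cannot tell where one field ends and the next begins. Below is a minimal sketch using SHA-256 with made-up field values, plus one common remedy, length-prefixing each field before hashing; the `encode_fields` helper is hypothetical.

```python
import hashlib


def naive(pepper: bytes, salt: bytes, password: bytes) -> bytes:
    return hashlib.sha256(pepper + salt + password).digest()


def encode_fields(*fields: bytes) -> bytes:
    # Prefix each field with its 4-byte big-endian length so boundaries are unambiguous.
    return b"".join(len(f).to_bytes(4, "big") + f for f in fields)


def framed(pepper: bytes, salt: bytes, password: bytes) -> bytes:
    return hashlib.sha256(encode_fields(pepper, salt, password)).digest()


# Two different (salt, password) splits of the same byte string "salt123secret".
a = (b"pep", b"salt1", b"23secret")
b = (b"pep", b"salt123", b"secret")

print(naive(*a) == naive(*b))    # True: collision
print(framed(*a) == framed(*b))  # False: framing removes the ambiguity
```

Length-prefixing fixes only the ambiguity; against length extension the usual choices are HMAC or a hash that is not vulnerable to it, such as SHA-3.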
Okay, but the suggestion was not to just concatenate everything; it involved hashing parts before concatenating and hashing them again. The length extension attack still applies: you could, for example, use it on H(pepper || password) in H(salt || H(pepper || password)). Is this useful? Where would you get the length of, and a hash for, pepper || password? I don't know, but why risk someone figuring out how to do it and abuse it when there are alternatives? And last but not least, constructions like H1(H2(message)) or H1(message) || H2(message) may look innocent, but they are not, and they may weaken your system. See for example this Cryptography Stack Exchange question [2] or the answer by ircmaxell on this Stack Overflow question [3].
So as a couple comments ask, what if you have to rotate the pepper? There's ways to do that, but sometimes people disengage their brains, and by making the whole scheme a little more complicated and a little more difficult to work with, you entice people to cheat. Like storing the plaintext password in a second column for "rehashing" purposes.
Glad to know that my self-designed system pretty much matches this "10/10" scheme :)
The only difference is that I perform the SHA-512 hash client-side, so that the user's plaintext password isn't sent to my servers. Any thoughts on that?
Password rules are enforced client-side too. You are right that a determined person could circumvent the client-side validation, but my thinking is that if someone really wants to do this, it doesn't really hurt me.
This is the unsafe mentality of "roll your own" and it's always a bad idea. There's a reason why we have to practically blindly follow the best practices; the attack vectors are so diverse we cannot predict them.
How would you handle 3rd party phishing schemes attempting to register users with weak passwords for example?
The "obvious weakness" is the non-technical part of this: you can sign up to Dropbox (and most services for that matter) with an extremely weak password. I just signed up with a dummy email address and a password of "password". :-) If you look through password lists that have leaked online, the most common passwords are very easily guessable.
Anyway, not trying to dismiss their efforts here -- they're good. But this is only half of the equation.
I go back and forth on this, but ultimately, anyone using "password" or "123456" as their password should expect their account to get broken into at some point. Honestly, I've never met anyone that uses passwords that weak for anything they actually care about, even completely non-technical folks.
A note about combining SHA512 with bcrypt: Don't feed the raw binary output of SHA512 into bcrypt. Use the hexadecimal or base64-encoded form instead. (Dropbox probably does this already, since they mention base64 in passing.)
bcrypt is known to choke on null bytes. In raw binary form, each SHA512 hash has roughly a 22% chance (1 - (255/256)^64) of containing at least one null byte.
Using hex or base64, of course, decreases the amount of entropy that you can fit into bcrypt's 72-byte limit. But you can still fit 288 to 432 bits of entropy in that space, which is more than enough for the foreseeable future.
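The numbers in this comment can be checked with a few lines of Python: the chance that a raw 64-byte SHA-512 digest contains a zero byte, and how many bits of the hex or base64 encoding survive bcrypt's 72-byte cutoff. This uses only `hashlib` and `base64` and models the truncation arithmetically; it does not call a bcrypt library.

```python
import base64
import hashlib

BCRYPT_MAX = 72  # bytes of input most bcrypt implementations consider

# Probability that 64 uniformly random bytes include at least one 0x00.
p_null = 1 - (255 / 256) ** 64

digest = hashlib.sha512(b"correct horse battery staple").digest()
hex_form = digest.hex().encode("ascii")   # 128 chars, 4 bits per char
b64_form = base64.b64encode(digest)       # 88 chars incl. 2 '=' padding, 6 bits per data char

hex_bits = min(len(hex_form), BCRYPT_MAX) * 4
b64_bits = min(len(b64_form), BCRYPT_MAX) * 6

print(f"P(null byte) = {p_null:.3f}")     # 0.222
print(len(hex_form), hex_bits)            # 128 288
print(len(b64_form), b64_bits)            # 88 432
```

Both encodings exceed 72 bytes, so bcrypt sees only a prefix of either; that prefix still carries 288 or 432 bits.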
Here's something much faster than all of those multiplications that invertibly maps 64 bytes to 65 bytes without nulls: replace the first null with 255. Replace every later null with the index of the previous null. Make the final byte the index of the last null (or 255 if no nulls were replaced). Count indices from 1, so that a null at position 0 doesn't turn into another zero byte. In this way, you've replaced the nulls with a linked list of the locations where nulls used to be. To invert the transformation, just start at the final byte and walk the linked list backward until you hit a 255. (You'd never do the inversion in practice, but the existence of the inversion algorithm proves that no entropy was discarded.)
Use the time saved by not doing base 255 conversion to increase your iteration count.
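A minimal Python sketch of the linked-list escaping described above, written with 1-based indices so that no replacement value can itself be zero (which limits inputs to 254 bytes). The function names are hypothetical, and, as the comment says, the decoder exists only to show that the mapping is invertible.

```python
def escape_nulls(data: bytes) -> bytes:
    """Map up to 254 bytes to len(data) + 1 bytes containing no 0x00."""
    if len(data) > 254:
        raise ValueError("1-based indices must stay below the 255 sentinel")
    out = bytearray(data)
    prev = 255  # sentinel: "no earlier null"
    for i, b in enumerate(out):
        if b == 0:
            out[i] = prev  # link to the previous null (or the sentinel)
            prev = i + 1   # 1-based, so never 0
    out.append(prev)       # head of the list: last null's index, or 255 if none
    return bytes(out)


def unescape_nulls(encoded: bytes) -> bytes:
    out = bytearray(encoded[:-1])
    pos = encoded[-1]
    while pos != 255:      # walk the list backward, restoring each null
        i = pos - 1
        pos = out[i]
        out[i] = 0
    return bytes(out)
```

For a 64-byte SHA-512 digest this yields 65 bytes, which still fits under bcrypt's 72-byte limit.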
Any time you spend or save on these kinds of transformations is going to pale in comparison to the bcrypt step. It might not even amount to a single iteration.
As someone who exclusively uses a password manager with random unique passwords for each service it always amuses me to see posts like this.
Years ago I relieved myself from the stress by using a password manager. Now for all I care they could be storing it in plaintext and it wouldn't make a damn difference to me. Problem solved.
It would still make a significant difference, since someone could still compromise your Dropbox account... Having a password manager doesn't all of a sudden make your passwords secure on all of your different accounts.
I suppose there are cases where someone got the dropbox password hashes and didn't own dropbox enough to have access to your dropbox account without reversing your hash. However, in most cases where plaintext passwords are exposed, the attacker will own your account at the service that was attacked because they are already inside that system.
If the Dropbox passwords are leaked, I am going to change my Dropbox password whether it was hashed properly or not. The difference is that I won't have to change my password on dozens of other sites.
Plus 1Password has a Watchtower feature that shows you which websites have had known security issues since your last password change, so you can stay relatively secure everywhere.
I (not OP) use pwsafe on macOS and iOS, a port of Password Safe [1] designed by Bruce Schneier.
It syncs the encrypted blob to iCloud, so I can pull up passwords on the iPhone if necessary. It is fairly simple, not much browser/OS integration - you just open the app, choose a safe/blob, enter the master password, and can then browse/edit your list of passwords, and in particular copy a password.
Simple, not too much functionality, so not too much that can go wrong, I hope. (Often it has been the browser integration that led to exploits in LastPass/1Password, if I'm not mistaken.)
They're badly expressing the idea that there may be only marginal benefits to optimizing the genus of password hashes used, so long as you're using a serious construction designed for storing (or generating keys from) passwords.
The debate over whether scrypt is better than bcrypt is not really still open. The debate over whether the difference matters that much in practice might be.
For what it's worth: for new systems, I use scrypt. But if someone asked, and they didn't have a very specialized application, I'd tell them that switching to scrypt from bcrypt, or even PBKDF2, would be a waste of money.
Right, if they had stopped at "we used bcrypt because we're familiar with it and we think it's good enough for our purposes", I wouldn't have said anything. But the second sentence, claiming that there's an open debate about which provides more protection...
I use Dropbox to store and share screenshots on Twitter
Wildly off-topic, but I just upload screenshots to twitter directly and let them figure out the hosting. Is your usage of dropbox for this simply a legacy of when twitter didn't have support for uploading images?
> But if someone asked, and they didn't have a very specialized application, I'd tell them that switching to scrypt from bcrypt, or even PBKDF2, would be a waste of money.
Argon2/scrypt/bcrypt/PBKDF2 are fine. I think PBKDF2 is the worst choice, but is still acceptable.
The real problem is the prevalence of md5($password) e.g. in software like Piwik.
One concern I have here, is that people are going to perceive this post as "this is what you should do and it's easy!", because the post doesn't really address the complexities of implementing this kind of thing.
Cool approach, you need to compromise two separate servers just to have a usable password database you could run tools against. A key compromise can be fixed quickly and a password compromise is useless without the key.
Of the last 10 or so security engagements I have done, I can only recall one where I wasn't able to compromise _all_ servers. Once you get the first few, the incremental work to get everything is relatively small.
When breaking in, your end goal isn't the database server... it's the domain controller or the configuration management server.
Does anyone know what is a good practice to create a "vault" - the kind that is used for the Pepper in this case?
I have heard of it being a separate, IP-restricted server with a daily-changing IP address, etc. A simpler use case would be storing OAuth2 tokens or some kind of PII.
I've heard before of it just being stored in the codebase. Doesn't add much security but it does mean both the database server and at least 1 of the app servers or the codebase have to be breached
> I've heard before of it just being stored in the codebase.
Yuck! At the very least put it in an environment variable. Best case is loaded once at server boot from an HSM, kept only in memory, and rotated on a regular basis.
Having it in the code, like all other config, is a terrible idea.
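A minimal sketch of the environment-variable option: read the pepper once at startup, refuse to start if it is missing or short, and keep it only in process memory. The variable name `PASSWORD_PEPPER`, the hex encoding, and the 32-byte minimum are assumptions; loading from an HSM would replace `load_pepper` with whatever client that HSM provides.

```python
import os


def load_pepper(var: str = "PASSWORD_PEPPER") -> bytes:
    """Read a hex-encoded pepper from the environment once, at boot."""
    value = os.environ.get(var)
    if value is None:
        raise RuntimeError(f"{var} is not set; refusing to start without a pepper")
    pepper = bytes.fromhex(value)
    if len(pepper) < 32:
        raise RuntimeError(f"{var} should hold at least 32 random bytes")
    return pepper
```

Failing loudly at boot is the design choice here: silently falling back to an empty pepper would produce hashes that no longer match the stored ones.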
> Some implementations of bcrypt truncate the input to 72 bytes, which reduces the entropy of the passwords.... By applying [SHA512], we can quickly convert really long passwords into a fixed length 512 bit value, solving [that problem].
This part confused me. How can truncating to 72 bytes be a more severe reduction in entropy than generating a 64-byte hash?
I think they're talking about entropy per bit. If they hash to 64 bytes, they integrate all of the entropy of the password, if they truncate to 72 bytes, they throw away all entropy past 72 bytes. This could be a huge problem if you're one of those people who uses a common prefix with a suffix as their password pattern for passwords they need to remember.
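A small illustration of the prefix-plus-suffix case: two passphrases that share their first 72 bytes become identical after truncation but remain distinct after a SHA-512 pre-hash. The passphrases are made up, and truncation is modeled by slicing rather than by calling a bcrypt library.

```python
import hashlib

LIMIT = 72  # bytes kept by truncating bcrypt implementations

prefix = b"correct horse battery staple " * 3  # 87 bytes, shared by both passwords
site_a = prefix + b"dropbox"
site_b = prefix + b"facebook"

# Truncation keeps only the shared prefix, so both passwords collapse to the same input.
truncated_same = site_a[:LIMIT] == site_b[:LIMIT]

# A SHA-512 pre-hash depends on every byte, so the suffixes still matter.
prehashed_same = hashlib.sha512(site_a).digest() == hashlib.sha512(site_b).digest()

print(truncated_same, prehashed_same)  # True False
```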
Especially if your password has a lot of words, you have a very low entropy per byte. It's not the worst thing in the world at 72 bytes, but if you were choosing between a 16-byte hash of a sentence and an 18-byte truncation, the hash is going to have vastly more entropy.
"A typical supernova releases something like 10^51 ergs. (About a hundred times as much energy would be released in the form of neutrinos, but let them go for now.) If all of this energy could be channeled into a single orgy of computation, a 219-bit counter could be cycled through all of its states."
You're not getting through 512 bits of entropy unless your cryptographic methods are severely broken, so broken that having more bits would not meaningfully help.
Assume their salt+hash database leaks. While it's true that salting the passwords and using bcrypt would prevent mass cracking of the database (i.e. you would have to crack each password individually, not the entire database at once), it would still be feasible to crack a single user's password if it was weak enough (which would be worth it if the user is, say, president@whitehouse.gov).
Using pepper prevents that from happening, and storing it separately from your database makes it much harder to get both.
They say that the bcrypt hashing takes 100 ms on their servers. If we take that as our limitations, it means we can try 10 passwords per second. So if you had a dictionary of common passwords, plus the salt, you could try 36000 passwords per hour. If the password is "password12345" (hence "weak enough"), you could feasibly crack that.
(I should say I'm not a professional security engineer so I'm probably wrong about everything and would appreciate corrections. )
Great analysis (that in my experience is largely accurate).
Just want to tack on one footnote: 100 ms is based on a single CPU core. Most CPUs have 4-8 cores. So instead of 10 passwords a second you could argue 40-80 passwords a second is believable with concurrent operations (which all popular hash cracking software supports).
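The back-of-the-envelope numbers above as a tiny calculator: 100 ms per bcrypt check per core, scaled by core count. The 1,000,000-entry wordlist is a hypothetical example, and GPU or specialized hardware is not modeled.

```python
def guesses_per_hour(seconds_per_hash: float, cores: int = 1) -> float:
    """Candidate passwords that can be tested per hour against a single salted hash."""
    return cores * 3600 / seconds_per_hash


def hours_to_exhaust(wordlist_size: int, seconds_per_hash: float, cores: int = 1) -> float:
    """Hours needed to try every entry of a wordlist against one hash."""
    return wordlist_size / guesses_per_hour(seconds_per_hash, cores)


# 100 ms per check, as quoted above; the wordlist size is hypothetical.
for cores in (1, 4, 8):
    rate = guesses_per_hour(0.1, cores)
    hours = hours_to_exhaust(1_000_000, 0.1, cores)
    print(f"{cores} core(s): {rate:,.0f} guesses/hour, {hours:.1f} h for 1M candidates")
```

Under these assumptions a million-entry list takes roughly 28 hours on one core and about 3.5 hours on eight, per targeted user.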
It is definitely viable to break a single user's hashed password no matter what the scheme or work factor. Strong hashing algorithms with high work factors just stop you from breaking multiple users' passwords quickly (and give you 1-3 days of delay). It is a stopgap, not the impenetrable defence some believe it to be.
All technical people need to take a day and learn how to break password hashes. Not just the theory, but go download the software and actually do it.
An additional thought: Even if you can't crack a bcrypt hash (say they picked a sufficiently complex password), you can still verify it given a particular plaintext password. So you could go through a database and discover users that are using the same password in that DB as they are on some other sites that leaked weaker hashes, for example. If the hashes you get access to are encrypted, you can't even verify them given a known plaintext password. It's a marginal, but not non-existent improvement.
No - salts mean that the same password hashed twice will give different results, which prevents the use of lookup tables, and trivially discovering users with the same passwords. Given a hash and a salt, you can still verify what the password was.
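To illustrate the reply above: with a random per-user salt, the same password produces different stored values, yet anyone holding the salt and hash can still check a guess. PBKDF2 again stands in for bcrypt, and the parameters are hypothetical.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 50_000  # hypothetical work factor; PBKDF2 stands in for bcrypt


def hash_password(password: bytes, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, hash); a fresh 16-byte random salt is drawn if none is given."""
    if salt is None:
        salt = os.urandom(16)
    return salt, hashlib.pbkdf2_hmac("sha256", password, salt, ITERATIONS)


def verify(guess: bytes, salt: bytes, stored: bytes) -> bool:
    """Anyone with the salt and the hash can test a guess."""
    _, candidate = hash_password(guess, salt)
    return hmac.compare_digest(candidate, stored)


if __name__ == "__main__":
    # Two users who both chose the same weak password:
    salt_a, hash_a = hash_password(b"password12345")
    salt_b, hash_b = hash_password(b"password12345")
    print(hash_a == hash_b)                          # False: no shared lookup table
    print(verify(b"password12345", salt_a, hash_a))  # True: a guess can still be checked
```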
There is no reason to believe that other sites will pick the same salt for each user, unless they are deriving it from either the password itself, the user id, or some other piece of user private info; which would be bad form anyways.
If you are not using salt, you are vulnerable to rainbow table attacks within the same database.
They have a history with stolen passwords... I guess they learnt from it.
Also, I don't think any security would be overkill when you manage personal and professional data of millions of people!
You can't have too much security, ever. Especially not when loads of personal files and potentially sensitive data are getting stored under said security.
Security is not free (as in beer); there exists some level of security that costs $X. If the value of the thing you are protecting behind that security is $Y, then for every X such that X > Y, your security level is a luxury.
Furthermore, there is another number Z that represents your total operational costs $Z. You could argue that for every X/Z > 0.1^T, where T is an arbitrary threshold, X is a significant cost to the organization and you cannot afford to take that luxury without risking bankruptcy.
So, yeah, Dropbox stores loads of personal files... but people with real secrets to keep should know not to rely on freemium services.
I would be interested in the details of the storage mechanism of the global pepper. Is this in an HSM? For AWS customers, something like KMS? There are then huge operational and redundancy issues to think about: failovers for your HSM, handling the possibility that AWS might be unavailable or might corrupt the key, and other cases. These things are easy to whiteboard, but when the rubber hits the road and you need to think about all the operational edge cases, things get hard quickly.
It's not in an HSM. Dropbox states towards the end of the article that they're exploring HSM applications for pepper storage, which I think is a great idea. If I recall correctly, Facebook is also exploring (or has already implemented) an HSM for password database secret key storage.
You raise good points though. This system is significantly safer than best practices (bcrypt(password, 10)), but it has significantly more overhead. There's also diminishing returns here. For a company of Dropbox's size - sure, invest in this. For a company that came out of YC S16, no, don't bother. Just properly bcrypt/PBKDF2/scrypt/argon2 the thing and revisit much later.
I love it, but I would not recommend this system to my clients for password storage unless they had a very mature operations/reliability team.
"Going forward, we’re considering storing the global pepper in a hardware security module (HSM). At our scale, this is an undertaking with considerable complexity, but would significantly reduce the chances of a pepper compromise."
Realistically, how much better is this than the standard bcrypt recommendation? I don't mean for a company the size of Dropbox/Facebook/etc., I mean in general, will this really be much more useful than just using bcrypt? Using an encryption key means that if the database is compromised, as long as the OS isn't (or wherever the key is being stored), the passwords are encrypted in a way that's effectively impossible to decrypt, which is nice. However, are they sure that hashing the password first before hashing it in bcrypt won't cause issues?
Unless Dropbox employs or has contracted someone to verify that this is okay (not an engineer, but a mathematician/cryptographer who can understand the math behind the algorithms), I'd be hesitant about it. The same goes for other companies that do complex sequences of hashing, e.g. Facebook. Implementing the idea is an engineering task, but verifying it is not, and I don't trust engineers (including myself) to verify that a specific algorithm or sequence of algorithms is valid.
From the diagram, Dropbox stores no passwords: it stores an encrypted hash of the password (hashing in two steps, SHA512 and then bcrypt), i.e. stored = AES256(bcrypt(SHA512(password), per_user_salt, 10), global_key).
I would like to know if "salted-bcrypt"+SHA512 hashing is really safer than using just SHA512 (e.g. because of the risk of making locating hash collisions easier, etc.).
No, they use the admin access they obtain from you to modify the system so they can obtain admin access again in the future. They do not ever have your admin password.
Their solution is very similar to the mode prescribed by [1] and implemented in [2].
There are actually two problems with bcrypt:
- It truncates after 72 characters
- It truncates after a NUL byte
If anyone is dead set on following Dropbox's example, make sure you aren't passing raw binary to bcrypt. You're playing with fire.
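A short sketch of why raw binary is risky with a NUL-terminated bcrypt API. `c_string_view` is a hypothetical helper that mimics what a C-string interface would see, and base64 is one assumed way to make the digest NUL-free, not necessarily what Dropbox does.

```python
import base64
import hashlib


def c_string_view(data: bytes) -> bytes:
    """Hypothetical helper: the bytes a NUL-terminated C-string API would see."""
    return data.split(b"\x00", 1)[0]


def prehash_raw(password: str) -> bytes:
    """Raw 64-byte SHA-512 digest; any of its bytes may be 0x00."""
    return hashlib.sha512(password.encode("utf-8")).digest()


def prehash_b64(password: str) -> bytes:
    """Base64 of the digest: only A-Z, a-z, 0-9, '+', '/', '=', so never 0x00."""
    return base64.b64encode(prehash_raw(password))


if __name__ == "__main__":
    # Search some sample passwords for one whose digest contains a NUL byte.
    for i in range(10_000):
        pw = f"password{i}"
        raw = prehash_raw(pw)
        if b"\x00" in raw:
            seen = len(c_string_view(raw))
            print(f"{pw!r}: a C-string API sees only {seen} of 64 digest bytes")
            break
    print(len(prehash_b64("correct horse battery staple")))  # 88
```

If the first digest byte happens to be 0x00 (about 1 in 256 passwords), the C-string view is empty and all such passwords collide. Note that the base64 form is 88 bytes, longer than bcrypt's 72-byte limit, so an implementation that truncates will only use part of it.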
Additionally, if you're going to use AES-256, don't implement it yourself. Use a well tested library that either uses AEAD or an Encrypt then MAC construction.
Not sure I understand the purpose of a MAC in this case. What benefit does it provide to hash storage? If the attacker has write access to your database to tamper with the hash, they will most likely also be able to sign up as a user and clone that (properly signed and encrypted) hash over to whichever account they want to log into. When cracking the hash, they'll just ignore the MAC.
It's an AE construction, but not necessarily an AEAD construction. You can have IND-CCA3 security without additional data.
The converse is also not necessarily true (i.e. an AEAD scheme could be based on MAC-then-Encrypt and IIRC there are some that are built that way), but that's a less useful counterpoint. The recommended AEAD constructions (AES-GCM and ChaCha20-Poly1305) are EtM.
While this is very impressive, it feels like trying to solve the wrong problem. The real problem is getting rid of passwords (Persona, anyone?).
Don't get me wrong, what's described there is super-important to secure the authentication of today, but what about a word for the authentication of tomorrow?
There already are various solutions. Passwordless[0] is a familiar one for nodejs, and I recently bumped into the promising Portier[1], which is, according to its authors, a "spiritual successor to Mozilla Persona".
For most companies offloading your password management onto an email provider is the right way to go. Suddenly, for free, you get MFA, a dedicated security team, and you'll never need to do one of those "Our password database has been hacked. Here's what we're doing..." press releases.
You've eliminated one point of failure (your company), and haven't added any because you are already doing email based password resets.
You can delete all your password-related stories from Trello or whatever.
You eliminate all the bike shedding around how to store passwords.
You've improved your initial user experience by an order of magnitude. Everyone dreads setting up yet another account password. Don't underestimate the joy a user feels when the signup form is just "click one of these buttons or fill in the email field". (The buttons are 'Connect with Facebook' and 'Connect with Twitter').
Users would much rather flip over to email (which is always logged in anyway) and click a link (especially on a mobile device) than enter a login/password.
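For the email-link approach described above, the core server-side logic is small. A minimal in-memory sketch, assuming a 15-minute expiry and single-use tokens; `MagicLinkStore` and its methods are hypothetical names, and a real service would persist the table and deliver the token inside an emailed link.

```python
import hashlib
import secrets
import time
from typing import Callable, Dict, Optional, Tuple

TOKEN_TTL_SECONDS = 15 * 60  # hypothetical expiry window


class MagicLinkStore:
    """Hypothetical in-memory store of single-use email login tokens."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        # sha256(token) -> (email, expiry). Only hashes are kept, so a leaked
        # table does not contain usable links.
        self._pending: Dict[bytes, Tuple[str, float]] = {}

    @staticmethod
    def _key(token: str) -> bytes:
        return hashlib.sha256(token.encode("utf-8")).digest()

    def issue(self, email: str) -> str:
        """Create a random token to embed in the emailed link."""
        token = secrets.token_urlsafe(32)
        self._pending[self._key(token)] = (email, self._clock() + TOKEN_TTL_SECONDS)
        return token

    def redeem(self, token: str) -> Optional[str]:
        """Return the email for a valid token; each token works at most once."""
        entry = self._pending.pop(self._key(token), None)  # pop makes it single-use
        if entry is None:
            return None
        email, expires_at = entry
        return email if self._clock() <= expires_at else None
```

Injecting the clock is only there to make expiry testable; in production the default `time.time` is used.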
That doesn't have to be as annoying as it sounds. If you use Gmail, for example, you can log in via Google Sign-In, and later a system like that can support generic OpenID. Also, once you're logged in, the site can save your session – it's not like you log into pages every time you use them.
It kind of moves the problem, yes – instead of securing a password on each and every site, you only have to protect your email password. But you have to do that already, so IMHO it just removes one problem.
If this were implemented and my email account (which still uses a password) was compromised, wouldn't said attacker then have access to all my accounts using this method?
The blog mentions, "We’re considering argon2 for our next upgrade". I suppose they could do in-line upgrades: as users are signing in, the SHA512 is piped through the old pipeline for verification and through the new pipeline for migration. As far as I can tell, there's no way for them to swap bcrypt out for argon2 using just their cold store.
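The in-line upgrade described above can be sketched as follows. `v1` and `v2` are hypothetical placeholders for the old and new pipelines (PBKDF2 with different parameters is used for both only because it ships with Python), and the `password` argument could equally be the SHA-512 pre-hash.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass

# Hypothetical schemes. "v1" plays the old pipeline and "v2" its replacement
# (argon2 in the blog post); both use PBKDF2 only because it ships with Python.
SCHEMES = {
    "v1": lambda pw, salt: hashlib.pbkdf2_hmac("sha256", pw, salt, 10_000),
    "v2": lambda pw, salt: hashlib.pbkdf2_hmac("sha512", pw, salt, 50_000),
}
CURRENT = "v2"


@dataclass
class Record:
    scheme: str
    salt: bytes
    digest: bytes


def make_record(password: bytes, scheme: str = CURRENT) -> Record:
    salt = os.urandom(16)
    return Record(scheme, salt, SCHEMES[scheme](password, salt))


def login(record: Record, password: bytes) -> bool:
    """Verify with the record's own scheme; upgrade it in place on success."""
    candidate = SCHEMES[record.scheme](password, record.salt)
    if not hmac.compare_digest(candidate, record.digest):
        return False
    if record.scheme != CURRENT:
        # The input is only available at login time, so this is the moment
        # the record can be re-hashed under the new scheme.
        fresh = make_record(password)
        record.scheme, record.salt, record.digest = fresh.scheme, fresh.salt, fresh.digest
    return True
```

Users who never log in again stay on `v1`, which is the cold-store limitation the comment points out.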
> Some implementations of bcrypt truncate the input to 72 bytes, which reduces the entropy of the passwords. Other implementations don’t truncate the input and are therefore vulnerable to DoS attacks because they allow the input of arbitrarily long passwords.
Huh? BCrypt works by stuffing the password into a 72 byte Blowfish key and using it to recursively encrypt a 24 byte payload. Either it's truncating, or it's pre-hashing the password to fit much like they are.
That's just a naive PBKDF2 implementation that's pointlessly reinitializing the HMAC context each iteration instead of just doing it once at the start. The difference between storing a 1 byte and a 1MB password with PBKDF2 should be on the order of a couple of milliseconds.
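A minimal PBKDF2-HMAC-SHA256 sketch of the fix that comment describes: key the HMAC once and copy its state for each iteration, so password length only affects the one-time key setup. In real code `hashlib.pbkdf2_hmac` should be preferred; it appears in the check below only to confirm the result matches.

```python
import hashlib
import hmac


def pbkdf2_sha256(password: bytes, salt: bytes, iterations: int, dklen: int = 32) -> bytes:
    """PBKDF2-HMAC-SHA256 (RFC 8018) that keys the HMAC only once.

    hmac.new() processes the key up front (hashing it first if it is longer
    than the 64-byte block), so a 1 MB password costs that work once,
    not once per iteration.
    """
    if iterations < 1:
        raise ValueError("iterations must be >= 1")
    keyed = hmac.new(password, digestmod=hashlib.sha256)  # key setup, done once
    hlen = keyed.digest_size
    blocks = []
    for i in range(1, -(-dklen // hlen) + 1):  # ceil(dklen / hlen) output blocks
        h = keyed.copy()
        h.update(salt + i.to_bytes(4, "big"))  # U_1 = PRF(P, S || INT(i))
        u = h.digest()
        t = int.from_bytes(u, "big")
        for _ in range(iterations - 1):
            h = keyed.copy()  # reuse the precomputed keyed state
            h.update(u)       # U_j = PRF(P, U_{j-1})
            u = h.digest()
            t ^= int.from_bytes(u, "big")  # T_i = U_1 xor ... xor U_c
        blocks.append(t.to_bytes(hlen, "big"))
    return b"".join(blocks)[:dklen]
```

Usage: `pbkdf2_sha256(b"password", b"salt", 1000)` should equal `hashlib.pbkdf2_hmac("sha256", b"password", b"salt", 1000)`.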
Having the SHA-512 hash at the beginning simplifies the implementation because the "security" code only needs to handle 64-byte random strings (which are truncated to 54-byte strings for `bcrypt`, but still...). That removes all sorts of stupid edge cases that come with variable-length strings.
> Having the SHA-512 hash at the beginning simplifies the implementation
The hash is there to ensure very long passwords contribute entropy to the final hash instead of being truncated. It also ensures the entropy is evenly distributed - every bit of the password affects every bit of the hash.
> the "security" code only needs to handle 64-byte random strings
You can't feed any typical BCrypt implementation a raw SHA-512 hash because it's not binary safe - it truncates at the first NULL byte.
Well, you can, and it'll appear to work, but it'll be laughably easy to break. It's a pretty stupid sharp edge IMO.
> which are truncated to 54-byte strings for `bcrypt`
72 bytes, because that's the size of the key array. 56 bytes is just where extra entropy helps less, because the last 16 bytes don't affect every bit of the output.
> That removes all sorts of stupid edge cases that come with variable-length strings.
It's just treated as a circular buffer. And what are we doing here, implementing our own version of BCrypt? Yeah, that certainly simplifies things :P
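The 54-byte and 72-byte figures in this exchange can be reconciled under one assumption: that the 64-byte SHA-512 digest is base64-encoded before it reaches bcrypt. A sketch of the arithmetic, with hypothetical function names:

```python
import base64
import hashlib

BCRYPT_MAX_KEY_BYTES = 72  # bcrypt's key-array limit, per the thread
SHA512_DIGEST_BYTES = 64


def digest_bytes_used(limit: int = BCRYPT_MAX_KEY_BYTES) -> int:
    """Digest bytes that survive if base64(SHA-512) is cut to `limit` characters.

    Valid while `limit` is below the 86 non-padding base64 characters.
    """
    encoded_len = 4 * -(-SHA512_DIGEST_BYTES // 3)  # 88 chars including padding
    used_chars = min(encoded_len, limit)
    return used_chars * 6 // 8  # each base64 character carries 6 bits


def surviving_digest(password: bytes, limit: int = BCRYPT_MAX_KEY_BYTES) -> bytes:
    """Decode the prefix a truncating bcrypt would actually use (limit % 4 == 0)."""
    encoded = base64.b64encode(hashlib.sha512(password).digest())
    return base64.b64decode(encoded[:limit])
```

Under that assumption, the 72-byte limit and the 54 bytes of digest describe the same truncation, counted in encoded characters versus raw digest bytes.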
So just to reiterate, taking the sha256 of the password before running bcrypt on it is recommended? Funny, this is the first I've heard of this. You'd think bcrypt would have just implemented the sha256 step into the algorithm?
I am wondering how they store OSX users' administrator passwords, since those aren't being hashed - they actually store the password somewhere... it would be nice if that were addressed somewhere.
Admittedly I don't know what that is and I don't know a lot about how applications on OSX get permissions. From reading that thread I posted I was under the impression that Dropbox stored the password because it was able to reinstate itself as an accessibility service as many times as it liked without having to ask for the admin password.
From reading, that wasn't supposed to be allowed. The only way that could work would be if Dropbox kept your password on file. In effect meaning that the dialogue you entered your admin password for wasn't a system modal - but rather a dropbox modal imitating the system one.
> I was under the impression that Dropbox stored the password because it was able to reinstate itself as an accessibility service as many times as it liked without having to ask for the admin password
That is not necessary. A SUID binary owned by root runs with root's privileges. So, they only need the administrator's password once to install the SUID binaries. Afterwards, they have their own 'backdoor' to reinstate the accessibility settings, without needing an administrator password.
So, when they deny storing your password, it's probably true, they don't need it.
(If you are not convinced, write a small C program that executes a shell, compile it, make root the owner, set the SUID bit. You can be in a root shell without ever typing a password. This is why it is a good practice to have as few root-owned SUID binaries as possible.)
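Following that advice starts with knowing which root-owned setuid binaries exist. A minimal audit sketch using only `os` and `stat`; the directory scanned in the example is arbitrary and the function names are hypothetical.

```python
import os
import stat
from typing import Iterator


def is_setuid_root(mode: int, uid: int) -> bool:
    """True for a regular file with the set-user-ID bit that is owned by root (uid 0)."""
    return stat.S_ISREG(mode) and bool(mode & stat.S_ISUID) and uid == 0


def find_setuid_root(top: str) -> Iterator[str]:
    """Walk `top` and yield paths of root-owned setuid regular files."""
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: don't follow symlinks
            except OSError:
                continue  # vanished or unreadable; skip it
            if is_setuid_root(st.st_mode, st.st_uid):
                yield path


if __name__ == "__main__":
    for p in find_setuid_root("/usr/bin"):
        print(p)
```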
They used the initial admin access (via the 'fake' password prompt) to install a tool that has setuid 0.
Setuid 0 means that whoever executes the tool, it always runs as user 0 (aka root). That is what enables them to continuously re-add their bullshit into accessibility settings.
So they aren't storing a password, they're installing a program that has permanent unlimited (barring System Integrity Protection on newer versions) access.
To be clear, they aren't storing your password. They are storing an encrypted hash of your password. This is so they can verify your password when you use it. The alternative is to not have passwords in one fashion or another.
There is something like Secure Remote Password (SRP)[1], where it is possible to verify a password without the password being transmitted to the server at all.
The problem is, it is tricky to implement, a little bit old, and still vulnerable to bruteforce attack (Blizzard uses it, and IIRC their verifier database was leaked once, with g and N being published, so anyone could do dictionary attack on it. I believe Apple is also using it).
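To make the brute-force point concrete, here is a toy, single-process sketch of the SRP-6a math. The group (N = 2^127 - 1, g = 3) is deliberately tiny and insecure, the hash encoding is simplified rather than following RFC 5054, and all names are hypothetical. It shows that the handshake never sends the password, but that anyone holding a leaked salt and verifier (plus g and N) can test guesses offline.

```python
import hashlib
import secrets
from typing import Iterable, Optional

# TOY PARAMETERS, NOT SECURE. Real deployments use large safe-prime groups
# (e.g. those in RFC 5054). 2**127 - 1 is prime, which is all the algebra needs.
N = 2**127 - 1
g = 3


def H(*parts: object) -> int:
    """Simplified hash of the arguments' string forms, as an integer."""
    data = b"|".join(str(p).encode() for p in parts)
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


k = H(N, g)  # SRP-6a multiplier (simplified encoding)


def make_verifier(username: str, password: str, salt: str) -> int:
    """Registration: the server stores (salt, v), never the password."""
    x = H(salt, H(username, password))
    return pow(g, x, N)


def handshake(username: str, password: str, salt: str, v: int) -> bool:
    """Run both sides in one process; True if the session secrets match."""
    a = secrets.randbelow(N - 2) + 1  # client ephemeral secret
    b = secrets.randbelow(N - 2) + 1  # server ephemeral secret
    A = pow(g, a, N)                  # client -> server
    B = (k * v + pow(g, b, N)) % N    # server -> client
    u = H(A, B)
    x = H(salt, H(username, password))  # client side only
    s_client = pow((B - k * pow(g, x, N)) % N, a + u * x, N)
    s_server = pow(A * pow(v, u, N) % N, b, N)
    return s_client == s_server


def dictionary_attack(username: str, salt: str, v: int, words: Iterable[str]) -> Optional[str]:
    """What a holder of a leaked (salt, v, g, N) can do: one hash plus pow per guess."""
    for guess in words:
        if make_verifier(username, guess, salt) == v:
            return guess
    return None
```

Each guess costs one hash and one modular exponentiation, which is why making the derivation of `x` slow (e.g. with a memory-hard KDF) is the usual countermeasure.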
We use SRP with our online accounting software [1]. We counter brute-force attacks by using Scrypt for the work factor, and forcing all login credentials to have a high degree of entropy.
So all of a sudden the service in question's security is only as strong as the security of the client's email service? Yeah that sounds like a big nope to me.
This 'idea' has been thrashed out a hundred times, even by supposed 'experts' like Egor Homakov who also says things like "You don't need 2FA, it's pointless and annoying for users".
This is a blog post from 2016 about how they're doing things now; someone mentioned a mistake they made (and fixed in hours) over five years ago.
That doesn't deserve an invocation of the "rekt" meme at all.
It's highly likely that their current success was based on learning from their previous mistakes. They should be applauded for improving rather than treated with snark for having ever fumbled.
So when is Dropbox going to allow users to encrypt files client-side before getting synced to its servers? It should be relatively trivial from both a technical point of view and a UX one.
No. This is impossible. The whole model is broken. A less broken model is the use of asymmetric keys for authentication, which doesn't require the service provider to "promise" they'll keep your password secret.
This model, of course, is broken in its own way, in that if a user loses their private key, all their data is lost; there's no possible recourse or password reset. It's also broken in that, if the company believes that a user's private key was compromised by a third party, they can't completely destroy that key until the user manually logs in and changes it.
Well, one could always leave one's private key with a trusted third party. So the public key model can accommodate those who are uncomfortable with the responsibility of keeping their private key secure. The password model, however, does not allow for allodial title to a secret.
I'd call this scheme 10/10.