You don't have to store every password that has ever been compromised to ensure users pick a reasonable one. Just follow the NIST recommendations and check against the top million, or top few million, and you're all set. If you use a good password storage library and slow down guessing attacks, or stop them altogether, your users will be safe. Those 300 million compromised passwords have a very long tail, and if you let an attacker make a few million attempts running down the list, you've already failed.
IMHO, you only need to block the top few thousand if you have a policy that blocks rapid or unlimited guessing. The real issue is targeted attacks, where someone reused a strong password between, say, LinkedIn and their corporate account.
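A minimal sketch of what checking against a top-N list could look like. The function names, the tiny sample list, the case-insensitive comparison, and the 8-character minimum are all assumptions for illustration; a real deployment would load an actual common-password list from a file.

```python
def load_blocklist(lines):
    """Build a lowercase set from an iterable of password lines."""
    return {line.strip().lower() for line in lines if line.strip()}


def is_acceptable(password, blocklist, min_length=8):
    """Reject passwords that are too short or appear on the blocklist.

    min_length=8 and the case-insensitive match are assumptions of this
    sketch, not requirements stated in the thread.
    """
    if len(password) < min_length:
        return False
    return password.lower() not in blocklist


# Hypothetical stand-in for a real top-million list.
SAMPLE = ["123456", "password", "qwerty", "twilight", "letmein"]
blocklist = load_blocklist(SAMPLE)

print(is_acceptable("twilight", blocklist))
print(is_acceptable("correct horse battery", blocklist))
```

A set keeps the lookup at constant time, so even a list of a few million entries stays cheap to check at signup.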
If I can get a list of compromised email addresses and passwords, I will be trying those email/password combinations across services, not passwords from the top 100.
Attacking accounts with reused credentials seems to be pretty common.
> We have some ability at ArenaNet to watch hacking attempts live, and it tells a fascinating story. We watch as hackers use tens of thousands of different IP addresses to scan through millions of attempted account names and passwords, almost all of which are for accounts that don’t even exist in our database, looking for matches. They’re not guessing or brute-forcing passwords; they’re trying a very specific account name and password for each attempt. For example, account name “joe.user@example.com”, password “alligator101”. If they don’t get a match immediately, they may try a variant like “alligator100” or “alligator102”, then they quickly move on to the next entry on their list. And it’s interesting to see that the passwords on these lists are mostly quite good passwords. For every one account on the hackers’ lists with a password like “twilight” (real example, ಠ_ಠ), there are dozens of accounts with good strong passwords. So the world at large clearly knows how to pick good passwords; the reason people are still getting hacked is because they use the same passwords on multiple sites.
I suspect it would be better to check against a bigger list, if possible.
The point isn't to protect against external brute force, but against a leaked database table: an attacker can crack valid passwords offline, then use them to access the service or try them on other services where the password is shared.
It's unfortunately far too common that a SQL injection, a bad employee, a lost backup, or whatever else leaks a copy of the account table. Even with scrypt/bcrypt etc., it's plausible to slowly break accounts beyond the first few thousand common passwords. Depending on work factors, a guess takes about 250 ms, which is about 345,600 guesses per CPU core per day.
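The arithmetic behind that figure, as a quick sketch. The 250 ms cost comes from the comment above; the list size, account count, and core count are made-up inputs for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def guesses_per_day(seconds_per_guess, cores=1):
    """Offline guesses an attacker can make per day at a given hash cost."""
    return int(SECONDS_PER_DAY / seconds_per_guess) * cores


def days_to_exhaust(list_size, accounts, seconds_per_guess, cores=1):
    """Days needed to try every list entry against every account.

    Assumes per-account salts, so each guess only tests one account.
    """
    return (list_size * accounts) / guesses_per_day(seconds_per_guess, cores)


print(guesses_per_day(0.25))                               # 345600
print(round(days_to_exhaust(1_000_000, 1, 0.25), 2))       # 2.89
```

So at 250 ms per guess, one core needs just under three days to run a top-million list against a single leaked account, which is slow but hardly out of reach.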
Adding a salt to the password before it's hashed is the way to handle this type of attack.
In general I'd say that most organizations need to get less fancy about password security not more. Most mechanisms that companies employ to make things more secure don't. They make things less secure because they add attack surface, or they fail to account for the behavior of actual users when faced with any additional effort.
Adding a salt and hashing a password is not something someone should be doing in 2018. Use scrypt, or another one of the recommended algorithms, as it is a complete implementation that is well researched and can be simply dropped into most systems.
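For reference, a rough sketch of what "just use scrypt" can look like with Python's standard library. `hashlib.scrypt` is a low-level primitive, so this sketch still generates the salt itself; the cost parameters (n=2**14, r=8, p=1), the 32-byte output, and the helper names are assumptions for illustration, not tuned recommendations. A higher-level password library would typically bundle the salt and parameters into a single stored string for you.

```python
import hashlib
import hmac
import os

# Assumed cost parameters for this sketch; tune for your own hardware.
N, R, P = 2**14, 8, 1
DKLEN = 32


def hash_password(password):
    """Return (salt, digest) for a new password using scrypt."""
    salt = os.urandom(16)
    digest = hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=N, r=R, p=P, dklen=DKLEN
    )
    return salt, digest


def verify_password(password, salt, expected):
    """Recompute the digest and compare in constant time."""
    candidate = hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=N, r=R, p=P, dklen=DKLEN
    )
    return hmac.compare_digest(candidate, expected)


salt, digest = hash_password("alligator101")
print(verify_password("alligator101", salt, digest))
print(verify_password("alligator100", salt, digest))
```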
And my response was about how many passwords you should blacklist: even with a good algorithm, it is still feasible to brute-force easy-to-guess passwords in an offline attack.
Fair enough. It's hard to protect against users being silly. No matter how you salt, hash, and encrypt, if someone knows the username/password, they're in. I delegate auth for the systems I build to a SAML provider, and require 2FA. An expert can do a much better job managing passwords than me, and as long as I trust their identity document, we're good to go.
The limit on guessing can't be enforced if your databases are ever leaked, but even so they would almost certainly be per user, and if you just want a bunch of accounts to post spam, you could just select the 3 most common passwords and brute-force accounts.
Slightly more sophisticated would be to use a botnet to do this, so that you can't block the source.
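The per-user limit point can be made concrete with a toy simulation. Everything here is hypothetical (the limiter class, the 5-failure threshold, the account data, and the plaintext passwords, which exist only to simulate login results); it just shows that trying 3 common passwords across many accounts never trips a per-account lockout.

```python
from collections import defaultdict


class PerUserLimiter:
    """Toy lockout: block an account after max_failures failed logins."""

    def __init__(self, max_failures=5):
        self.max_failures = max_failures
        self.failures = defaultdict(int)

    def allow(self, username):
        return self.failures[username] < self.max_failures

    def record_failure(self, username):
        self.failures[username] += 1


def spray(accounts, passwords, limiter):
    """Try each common password once per account; return compromised users."""
    hits = []
    for guess in passwords:
        for user, real_password in accounts.items():
            if user in hits or not limiter.allow(user):
                continue
            if real_password == guess:
                hits.append(user)
            else:
                limiter.record_failure(user)
    return hits


# Hypothetical accounts; plaintext only to simulate the login check.
accounts = {"alice": "123456", "bob": "Tr0ub4dor&3", "carol": "password"}
limiter = PerUserLimiter(max_failures=5)
print(spray(accounts, ["123456", "password", "qwerty"], limiter))
```

No account ever sees more than 3 failures, so the lockout never fires, which is why a blocklist of the most common passwords still matters even with throttling in place.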
What if the database of hashes gets leaked? Then there is no possibility of throttling. Furthermore, as somebody else said, given a false positive the proper API may be used.
You can thwart or mitigate most brute-force attacks with key stretching. Even if you only require 50 msec of work per password on current hardware, you have made a brute-force attack vastly more expensive than it is against a single fast hash.
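One way to arrive at a "50 msec of work" setting is to calibrate an iteration count on the target hardware. This is only a sketch: PBKDF2-HMAC-SHA256 is swapped in here as a simple stretching function from the standard library, and the function name, starting count, and doubling strategy are assumptions for illustration.

```python
import hashlib
import os
import time


def calibrate_iterations(target_seconds=0.05, start=10_000):
    """Double the PBKDF2 iteration count until one hash takes long enough.

    Returns (iterations, measured_seconds). Timing depends on the machine,
    so the result will differ from run to run.
    """
    salt = os.urandom(16)
    iterations = start
    while True:
        began = time.perf_counter()
        hashlib.pbkdf2_hmac("sha256", b"benchmark-password", salt, iterations)
        elapsed = time.perf_counter() - began
        if elapsed >= target_seconds:
            return iterations, elapsed
        iterations *= 2


iterations, elapsed = calibrate_iterations()
print(f"{iterations} iterations took {elapsed * 1000:.1f} ms")
```

The stretching multiplies the attacker's cost per guess by the same iteration count, so a legitimate login pays it once while an offline attacker pays it for every candidate password.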
If the database gets breached with write access, a user's password can be bypassed entirely, even if it can never be discovered.
If the database contains all of the information the hacker wants to know, they don't need to hack your account, they can just read the tables directly.