Pwned Passwords in Practice: Real World Examples of Blocking the Worst Passwords (troyhunt.com)
190 points by robin_reala on May 29, 2018 | 87 comments



I'm a huge fan of Troy Hunt and HIBP, but reading this I assumed it was basically an advertisement to get people to sign up for the Pwned Passwords API -- I admit I didn't make it down to the "And Finally..." section where he explains that it's free, because there were a bunch of pictures at the bottom and I stopped reading.

But reading through the API docs [0], it turns out the API has no rate limit. Impressive.

[0] https://haveibeenpwned.com/API/v2#SearchingPwnedPasswordsByP...


Even if he's not charging for the service, I'm pretty sure he's getting more consultancy work from these "PR" (note the quotes) posts. Even the timing of his releases seems calculated to keep people from getting too tired of the service.

Not saying it's a bad thing, but don't assume something is done out of pure altruism, because few things are.


Have you ever been paid to do work that you really want to happen? Been paid to improve the world in a small, but significant way? It's a lovely feeling.


But what is the value-add of this service?

Complex passwords are quite useful if the server gets hacked and someone walks away with the (salted) password hashes. Against brute-forcing passwords at the login screen of an application they don't add much value, other than making it quite hard for the user to remember what the password for this particular site could be...

Theoretically, if you block a user ID after, say, 5 invalid logins, even a weak password from the Have I Been Pwned list will rarely get you hacked. The chance that an attacker guesses exactly your password from the million-or-so entries in the list within those attempts is minimal.

So with that in mind, wouldn't this service be something for website owners that don't know how to properly secure the information they control?


Because chances are, if you are using a password that is in the list, it's because either it's an exceedingly common password (and you really shouldn't be using it) or you've used it before multiple times and are probably the reason it is in the list (because it was breached on another site).

From experience, most attacks we see now are credential stuffing attacks rather than pure brute force attacks using something like Sentry MBA, with a huge number of IP addresses (the last attack we saw was using over 6 million IP addresses). So throttling sign in attempts at the IP level is almost useless as is throttling at the email level, as the attacker can attempt at least 6 million known email/password combinations to see if those accounts exist on your site.

The only real defence against that is all your users using 2 factor, or creating a pseudo 2nd factor (email them if the attempt is from an unrecognised IP).

Edit: Of course the other helpful defence is to ensure your users aren't reusing passwords, which is where Pwned Passwords comes in.


I can attest to this. Credential stuffing was the number 1 reason we decided to add the pwnedpassword validation to our signup flows. We were seeing thousands of IP addresses and hundreds of thousands of requests over a few days. Rate limiting slows it down but doesn’t help all that much. Rate limiting on a specific username will prevent brute forcing but exposes you to DOS. Rate limiting by IP becomes less effective when thousands are involved and most requests end up succeeding.

Disclaimer: work for Kogan who is mentioned in TFA.


> Rate limiting by IP becomes less effective when thousands are involved and most requests end up succeeding.

What do you mean by "end up succeeding"? Most requests successfully authenticated? On the first try? Second try? Tenth try? Hundredth try?

(I'm not trying to doubt the utility of pwnedpassword validation; just hoping you can help me understand the threat you're facing and why IP rate limiting didn't help much. Thanks.)


Perhaps an example will help.

Let's say you have IP throttling/rate limiting, set to an extremely conservative limit - 1 sign in attempt every hour. This is great for the brute force threat - only 24 passwords a day can be attempted per IP. Infeasible for any brute forcing.

But now let's say the attacker has access to a botnet with 6 million unique IP addresses (not theoretical - see my comment above).

Now for each of those 6 million IPs they can try 24 passwords a day - i.e. 144 million attempts a day without ever triggering the throttle.

Bear in mind also that they aren't just trying random passwords for an account - they have a compiled/combined breach list of known account/password combinations from other breaches. So they can attempt 144 million known combinations a day, without hitting any throttles (this is what the parent above means by "end up succeeding").

What percentage of your users reuse passwords and have been exposed in at least one breach? I would suggest it's quite high. How long do you think it will take a credential stuffing attack to identify those accounts on your site when it can try hundreds of millions of combinations a day?

This is the threat vector.


ISTM the next step would be to rate limit for a given account without regard to IP. Sure that's a potential DOS, but we can wait until that's actually a problem before worrying about it.


They aren’t trying the same account multiple times. Well, they may be coincidentally (if the user uses a unique password per site and has been breached from multiple sites, so appears in the breach list multiple times with different passwords), but not that frequently. What they are looking for is the intersection of users who reuse passwords, have been exposed by a breach, and have created an account on your site reusing the same password. Which is, perhaps not surprisingly, a relatively large percentage of your users.


From a wrestler to a wolf: thank you. Very helpful.


Thank you, this covered what I was trying to say.


I think what he’s saying is that if the attacker has enough IP addresses at their disposal, they can spread out the attack broadly enough that any IP-based rate limit that would stop a bot would also impact human users. Thus most of the bot attempts slip in under the rate limit.


> Rate limiting on a specific username will prevent brute forcing but exposes you to DOS.

Why?


Not the OP, but I think he's referring to the potential for an automated service spamming thousands (or more) accounts enough to lock them.


You can lock a user out of their account by spamming the server with login attempts.


Yes. In this case the denial of service is against specific customer accounts for the lockout duration, not against the availability of the site.


This would be bad, but what's the motivation? What fabulous prizes await the DOSer of some random account on your service?


Locking users out of their accounts isn’t the goal, it’s just an unfortunate side effect.


If Jimmy uses the same password everywhere, and his password has gotten out in a prior leak, then it's likely that his favorite username is associated with that password.

If he comes to my site and signs up with that password, an "evil person" doesn't need 5 guesses to get into his account - they just need one, because they already have it.

If, however, I check Jimmy's password when he registers, and block him from using it: (1) I keep him from immediately losing control of his account on my service, and (2) I provide Jimmy with the knowledge that his favorite password was leaked and he needs to do something about it.


I can think of a few value-adds for people who practice moderate opsec:

1. If one of your passwords is suddenly rejected, it may be a great moment to refer you to HIBP.

2. Traditional password complexity estimators may be overestimating your password's complexity: e.g. the passphrase "My house is blue" is fairly long and will likely be flagged as complex enough, but it's well within reach of a passphrase-aware cracking tool.


Thinking that you have a 0% chance of losing your hashes is one of the strongest possible indicators that they are on an open filesystem being served by Apache.


It's free for end users, but he has had a donation link for quite some time. He also entered into a deal with AgileBits to advertise 1Password (and its integration with HIBP) on the pages where people do the search on his website. So I'm sure he's getting paid for this...or rather reaping the benefits of spending time on HIBP. As others have put it here, he also benefits from the huge PR and the opportunities that come up due to that.

Yet, this has become almost like a public good over the years, and providing it at no (extra) cost than fundamental costs of accessing it is appreciable.


Just to add, the first bit felt like an advert for 1Password.


I wrote this small python function to check if a password is part of a breach by transmitting only the first 5 digits of its hash: https://gist.github.com/mcdallas/d94ecd8b34a6bf57a162a7af0ce...
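For readers who want the general shape of such a check, here is a sketch along the same lines (not the linked gist; the `https://api.pwnedpasswords.com/range/{prefix}` endpoint is the one documented in the HIBP v2 API docs, and only the standard library is used):

```python
import hashlib
import urllib.request

def match_suffix(body, suffix):
    # The range API returns one "<35-hex-char suffix>:<count>" pair per line;
    # scan for our own suffix and return its breach count, or 0 if absent.
    for line in body.splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip() == suffix:
            return int(count)
    return 0

def pwned_count(password):
    """How many times a password appears in Pwned Passwords.

    Only the first 5 hex characters of the SHA-1 ever leave the machine."""
    sha1 = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    prefix, suffix = sha1[:5], sha1[5:]
    url = "https://api.pwnedpasswords.com/range/" + prefix
    with urllib.request.urlopen(url) as resp:
        body = resp.read().decode("utf-8")
    return match_suffix(body, suffix)
```

A count of 0 means the password wasn't in any breach the service knows about; anything else is a reason to reject it at signup.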


FYI: "Due to the massive popularity of the range search over searching by complete password hash, the significantly improved performance and the enhanced privacy controls, searching by hash will be discontinued on 1 June 2018."

Shown in API docs under "Pwned Passwords overview" and links to here: https://www.troyhunt.com/enhancing-pwned-passwords-privacy-b...


I think what he means is that searching by full hash will be removed in favour of using the /range endpoint (the one I am using)


I can't tell whether you knew this from the way your post is written, but this refers to searching by complete hash, not searching by the first 5 digits (aka range search).


Your code uses the first 5 digits of the hex digest of the SHA1. This 5-digit hex number has roughly 1 million combinations.

That seems way too low.

For context: If you take just dictionary words from the world's 5 most popular languages, you'd have more than 0.5 million words.


This is from the API docs, you are not allowed to pick the number of digits.

https://haveibeenpwned.com/API/v2#SearchingPwnedPasswordsByR...


Oh I see, so you send only the first 5 hex digits, but you then get back ~478 suffixes of 35 hex characters each (~16,730 characters in total, 478 * (40-5)) and search those for your full hash.


So what if you then took positives from 5-digit hash matches and searched them through 7-digit hash matches? Like a sieve?

With a good cache, that'd save some bandwidth.

Maybe that's wishful thinking. I can't imagine checking new passwords more than a few dozen times per second at the most. Bigger sites probably just write their own password integrity tools.


I applaud your optimism!

Most places just enforce byzantine password requirements: 13 characters, must have ~, uppercase and a palindromic prime in it.

Obligatory password XKCD, think of the children. https://xkcd.com/936/


I can imagine that if there were a more standard password definition, specialized cracking hardware would eventually adapt to whatever the standard was.

I use a memory trick to have very strong passwords, but most people probably wouldn't be willing to invest the effort.

Someday there will be a better way.


I wrote a Rust tool to process the downloaded hash list into a compact (29GB -> 1.5GB) database that can be efficiently queried: https://github.com/Freaky/gcstool

I made a start on a Ruby port too: https://github.com/Freaky/ruby-gcs - I have vague plans to finish it off and write a Rodauth (http://rodauth.jeremyevans.net/) plugin for it.


Adding to the plug train :)

I wrote a devise extension that's essentially a one-liner to add this check on signup (and a small code block to add on signin)

https://github.com/michaelbanfield/devise-pwned_password

Nowadays most of the logic is encapsulated in the pwned gem

https://github.com/philnash/pwned

Which is a good choice if you aren't using devise.


Would be nice if they made a bloom filter for anyone to use. On second thought, you can do that yourself based on the SHA-1 hashes of passwords they offer for download.

Edit: On third thought, a bloom filter for 502M entries and a false positive rate of 0.1% ends up as an ~800MiB filter. Binary-searching the whole dump is surely faster.
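For reference, the standard bloom filter sizing formulas bear out that estimate (a quick sketch; `n` and `p` are the figures from the comment above):

```python
import math

def bloom_bits(n, p):
    # Optimal filter size in bits: m = -n * ln(p) / (ln 2)^2
    return -n * math.log(p) / math.log(2) ** 2

def bloom_hashes(p):
    # Optimal number of hash functions: k = ln(1/p) / ln 2
    return math.log(1 / p) / math.log(2)

n, p = 502_000_000, 0.001
mib = bloom_bits(n, p) / 8 / 2**20  # bits -> MiB
# ~860 MiB at ~10 hash functions: the same ballpark as the ~800MiB estimate
```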


> a bloom filter for 502M entries and a false positive rate of 0.1% ends up as a 800MiB large filter

With that sort of FP rate it's not really much use beyond filtering API calls. I'd suggest 2GB[1] as a more sensible minimum. A compressed filter can get this down somewhat.

> Binary-searching the whole dump is surely faster.

Not really. log2(500M) is ~29, k for a suitably sized bloom filter's only 23. Interpolation search can get you a result in more like 10 seeks, but a bucketed bloom filter can get your lookup down to a single read.

Having spent a fair bit of time faffing about with this stuff I ended up settling[2][3] on Golomb compressed sets[4], which can get the full list with a 1-in-10 million FP rate into 1.5GB.

[1]: https://hur.st/bloomfilter/?n=500M&p=1.0E-7 [2]: https://github.com/Freaky/gcstool [3]: https://github.com/Freaky/ruby-gcs [4]: http://giovanni.bajo.it/post/47119962313/golomb-coded-sets-s...
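A quick sanity check of the probe counts quoted above (binary search depth over the full dump versus the optimal bloom filter `k` at a 1-in-10-million false-positive rate):

```python
import math

# Binary search over a sorted ~500M-entry dump needs ~29 probes
probes = math.ceil(math.log2(500_000_000))

# Optimal bloom filter hash count at p = 1e-7: k = ln(1/p) / ln 2, ~23
k = round(math.log(1 / 1e-7) / math.log(2))
```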


Oh right, I was wrongly thinking you'd have to memcmp the whole size of the filter. It's simply too warm today for thinking. Did you look into Cuckoo filters as well?


> Oh right, I was wrongly thinking you'd have to memcmp the whole size of the filter.

Yeah, it's just k single-bit lookups - ideally you do something to get them into clusters, like dividing the database into sub-filters, so you're doing random lookups into, say, a 32KB chunk instead of a whole 2GB filter.

> Did you look into Cuckoo filters as well?

Cuckoo filters look like an interesting alternative and looking at them more closely is on the to-do. I don't think they'd have any significant space savings, though - they're similarly about 75% the size of the equivalent bloom filter. Maybe they'd be faster for lookups?

I'd also be interested in playing with matrix filters[1], which supposedly get close to the theoretical limits for these sorts of structures. Implementing them seems rather more involved, sadly - particularly given the only reference I can find is a fairly inscrutable CS paper. Show us the code damnit.

[1]: https://arxiv.org/abs/0804.1845


I'm trying to figure out if I need to do this if I'm requiring minimum sixteen characters... after searching the docs and several of the blog posts, I can't ascertain if the corpus contains any passwords/phrases >= 16 characters. I don't want to be running this check on every passphrase create/modify if the corpus contains none or very very few passes length 16 or greater. Does anyone have any insight into the contents? Or, a way to query only the portion of the corpus that has 16 or greater?


I just tried 'passwordpassword' and it was there. You can download the whole dataset at the bottom of this page

https://haveibeenpwned.com/Passwords

It's a losing battle trying to add byzantine rules to prevent users doing things like using their normal password * 2, so it's probably a reasonable check to add.


Aha, I tried a few obvious 16s but not that one... thank you! I guess we'll add it in. The last few dumps (other sources) I reviewed contained nothing over 15 characters, but I'm imagining they will start creeping in as more folks demand longer phrases.

Still, I'd bet the vast majority of the bad passes are <16, seems a heck of a waste of energy and bandwidth to check my user's passphrases against (guesstimating) 0.05% of the corpus.


Soliciting people to enter their passwords seems like a bad approach, both because it's risky and because it helps train people to do risky things. Better would be a service that attempts to crack passwords and, if it succeeds, disables the account until a new password is set. Seems like a service everybody should use, if it existed. Much better than arbitrary rules about "good" password character sets.


There’s no one specific way to crack a password. It all depends on the implementation. The most basic case is just storing the password in plaintext, and plenty of companies are more than happy to do that. Passwords don’t exist as some separate entity, they’re attached to a system. So cracking an Adobe password might be easy, but cracking a Dropbox password incredibly hard.


I suppose. I was thinking of just hashing all known passwords and plausible passwords and locking any accounts that matched.


The only thing I had a hard time understanding-

OP really cares about his privacy/security/password, but then he uses a secondary system to store it?

Is this the best way to do this? Break into the secondary system and everything is available. Keyloggers, eyes, stolen computers, all have the possibility of everything available.

I considered other solutions, like writing them down and putting them in a lock box at a bank, but that's really inaccessible.


It is the best way to do it for the average user.

Modern password management services are incredibly secure, with client-side encryption of your secrets, among other protective mechanisms (to guard against keyloggers, stolen computers, ...)

There is still a risk, because you're trusting third-party software (which in some cases is closed-source – including 1Password), but for most people that is a much lower risk profile than if they were managing passwords themselves.

For specifics on 1password's security, check out https://1password.com/security/


> Modern password management services are incredibly secure

There are lots of password managers that have had gaping vulnerabilities. From memory I'm sure I've seen LastPass vulnerabilities top HN just a while ago... yeah, probably this one: https://www.bankinfosecurity.com/lastpass-patches-password-m...


It does sound a bit silly, but the alternative is to keep them in your brain, and AFAICT it's impossible to do that these days. I have hundreds of accounts in my password manager, I couldn't possibly remember secure passwords for all of them.

FWIW I'm not a fan of keeping them in some cloud service dedicated for storing passwords (1PW's subscription offering, lastpass) because if someone is going to break into servers and steal passwords it's going to be from those fine people.

A middle ground is to use something like KeePass, where the encrypted password file is stored where you want it to be stored (ie your hard drive), where you can manage how you want to distribute that file between places you need it.


I use PasswordSafe which is open source, and I have it backed up to my dropbox (so my phone can access it). It is less convenient than most of the password managers out there having browser plugins that will automatically fill in your password, or generate one right in the text box, but the data is mine.

I still have to rely on the mobile app which isn't open source, but it is the recommended app from the main developer.


A keylogger is pretty much game over regardless of your password strategy. Your passwords can be fully offline but you still need to enter your bank password on a connected device.

On the security spectrum, for the vast majority of people, a password manager is a step up in how safe their passwords are.


Great in theory, but I don’t think it will play out well because of the average computer user. Ask any help desk employee.


Word. Password security requirements should be proportional to the chance of a dictionary attack times the cost of a successful attack. Mozilla wants a 12-character password to file a bug report when they really should just publish an email address and an anonymous HTML form and say please.


I know what you are feeling, but there's a balance to making bug filing easy. For a large visible project, make it easy and the statistical noise drowns everything out.


They do? Help / Submit Feedback / Firefox makes me sad, then fill in the form.


> they gave people the ability to check any individual password against the online Pwned Passwords service

What could possibly go wrong?

This works by locally hashing your password, then sending only the first 5 hex characters of the hash to the server. The server sends back all matching hashes of bad passwords it has on file. Typically this returns a few hundred hits. The local client (probably some piece of Javascript) checks the hits against the password hashes returned. If there's a match, the password is in the database of bad passwords. This supposedly protects the password if communications with the checker are intercepted.

But does it? If an attacker can see those first 5 hex characters, they too can get the list of hashes of matching passwords. There are only a few hundred of them, and they're hashes, not the actual passwords. One of those is the hash of the user's password. Now they know what hashes to try.

An attacker presumably has a big database of likely passwords to try. So they can create a database of hashes locally. The hashing algorithm is known to the client, after all. So now they have a few hundred passwords to try for a break-in. Try those over the next few days, and they're in.

Is it really that bad, or am I misunderstanding something here?


The point of a secure password is that it _won't_ be in the list of hashes by the server. The service should prevent the user from using any of the passwords that are matched by the password database.


The point is if you get a match you deny that password and make the user try again. So this would only give you a list of 100 or so passwords that it couldn't be. Not even a drop in the bucket of the namespace you need to search.


> One of those is the hash of the user's password.

Why? Your hash might not be in the list. There are 16^35 hashes starting with these 5 characters.


That doesn't sound so bad. Real attackers probably try something easier first.


The Okta chrome extension mentioned in the article seems very cool and useful.

But how do we trust the Okta chrome extension not to post all credentials to the developer? Even if its doing nothing shady now, in some future update?


With that security model, how do you trust any extension at all?

At least with this one, you can audit the source code and build it for yourself[0]

0: https://github.com/OktaSecurityLabs/passprotect-chrome


I once made a webpage where HNers drag-dropped their entire iphone backups. It got me thinking. http://markolson.github.io/js-sqlite-map-thing/


Whenever there are tools like this I usually just visit the site then unplug my ethernet/disconnect from WiFi. If it's actually client side JS it will work.

I'm pretty sure this is safe, but if there's a way to defer sending an HTTP request until after the page is closed...


If you do it from a private browsing window, maybe that would be fine. Maybe. (Otherwise the page could exfiltrate information via localstorage or something.)


You'd just motivate me to use client-side storage if the upload failed. Don't dare come back to the page when online! :)


It decrypts their backup locally? The link to code in the footer just refreshes the page.


It parses an unencrypted sqlite db in their backups in client-side javascript. The whole thing is client-side javascript.

But... if we'd wanted to be nasty we could have sent everything to servers we controlled.

The src link stopped working when gitlab.io arrived I think https://github.com/markolson/js-sqlite-map-thing/


Stupid question here. But shouldn't websites/apps simply lock the account for 5 minutes after 3 bad password attempts? -- basically ending the usefulness of bad password lists, etc etc. Do sites actually let you run limitless password attempts -- thus really find these password lists important? If your password was "foofoo", and there was a password lock after 3 attempts I don't see how you would crack it with a password file (in this lifetime).


A lot of sites don't want to lock you out because that is a denial of service vector. If someone wanted to mess with your site, especially if your usernames are public (like reddit, eBay, HN, etc), then all they have to do is keeping sending login requests with bogus passwords to lock out all of your users.


I've tried that once on a site and it was a maintenance nightmare. Users kept bombarding me with emails asking to unlock their accounts. Turns out that 99 times out of 100, a mistyped username/password combo is from users jerking around or not remembering the password, rather than from hackers trying to brute force their way in.

Now imagine having a site with a few million accounts and 0.1% of them mistyping the password every now and then.


Sounds like a case of not handling the UX of the feature properly. What you can do is either allow them to unlock the account via email, or have a time-based unlock, or both. We do 72 hours or unlock it via email.
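As a sketch of the time-based unlock option (the class, parameter names, and thresholds here are hypothetical illustrations, not any particular site's implementation):

```python
import time
from collections import defaultdict

class LoginThrottle:
    """Per-account lockout with automatic time-based unlock.

    After max_failures failed attempts within lock_seconds, the
    account is locked; it unlocks itself once the failures age out."""

    def __init__(self, max_failures=5, lock_seconds=300):
        self.max_failures = max_failures
        self.lock_seconds = lock_seconds
        self.failures = defaultdict(list)  # username -> failure timestamps

    def record_failure(self, username, now=None):
        now = time.time() if now is None else now
        self.failures[username].append(now)

    def is_locked(self, username, now=None):
        now = time.time() if now is None else now
        # Drop failures that have aged out of the window
        recent = [t for t in self.failures[username]
                  if now - t < self.lock_seconds]
        self.failures[username] = recent
        return len(recent) >= self.max_failures
```

A real deployment would persist the counters (e.g. in Redis) and pair this with an unlock-via-email flow, but the windowing logic is the same.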


One could set the threshold to 40 failed attempts; brute forcing requires thousands of attempts.


You don't have a 'forgot password' system?


As someone explained in a different comment above, modern exploit scripts will just rotate through the entire database, trying different user IDs in series to avoid hitting per-user rate limits. Most criminals don’t actually care WHOSE account they hack, as long as they can get some monetary or data value out of any hacked account.

And, if you do a hard lock-out, that’s an easy way for an attacker to DoS your entire user base in short order.


Because the email/password combination is already known. From a previous breach where you used the same combination. So the attacker needs exactly 1 attempt to get into your account.

And with the ease of using botnets now, they can get access to a very large number of IPs to distribute the attack, so you can't reliably throttle at IP level either.


I think it’s more common to crack the passwords offline from a breach. Enough people reuse passwords that you can then directly try a persons password on multiple sites, without the need to guess.

I think I’ve had this happen to a few old accounts of mine that reused a simple password that had been leaked. (Of course I can’t really verify how the hackers did it, but it seems likely to me)


Seems like there is some legitimate reasoning in this thread. I also wonder how much of it is just not wanting to make it potentially more difficult for a user to log in.

Sort of like how credit card providers could make things much more secure but choose not to because of fear of reducing transactions.


Reminder: never enter your email on one of those "have I been pwned" databases. Find a way to download the database and check it offline.


This is not a bad rule in general, but it's reasonable to make an exception if you know you are connecting to a trustworthy site. Most people trust Troy's site because he is a known figure in the security community, and is transparent about who he is and his goals for running it.

https://haveibeenpwned.com/

That said, the domain name is (IMO) needlessly clever and does not help instill trust in folks who are not already familiar with it. I recommend it to friends and family, but I cringe a bit whenever I have to post that URL into an email to, say, my aunt who is a retired librarian.


I'm signed up to his automated alerts for whenever my email appears on a leaked password list online. I had one come through just the other day, then a load of alerts that my Ubisoft account was being accessed from various places all over the world (I have no idea when/why I set up a Ubisoft account, but there you go).

It's a very useful service - and free. No way am I going to manually download every single password dump that appears online and search it for my details.


Same for my Ubisoft account last night. I've been using 1password for all my new passwords, but obviously there are still old accounts lying around with overused passwords of mine.


Never enter your email into one of these services, but go ahead and use it to sign up for every mom&pop online shop, forum, etc...

Treating your email address as a protected piece of information is a laudable goal, but seemingly impractical in the real world.


Not sure why this was voted down. By entering your email, you are providing lots of valuable data to a potential attacker: that you are active, your IP address and environment, all of which can help unlock your account on whatever service was compromised.


Who could possibly get this info on https://haveibeenpwned.com ? Troy? I bet he doesn't even keep logs. Anyone listening to your HTTPS connections? Then you have some more serious issues.

No seriously, Troy's site is awesome, and spreading FUD is not doing them any service.


It’s good advice in general for most folks. For those that know of Troy and this particular site it’s fine, but the general recommendation still stands.



