I Am Releasing Ten Million Passwords (xato.net)
594 points by m8urn on Feb 10, 2015 | 216 comments



Barrett Brown was not convicted merely for linking to data on the web. He was convicted for three separate offenses:

1. Acting as a go-between for the Stratfor hacker (presumably Jeremy Hammond) and Stratfor itself, Brown misled Stratfor in order to throw the scent off Hammond. Having intimate knowledge of a crime doesn't automatically make someone liable for that crime, but it does put them in a precarious legal position if they do anything to assist the perpetrators.

2. During the execution of a search warrant, Brown helped hide a laptop. Early in the trial, in advancing the legal theory that hiding evidence is permissible so long as that evidence remains theoretically findable in the scope of the search warrant, Brown admitted to doing exactly that, and that's a crime for the same reason that it's a crime when big companies delete email after being subpoenaed.

3. Brown threatened a named FBI agent and that agent's children on Twitter and in Youtube videos.

The offense tied to Brown's "linking" was dismissed.

Brown's sentence was unjust, but it wasn't unjust because he was wrongly convicted by a trigger-happy DOJ; rather, he got an outlandish sentence because he managed to stipulate a huge dollar figure for the economic damage caused by the Stratfor hack, which he became a party to when he helped Hammond.


The trafficking charges were dropped but he still was charged as an accessory after the fact. http://cryptome.org/2015/01/brown-105.pdf


Yes; that's #1 in my list. Thanks for the link to the sentencing memo!


I never followed the case, could someone clarify how he was an accessory after the fact?

Did they explain how he misled Stratfor? Were they investigating their own breach and contacted him somehow? Or did he hide evidence?

It'd be great to have clarity on his wrongdoing related to the hacking. The parts about threats and hiding evidence seem tertiary to people's defense of him, since the major crime that he became famous for was the hacking by Anonymous.


According to Kim Zetter:

The first charge is a new one and relates to assistance Brown allegedly gave the person who hacked Stratfor “in order to hinder and prevent [his] apprehension, trial and punishment.”

According to the government Brown worked to create confusion about the hacker’s identity “in a manner that diverted attention away from the hacker,” which included communicating with Stratfor after the hack in a way that authorities say drew attention away from the hacker. The hacker is not named, and it’s not clear if it’s convicted Stratfor intruder Jeremy Hammond, or an earlier hacker who’s known to have penetrated the company first.


Thanks. Seems like during sentencing this was the key point related to accessory:

> Loss amount of more than $400,000 but less than $1M

This was worth +14 points which was higher than any other single guideline - including threatening an FBI agent.

I guess the lesson here is that if the crime at hand involved any significant amount of money then even if your role was minor (and after the fact) you can still get serious punishment.


Yes. This is a real problem with the federal sentencing guidelines.


What were the threats against the agent and the agent's children? I'm asking because I read some of them ("ruin his life", "look into" his kids), but I'm not sure which of those are protected under the First Amendment.

Broad categories of rude speech are protected under the First Amendment, including things like, IIRC:

1. Saying if President Johnson makes you pick up a gun, he'll be the first in your rifle sight. (Watts v. United States)

2. Telling a cop "I'll kill you, you white devil" while you are in handcuffs and unable to kill him. (? v. ?)

3. Swearing "revengeance" upon the Jews. (Brandenburg v. Ohio)


It was "White son of a bitch, I'll kill you", and it was Gooding v. Wilson.


And as far as I can tell, it wasn't that what he said was constitutionally protected. It's that the statute he was charged under was unconstitutionally broad, because it prohibited "abusive language" in general. A more specific statute, prohibiting only threats, would have likely been ruled constitutional.


I'm not sure about "likely", but upon a closer reading, I agree that the Gooding decision looks like it was mainly about the broadness of the statute. Thanks for noting that.



"threats may not be punished if a reasonable person would understand them as obvious hyperbole". Obviously, I don't know what the court would have held in this case, but it seems possible it would have held that this was "obvious hyperbole".


Could someone with legal background please explain the concept of "protected speech"?

I thought the Constitution is "where the buck stops", the Supreme Law. It takes precedence over any law, legal theory, precedent, tradition, etc.

The first amendment, as written, outright "enjoins" Congress from creating any "exceptions" or define what kinds of speech are actually protected.

I also think the Constitution provides one, and only one way for Congress to modify the 1st amendment so there can be categories of speech that can be "abridged": a constitutional amendment.

So, what kind of legal maneuvering or reasoning has been used to somehow justify the amending of the 1st amendment without actually amending it?


"Freedom of speech" is understood to mean freedom to express any opinion or idea. It doesn't literally mean freedom to speak arbitrary words. There are many, many illegal acts which you can commit by merely speaking words, like fraud, blackmail, harassment, etc.


The buck stops where the Supreme Court says it does. They are the ones who get to decide the meaning of the words in laws and the Constitution. As a side note, the first amendment does not bind the states from creating laws that limit speech as it is written that 'Congress shall make no law...'. This comes from the Doctrine of Incorporation[1] based off of precedents set by interpretations of the Fourteenth Amendment. Now there's an amendment with an interesting history. [1] http://en.wikipedia.org/wiki/Incorporation_of_the_Bill_of_Ri...


Here are some details on the first amendment and what is and isn't protected under the constitution:

http://debmcalister.com/2011/06/03/7-things-you-cant-claim-f...

Some things not protected:

1 - Hate speech

2 - Speech that incites violence or encourages the audience to commit illegal or dangerous acts.

3 - “Material support“ to domestic or foreign terrorist groups,

4 - Public speech made in the conduct of their duties by public employees.

5 - Slander, libel or defamation.

6 - Publishing confidential, trade secret, or copyright material

7 - True threats. Like many other areas of First Amendment protection, context, target, and intent matter in determining what is or is not a true threat. Some threats are always illegal — any threat to the President of the U.S., for example.

While the constitution is broad in its definition of what is free speech, local, state and the federal government still have some say in what they consider to be covered by the first amendment. In some cases the line even seems arbitrary: the Westboro Baptist Church's hateful protests are legal, while a kid burning a cross is not.


I disagree about the "hate speech" one. In particular, the article you link to cites Chaplinsky, which was limited very strongly by Brandenburg, Virginia v. Black, in which the court limited the statute in question to apply only when intimidation is intended, and Hustler v. Falwell, which the article gets completely backward - Hustler won!


> It takes precedence over any law, legal theory, precedent, tradition, etc.

The framers were (mostly) lawyers, and wrote the document against the background of English common law: http://www.libertylawsite.org/liberty-forum/why-you-cant-und....

Exceptions to the "freedom of speech" are part of the Constitution, despite not being written in so many words, for the same reason "due process" requires a presumption of innocence, even though the latter phrase appears nowhere in the document.


>The offense tied to Brown's "linking" was dismissed

This masks the scary reality that someone was indicted, arrested, and prosecuted for posting a link (not to mention that it was dismissed as part of a plea - not for lack of legal merit). While in this case there were other charges as well, there didn't have to be - all of the same pre-trial horrors (including possible detention without bail) could have occurred with only that charge. The fact that such a charge may eventually be dismissed/beaten at trial after your life is burnt to the ground for posting a link is little comfort.


That's also a misleading way of framing the issue. Brown wasn't charged with "criminal linking" (an offense that does not exist). He was charged with deliberately and knowingly assisting in the breach of Stratfor, and subsequent maximization of the damage from that breach. And remember, he was convicted of doing that; they just pursued a different vector for it than the link. Keep in mind also, they didn't just work back from people who posted links. Hector Monsegur ratted Brown out.

Most criminal statutes look insane if you ignore the mens rea component and consider only the actus reus.

Probably the right way to address your comment is to acknowledge the sentiment behind it. It would be ominous if prosecutors trawled the Internet looking for the wrong kinds of links --- people RT'ing updates from Anonymous, for instance, or relaying already-public newsworthy facts from breaches --- and fit accessory liability cases around those innocuous acts. It is worth being wary about prosecutors doing that, because computer crime laws are poorly rigged and set up terrible incentive systems for prosecutors.

It's just that those concerns are not yet vindicated by the Brown case.


While "criminal linking" doesn't exist as a standalone crime, prosecutors have essentially tried to make it exist via other statutes. I don't know the disposition of the case, but a man in the UK was ordered a few years ago to be extradited to the US to stand trial for criminal copyright infringement after operating a site that offered links to copyrighted sports broadcasts [1]. In the Brown case, they tried to use the conspiracy statutes.

In both of the above examples, while not charged with "criminal linking," the actual conduct was linking to something prosecutors didn't like. The loud and clear message they are sending is "link to things we don't like, and we'll find a way to get you". That will have a chilling effect on free speech.

[1] http://www.theguardian.com/law/2012/jan/13/piracy-student-lo...


I think you may be oversimplifying the O'Dwyer case.

https://news.ycombinator.com/item?id=4153824


This assumes, though, that he would have been put through everything you describe even if he had only shared a link. But, as described in detail above, this was only a small piece of the government's case. I seriously doubt the government would ever have brought charges if all it had was the posting of a link.

We should also think a little bit harder, I think, about whether posting a link is never criminal. It seems to me that if someone posts a link to intentionally further a criminal conspiracy, it could plainly and unproblematically be criminal. Accomplice liability in particular makes lots of other things that would otherwise be innocent into crimes when they are done with the wrong sort of intent.


> I seriously doubt the government would ever have brought charges if all it had was the posting of a link.

If it can be included as a charge on an indictment, it can be the one and only charge in it as well.

> We should also think a little bit harder, I think, about whether posting a link is never criminal.

No, we shouldn't. Linking to and/or writing about anything (absent actual participation in a conspiracy) isn't a crime in a country protected by the right to free speech.


> If it can be included as a charge on an indictment, it can be the one and only charge in it as well.

I think you've lost track of the context. The point is that nobody here was arrested and had his "life ruined" solely on the charge of having posted a link. Nor would the government ever be likely to indict someone on only that basis (unless the case was very compelling, see below), given the significant likelihood of the sole charge being dismissed.

> No, we shouldn't. . . . (absent actual participation in a conspiracy)

That's not too different from what we're talking about, is it? Actual participation in a conspiracy (which, no doubt about it, can be accomplished by posting a link) or, I would add, acting as an accomplice.

But while I'm at it, your broader claim is also incorrect. How about perjury? You can do that in writing. Slander? Intentional infliction of emotional distress? Threatening the president? Mail fraud? Criminal contempt? Murder for hire? All crimes accomplished by writing about something that exist, yes, in a country protected by the right to free speech. There are many more.


Couldn't you employ the same "free speech" logic to someone ordering a murder?

Again, it's not the speech that's being criminalized; it's the intent animating it. Think of the link not as a crime in and of itself, but simply as evidence of Brown's effort to assist in the real crime, which was unambiguously illegal. If you follow the case closely, you'll see that's exactly what's being charged.


>Couldn't you employ the same "free speech" logic to someone ordering a murder?

Nope. Ordering a murder is a crime. Sharing a link is not.


This doesn't strike me as very productive. Whether sharing a link can be a crime is, of course, exactly the point under debate.

And I don't see how you can dispute that whether sharing a link is a crime depends on what is accomplished, and what is intended, by sharing the link. There is, of course, no law that criminalizes sharing a link per se. But there are plenty of laws that criminalize things you can do by means of sharing a link. Take GP's example. You write up a murder-for-hire ad on your private server and post a link to it on HN. That's solicitation of murder, no less than if you had made the solicitation in person or by mail. You may as well argue that talking to someone, or sending a letter is not a crime.


I don't know, sounds like he got off pretty lightly considering he threatened an FBI agent's children. I would expect the jail time to be a lot higher, but I guess I don't know what guides the court's decisions in these kinds of cases. I suppose five years is enough time for him to figure out the error of his ways.


His sentence was dominated by the accessory charge, and the threats don't seem to have been a factor at all.


The threats actually accounted for 48 of the 63 months according to the EFF article that the OP linked to.

https://www.eff.org/deeplinks/2015/01/eff-statement-barrett-...


EFF's reporting appears to be contradicted by the (now public) sentencing memo. Orin Kerr analyzed it at length for WaPo a few days ago.


Strange. Almost every article I'm finding echoes the EFF's statement about 48 months, but Judge Lindsay's own explanation of the sentencing is as Orin Kerr says. I wonder where that 48 figure came from.

http://www.washingtonpost.com/news/volokh-conspiracy/wp-cont...


I think you've simply stumbled upon another illustration of how modern "journalism" works :-)

What was that quote about a lie traveling halfway around the world before truth has its pants on?

This is not the first time the EFF has done this, by the way.


To be honest, EFF isn't exactly the most reliable source on these things. They too are very very biased.


But what do you think about the big picture?

I don't know much of the specifics about Brown, but I think the wider point is worth discussing, especially with respect to the proposed change in legislation.


> Barrett Brown was not convicted merely for linking to data on the web.

From the article:

     Most of us expected that those charges would be dropped and some were, although they still influenced his sentence.

I want to be generous and say that the author meant what you said. The linking was not something Brown was charged with, but it was brought up during the sentencing and probably influenced the length of his prison sentence.

So while you're correct that Brown was not charged with linking to information, it's worth noting that this was still used against him anyway.

Also, people who think the linking to hacked data was the only thing that got him arrested are being disingenuous (or are simply ignorant).


I'm not seeing where the linking was used to enhance his accessory conviction. Is there a source for that?


It's interesting that you say his sentence was "unjust" given that you always seem to defend crazy sentences as "not being the real ones anyway".

Also those three sound like incredibly weak charges, and yet you somehow defend the prosecution over them.


Is it because I say his sentence was unjust given that me always seem to defend crazy sentences as not being the real ones anyway that you came to me?

Earlier you said I say his sentence was unjust given that me always seem to defend crazy sentences as not being the real ones anyway?

Maybe your life has something to do with this.


Fun!

    $ export LC_ALL='C'
    $ awk '{ print $2 }' 10-million-combos.txt | tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | head -n 20
    55893 123456
    20785 password
    13582 12345678
    13230 qwerty
    11696 123456789
    10938 12345
    6432 1234
    5682 111111
    4796 1234567
    4191 dragon
    3845 123123
    3734 baseball
    3664 abc123
    3655 football
    3330 monkey
    3206 letmein
    3136 shadow
    3126 master
    3050 696969
    3002 michael
Edit: I used Wordle[1] to make a wordcloud of the top 1000 passwords: http://i.imgur.com/FImcPiG.png

[1]: http://www.wordle.net


Cool! I found the usernames interesting as well, since not many studies have been done on them. "dragon" is both a common username and password! In reply to another child post: the enormous number of "michael" passwords probably has to do with the smaller, but still large, number of "michael" usernames.

I'd run some more commands, to find out how many "michael"s use "michael" as their password, but I've got to head out now. Would be interesting -- anybody up for it?

(Ooh -- you could even juxtapose the usernames against common American names by decade [1], and probably derive some data about the ages of these users as well!)

(Furthermore -- what if we started keeping track of most common passwords by decade? That could be super interesting! I wonder if it's changed much!)

  $ export LC_ALL='C'
  $ awk '{ print $1 }' 10-million-combos.txt | tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | head -n 20
  3044 info
  2119 admin
  1323 michael
  1113 robert
  1095 2000
  1049 john
  1041 david
  967 null
  940 richard
  922 thomas
  901 chris
  866 mike
  843 steve
  832 dave
  816 daniel
  812 andrew
  797 george
  765 james
  735 mark
  730 dragon
1. http://www.ssa.gov/oact/babynames/decades/names1980s.html


For some reason I seem to be getting different values than you. However, from what I got, there was only a single instance of a username 'michael' having the password 'michael'.

HOWEVER, of all of the people whose password is 'michael', 83 have usernames that CONTAIN the string 'michael'.

Of the set of usernames 'michael', there are 20 whose passwords contain the string 'michael'.

Of the set of usernames containing the string 'michael', there are 276 whose passwords contain the string 'michael'.

I honestly expected much more.
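For anyone who wants to reproduce counts like these, here is a rough sketch in Python. It assumes each line of the dump is a username and a password separated by whitespace (the same layout the awk commands above rely on) and lowercases both fields. The function name and the dictionary keys are made up for illustration.

```python
def overlap_counts(lines, name="michael"):
    """Count username/password overlaps for one name.

    Assumes 'username<whitespace>password' per line, as the awk
    one-liners in this thread do. Lines with fewer than two fields
    are skipped.
    """
    counts = {
        "both_equal": 0,              # username == name and password == name
        "pass_equal_user_contains": 0,  # password == name, username contains name
        "user_equal_pass_contains": 0,  # username == name, password contains name
        "both_contain": 0,            # both fields contain name
    }
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        user, password = fields[0].lower(), fields[1].lower()
        if user == name and password == name:
            counts["both_equal"] += 1
        if password == name and name in user:
            counts["pass_equal_user_contains"] += 1
        if user == name and name in password:
            counts["user_equal_pass_contains"] += 1
        if name in user and name in password:
            counts["both_contain"] += 1
    return counts
```

To run it against the real file you would pass an open file handle, e.g. `overlap_counts(open("10-million-combos.txt", errors="replace"))`. Differences in case folding or field splitting could easily explain why two people get different numbers.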


In other words, supposing that this data is representative of most people's password practices, just trying these 20 passwords gives you a ~18% success rate for any username.

And... dragon. That's an unusual password to make the top-10 list. I think this might be a somewhat skewed sampling.


You forgot a zero:

   >>> (55893+20785+13582+13230+11696+10938+6432+5682+4796+4191+3845+3734+3664+3655+3330+3206+3136+3126+3050+3002) / 1e7
   0.0180973
That is, 1.8%. This is confirmed by http://maxmcd.com/passwords.html.


It makes equally little sense to me, but "dragon" is routinely high on top password lists.


I think it's probably just a common thought process. I'll pick an animal -> dragons are the coolest animal -> nobody will ever guess dragon, this is way better than using my dog's name.

Have you ever seen those online riddle things that say pick a color, pick a tool, wow I bet you picked a red hammer! We all grow in relatively similar societies, we all have relatively similar ways of thinking.


    > I'll pick an animal -> dragons are the coolest animal -> nobody will ever guess dragon, this is way better than using my dog's name.
I must confess, this is typically my exact thought process when crafting a password, a username, or even sometimes a nickname for people to call me in real life.


Better change your passwords, Dragon.


That many people have noted the "dragon" phenomenon as strange, but we don't yet have an explanation, is perhaps stranger yet. In early days, one could have hypothesized that some basic "how to use passwords" resource had offered "dragon" as an example of a password, but after two decades of internet it seems unlikely that something like that could have had such a large effect.


Part of it may be where the passwords are scraped from. If "dragon" has some relevance to the field then there's a higher probability that it will be used by people working in that field. This list is a sample of passwords from compromised databases not from all databases in the world.

I wonder about the prevalence of "allsop" as a password. I came across it in a computer I was repairing last week and it shows up 159 times in this list. Is it from the acronym SOP? Or because of the company that makes mouse pads?


My first thought was that there could be some connection with MMORPGs, which often feature dragons.

> Or because of the company that makes mouse pads?

This is probably the case for "allsop" - there are people who will look around them for inspiration when coming up with a password, and what was written on their mousepad caught their attention.


I have a close friend whose password is dragon. At first, I thought it was a joke, but it's true.

Those of us who were kids in the 90's had dragons everywhere. Hell, we wore shirts with dragon patterns. Dragons were cool. Dragons were our passwords.


So is "jesus", and that doesn't seem to be true here. I find this list highly dubious, compared to others I've seen (and, long ago, obtained myself.)


And it has been for 20 years


Computers are magic. Dragons are magic. QED.

I'm actually kinda serious.

Also, humans are monkeys. Ergo, "monkey" is popular.


"humans are monkeys" - yeah, in the same way that unicycles are hovercrafts.


> supposing that this data is representative of most people's password practices

That might not be the case; not all passwords are created equal.

As an example, my password to some goofy online game that requires registration is nowhere near as strong as the password required to log into my work email account - for some things, I prioritize being able to type a password in quickly on a mobile device over the danger of someone breaking in and playing a low-scoring word in online scrabble.


It makes no sense to me, but I do recall a middle school phase where I used either "dragon" or "drag0n" for my passwords. I didn't particularly even like dragons and I don't recall ever hearing others use it, so it really catches me by surprise. Whenever I see it in a top passwords list I am filled with memories of after school library trips.



For sensitive sites, my preferred solution to this problem is to add a sequence of random characters to the User ID field. The user would then authenticate with something like this:

  User ID: John-CPE4E38J
  Password: snoopy
For extra security the code would then move the random characters to the password so the authentication library would see this:

  User ID: John
  Password: snoopy-CPE4E38J
In this way even an attacker who gains full access to the server database would be unable to read the passwords (assuming they have been hashed well).

Also, the User ID can be stored in a cookie so that the User ID field on screen is pre-populated and the user only has to type "John-CPE4E38J" when he switches to a new computer.

More details here: http://security.stackexchange.com/questions/80352/is-it-a-ba...
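A minimal sketch of the "move the random characters to the password" step, in Python. The function name and the choice of `-` as separator are assumptions taken from the example above, not from any real authentication library; the result would still need to go through a proper password hash before being stored or compared.

```python
def split_login(user_id_field, password_field, sep="-"):
    """Move the random suffix from the User ID field onto the password.

    'John-CPE4E38J' + 'snoopy' becomes ('John', 'snoopy-CPE4E38J').
    The suffix is everything after the LAST separator, so user IDs
    that themselves contain the separator still work.
    """
    base, found, suffix = user_id_field.rpartition(sep)
    if not found or not base or not suffix:
        raise ValueError("User ID is missing its random suffix")
    return base, password_field + sep + suffix
```

The authentication library then only ever sees the plain user ID and the lengthened password, which is the point of the scheme.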


This is a horrible practice. You are trying to implement two factor auth, but with a static second factor that will not be considered private by most users. It is a huge burden on them to remember, and is providing you with dubious security at best, and actually providing a vector of attack at worst. Please don't do this.


Yes, it is two factor authentication with a static second factor that will not be considered private by most users. And yes, a 'real' two-factor authentication mechanism would provide better security.

Unfortunately, due to market competition many websites simply cannot require 'real' two-factor authentication for all users. Here are the steps I would need to provide to my father to register for a typical '30-day free trial':

  1) Go to website.com and click 'Register'
  2) Enter your email address
  3) Think of a password and type it 
  4) Click 'I agree'
  5) Click 'Register'
Here are the steps I would need to provide to my father to register on a website for a free trial with 2-factor authentication using the Google Authenticator app:

  1) Go to website.com and click 'Register'
  2) Enter your email address  
  3) Think of a password and type it 
  4) Click 'I agree'
  5) On your phone, press the 'Play Store' or 'App Store' icon
  6) Press the 'Search' icon and search for 'Google Authenticator'
  7) Press 'Install' and wait for it to install (if you have an iPhone the install button might look like a little cloud icon)
  8) Press 'Open' to open Google Authenticator
  9) Press the 'Menu' button which looks like three dots in the top-right corner of the phone screen
  10) Choose 'Scan with barcode'
  11) Point the phone at the computer screen as though you were going to take a photo of the barcode on screen. 
  12) Wait for the phone to register the barcode, then enter the number shown on your phone into the website form
  13) Click 'Register'
Even with all these steps laid out for him, my father would probably find it extremely frustrating to get to step 13.
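For context on what the number in step 12 actually is: authenticator apps of this kind compute a time-based one-time password (TOTP, RFC 6238) from a shared secret and the current time. Below is a small standard-library sketch in Python. It takes the secret as raw bytes, whereas a real enrollment barcode carries it base32-encoded, so treat the interface here as an illustrative assumption.

```python
import hashlib
import hmac
import struct
import time


def totp(secret, at_time=None, step=30, digits=6):
    """Compute a TOTP code (RFC 6238, HMAC-SHA1) for raw secret bytes."""
    if at_time is None:
        at_time = time.time()
    # Number of whole time steps since the Unix epoch.
    counter = int(at_time) // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low nibble of the last byte
    # selects a 4-byte window, whose top bit is masked off.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

The server runs the same computation with its copy of the secret and compares the result, usually allowing a step or two of clock drift.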


You could do it for him. Google Authenticator is great. My bank uses 2FA but it's on some fiddly little calculator device that I never have with me.

Some sites (Coinbase) do 2FA with text message which is also great.


> My bank uses 2FA but it's on some fiddly little calculator device that I never have with me.

I left my bank for this very specific reason (HSBC Aust)

Grrr


Conversely, I stay with my bank (Nationwide) because they use the device...


Your bank has not done this for your benefit and it hasn't done it in a way that benefits you. They've done it to pass on (to you) the liability for any fraudulent activity.

From http://www.cl.cam.ac.uk/~sjm217/papers/fc09optimised.pdf:

"We reverse engineered the UK variant of card readers and smart cards and here provide the first public description of the protocol. We found numerous weaknesses that are due to design errors such as reusing authentication tokens, overloading data semantics, and failing to ensure freshness of responses. The overall strategic error was excessive optimisation. There are also policy implications."

"The move from signature to PIN for authorising point-of-sale transactions shifted liability from banks to customers; CAP introduces the same problem for online banking. It may also expose customers to physical harm."

Meanwhile, I switched to a bank that uses SMS as a second factor and only where it's necessary: I don't need to use an inconvenient calculator.


Are you generating the User ID with the additional characters and expecting the user to remember/keep track of it? I do not think that is very user-friendly, even with the cookie trick you describe.

It seems like you are trying to force your user to remember a salt. Why not just use a proper salt and a strong password hashing function?

Also note that this protection is only useful in the case where an attacker can get a database dump but cannot perform an active attack on the server.

On the other hand, I have seen some sites (gandi.net comes to mind) do something similar to this. Wonder if they have a similar security reasoning?
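To make "a proper salt and a strong password hashing function" concrete, here is a hedged Python sketch using only the standard library. It uses PBKDF2-HMAC-SHA256 as a stand-in because it ships with `hashlib`; bcrypt or scrypt would be the more usual recommendation. The iteration count and function names are illustrative assumptions, not tuned values.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not a recommendation


def hash_password(password, salt=None):
    """Return (salt, digest); a fresh random salt is drawn if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)
```

The salt is stored next to the digest on the server, so the user never has to remember it. It stops precomputed tables and identical-hash matching, but it does nothing against online guessing of "123456".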


> It seems like you are trying to force your user to remember a salt.

Yes, essentially I'm trying to force the user to remember a client-side 'salt'.

> Why not just use a proper salt and a strong password hashing function?

Because it wouldn't protect against the attack described by userbinator (ie. 'just trying these 20 passwords gives you a ~18% success rate for any username'). Having a client-side 'salt' gives you that protection.

> I do [not] think that is very user-friendly, even with the cookie trick you describe.

Yes, this system imposes a cost in terms of user-friendliness. But for sensitive sites (eg. medical or financial) I think it's worth it.


Sensitive sites should use 2-factor authentication by default, as your method won't help against keyloggers and other malware. I don't like 2-factor authentication (it's more time-consuming and costly to get a throwaway phone number than a new single-purpose email address to register on a random site), but this method is even less user-friendly, as you can't expect an average user to remember a random symbol string after a few months. What would really improve the security situation is a good, easy-to-use, cross-platform, cross-device password manager that would be included in major browsers by default.


From a user experience standpoint, this is a bit of a nuisance. Users are already having real difficulty remembering all of their different usernames and passwords for different things. A password manager is still an alien thing to a lot of people. A lot of people still have a little text file somewhere, or they rely on messages stored deep in their mailboxes somewhere, or they have a little piece of paper they try desperately not to lose...

You're right that their browser auto-complete will usually take care of it, but once it doesn't (because they switched browsers, because they got a new computer, because it got infected with malware and they took it to a wipe-and-reinstall shop), I'd expect a significant number of your users to fall back to just doing a password reset, which is a hassle.

From a security standpoint, I'm not sure what problem you're trying to solve. I get that you want to strengthen your users' passwords, but what is the specific scenario you're imagining where this is the best prevention? If you're concerned about someone brute-forcing user accounts from the outside, just make sure you have some sane throttling code. If you're concerned about someone stealing your database and breaking user passwords, just make sure you're using a robust password storage mechanism (blah blah bcrypt scrypt etc. etc.) and the usual other internet-facing application best practices (parameterized queries for example). If you're still feeling paranoid about that situation, then probably your server code could add some value to each password without doing any harm, I dunno. If someone gets sufficient access to your server to get your database and your code, game's over anyway. If you're concerned about your user having their credentials compromised elsewhere and that being used to access their account, do the same thing that many banks, Linode, and other services do: maintain IP white, grey, and black lists, and send a challenge/response to the user by text or email if the IP is on a grey list (in addition to checking for their login cookie first).

Your approach is different, but I don't understand it yet. :-)


Yes, I agree it would be a bit of a nuisance for users to perform the 'reset password' after they switch computers, re-install the operating system, etc.

And yes, I'm trying to solve the two problems you mentioned: (a) someone brute-forcing user accounts from the outside; and (b) someone gaining access to the server database and thereby gaining access to other sites where the user has the same credentials. If it is true that "just trying these 20 passwords gives you a ~18% success rate for any username" then it seems to me that throttling brute-force attempts would not be very effective.
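The ~18% figure is a statement about how much of the user base the most common passwords cover. A rough sketch of how one might measure that on a list of leaked passwords follows. `top_n_coverage` is a hypothetical helper and the sample data is made up.

```python
from collections import Counter


def top_n_coverage(passwords, n):
    """Fraction of accounts whose password is among the n most common ones."""
    passwords = list(passwords)
    if not passwords:
        return 0.0
    counts = Counter(passwords)
    covered = sum(count for _, count in counts.most_common(n))
    return covered / len(passwords)


# Made-up sample: 10 accounts, 5 of which use one of the two most common passwords.
sample = ["123456"] * 3 + ["password"] * 2 + ["a1", "b2", "c3", "d4", "e5"]
print(top_n_coverage(sample, 2))
```

If an attacker gets n guesses per username before being throttled, this ratio is roughly the hit rate they can expect, which is the concern raised above.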


Is this materially different from requiring the user to have some random characters in the password, but for some reason making them type these characters into the username field where it'll be cached by the browser's autocomplete feature?

It seems like this is an amusing enough hack to do on non-sensitive sites, but I wouldn't do this on anything "real". When it comes to authentication, "hey I had this really neat idea" is almost always an immediate precursor to making things worse.


If the random characters are stored in the User ID field then 95% of the time the user just has to remember their password. It is only when the user switches to a new computer that they would need to type in the random characters. Wouldn't that be a significant benefit over having to type the random characters every time the user logs in?

I agree with your observation that "hey I had this really neat idea" is almost always an immediate precursor to making things worse. Almost.


I'm surprised (disappointed?) only 1 person used "correcthorsebatterystaple".


That is terrible, he/she used the same phrase as in the example!


Don't read too much into this. My main email account is in the original list that was posted in October of 2014. My account that is listed is myname@gmail.com. The password though is not the password to myname@gmail.com but rather to my "junk" site password.

For almost any site where I have an account, I use a strong, unique password. For sites that I don't care about at all AND that I suspect have security problems, I use a standard, common, insecure password. It is that common insecure password that is paired with my gmail account.


Looks like if you know someone called Michael, chances are that you need to talk to him and his loved ones about password hygiene...


My name isn't Michael, but I use the password 'michael' all the time.

Edit: oh, crud


  10938 12345
That's the same combination I have on my luggage!


Heh. I use 'password' for when I'm purposely trying to make things insecure (like being nice and sharing my unlimited data via my phone's wifi hotspot on public transport).


So this dataset seems to be limited to English-speaking, QWERTY-using users, i.e. US only I guess?


Cool! My password hunter2 wasn't at the top of the list!


here is the top 48K for lazy ones http://ix.io/ggh


I don't understand exactly why it's necessary to release usernames along with the passwords, or why it's ethical to do so. Stripping the domain portion of email addresses does absolutely nothing when you can find the real email, and other accounts of the victim, by Googling the unique part of the email address.

How does tying each password to its corresponding username help with password research, and does the value gained outweigh the cost of someone using this list for malicious purposes?

I'm not saying this should be illegal, but I'm struggling to understand the intent here.


What about research to determine to what extent usernames with words in a certain language will tend to use passwords with words from the same language? (More generally, is there any connection between the bi- or trigram distribution on usernames and the one on passwords? In fact, do they just look the same, or could you tell given a string whether it's more likely a username or a password?)

Do usernames of people with weaker passwords have something in common? How do they differ from people with stronger passwords? In France there is a practice of picking names like "foobar42" or "foobardu42", where "foobar" is a first name and 42 a "département" (country subdivision) number, which I would associate with casual users. Here I could quantify whether people with usernames of this form tend to pick weaker passwords. Insert your favorite prejudice here about lame and skilled username patterns, and quantify how the password diversity of this group fares in comparison with others.

Is it true that the most common passwords were associated with usernames that were also common? Does username frequency correlate with password frequency? Are there more people with unique usernames or people with unique passwords?

In some countries it is customary to annotate usernames with the user's year of birth. Filtering on such usernames could give insight about the correlation between age and password quality, or identify which passwords are more or less popular given the user age. You could try to check correctness of the filter using the fact that some of those people may have used their birthdate (including the year) as a password.

If a seemingly rare password in the dataset only occurs for two distinct user names, then maybe those two user names actually correspond to the same user. Do such usernames have a low edit distance? Could you use this to learn general rules to determine, given two usernames, whether they seem to correspond to the same person?
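As a rough sketch of that last question: group usernames by password, keep the passwords that occur for exactly two distinct usernames, and score how alike the two names are. The function name is hypothetical, and `difflib.SequenceMatcher.ratio()` is used here as a convenient stand-in for a true edit distance, so treat the numbers as a similarity score, not as Levenshtein values.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def shared_password_pairs(pairs):
    """Yield (user_a, user_b, similarity) for passwords used by exactly two usernames."""
    users_by_password = defaultdict(set)
    for username, password in pairs:
        users_by_password[password].add(username)
    for users in users_by_password.values():
        if len(users) == 2:
            a, b = sorted(users)
            # ratio() is 1.0 for identical strings and close to 0 for unrelated ones.
            yield a, b, SequenceMatcher(None, a.lower(), b.lower()).ratio()
```

A high score on a rare password is only a hint that two usernames belong to the same person. Common passwords have to be filtered out first, which the "exactly two usernames" condition does only crudely.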

I just gave those off the top of my head, and I'm not at all working in this field, but I'd have no trouble imagining interesting applications for this data that would not have been possible with the passwords alone.


I feel like most of those research questions could be answered if it was a "username -> password strength" mapping, in addition to a hash to study duplicate trends, rather than just "username -> password". Obviously there is no objective ranking of "password strength", but a decent approximation could be provided.
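A minimal sketch of what such a redacted release could look like, assuming a deliberately crude strength estimate (length times log2 of the character pool) and a keyed hash so that duplicates stay comparable without publishing the password. The function names and the 16-character tag length are my own choices for illustration.

```python
import hashlib
import hmac
import math
import secrets


def charset_bits(password):
    """Crude strength estimate: length * log2(size of the character classes used)."""
    pool = 0
    if any(c.islower() for c in password):
        pool += 26
    if any(c.isupper() for c in password):
        pool += 26
    if any(c.isdigit() for c in password):
        pool += 10
    if any(not c.isalnum() for c in password):
        pool += 33
    return round(len(password) * math.log2(pool), 1) if pool else 0.0


def redact(pairs, key):
    """Replace each password with (strength estimate, keyed hash)."""
    for username, password in pairs:
        tag = hmac.new(key, password.encode("utf-8"), hashlib.sha256).hexdigest()[:16]
        yield username, charset_bits(password), tag


key = secrets.token_bytes(32)  # kept private by whoever publishes the data
```

This estimate scores "password" the same as eight random lowercase letters, so it is only the "decent approximation" mentioned above at best. The most frequent tags could also still be guessed from their frequency alone.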

There are serious risks to having your username and password in a public list. Yes, all of these usernames and passwords were already technically publicly released, but to a lazy and ignorant script kiddie, finding or even being aware of those lists can be outside their grasp.

By aggregating everything into one list, you 1) increase the search engine visibility for all credentials, which means someone Googling the username of, say, an Internet commenter who pissed them off may find a plaintext password they could use to impact the person's life with much higher probability (I work in information security and have seen that happen on many occasions), 2) encourage script kiddies and fraudsters to spend time working through the list to find working accounts that other criminals have missed in the past decade, and 3) undo any work that paste sites like Pastebin and file sharing sites like Mediafire have done to remove copies of the database dumps. 1) may not apply if it strictly remains a torrent, but it'll probably be floating around public paste sites within a few days, which would likely mean search engine visibility for every username on it.

If even 0.01% of the users on this list have accounts compromised due to its release, then I don't think that cost justifies the research benefits relative to a more redacted version of the list.


> I feel like most of those research questions could be answered if

If the person who releases this kind of information has the foresight to know what the questions are going to be, they could provide the answers directly rather than go half-way and modify the data. It would likely be less work than trying to produce anonymized data that is both useful and secure.

What I see used in cases like this is one of two options. Either full public access, or restricted access where only a few selected get the chance to do the research. The 0.01% misuse is thus balanced to that choice, rather than the theoretical case of anonymized data.


As I explained in the article I seriously doubt that any more than a tiny number of these passwords are still valid. And there is no reason for them to be, having already been widely available, indexed (and cached) by every search engine, archived at archive.org, and downloaded by thousands or tens of thousands of people. Anyone who would use this data maliciously probably already has it.

Much of this data is the same data monitored by sites like haveibeenpwned.com and a dozen others. Facebook scrapes these. Lastpass will send you alerts. The risk here is minimal; the research value is much more than you realize.


>Anyone who would use this data maliciously probably already has it.

You might be surprised. The fact that these dumps are supposedly quite old certainly mitigates the risk, but I've seen cases of primary email accounts being taken over from a plaintext password in a dump 5+ years old. No one ever tried it on the email because it wasn't in the dump and wasn't identical to the username, though it was very close.

Aggregators like haveibeenpwned.com and Lastpass responsibly use the passwords they scrape, they don't release them all in a big batch like this. Many cybercriminals do the same kind of scraping and share these aggregated lists privately, but they're always going to be missing things, so there's no question they're all going to be pulling in your list, too. And odds are there's going to be at least one dump that a lot of them missed which yours has.

I do understand there is some research benefit here, but even in the best possible scenario I don't think the value from the research outweighs the costs.


First of all, a good number of these passwords were simply gathered through google. Some were gathered via the archive.org archive of pastebin pastes and their normal web page archive. Some were from forums that were located via google. This data is already out there, being aggregated doesn't make it any easier to hack these people.

Try searching for "Cucum01:Ber02" or "shawman:badman" and you will see how many passwords are indexed. I have hundreds of searches like these that I monitor and scrape.

Second, I regularly share my data with the owners of password checking sites such as haveibeenpwned to make sure users are able to be aware of these breaches. Releasing this data isn't something I have taken lightly, I debated it for years. I have weighed the risks and felt it was important to release the raw data, although not everyone will agree with me on this. I made a good effort to minimize the risks to actual users.

Finally, keep in mind that most users are already at risk simply because they have bad passwords. Ten percent of users have a password on the top 1000 list. A large percentage of users are at risk because the websites they are on don't have proper security. This is how people get hacked, not because of a password found on this list.


Still, the whole purpose of a password is to remain secret. He's certainly doing these users a disservice by releasing this list regardless of the hypothetical likelihood of the data already being available. Basically the arguments for doing this all seem to boil down to "they should already know their passwords are compromised" which nobody can guarantee is the case.

I agree that having a crappy password puts you at risk, but what about the people who genuinely tried to use some common sense but are on this list anyway? Is it their fault for not religiously keeping up with the latest indexed password lists?


OK, I'll bite: can you give us some ideas on how this would lead to a genuine advancement in user authentication (that we wouldn't have with username/pw de-linked)?


Example:

Username: mickael

Password: mickael69

EDIT: Just to be more precise, there is a correlation here, and with so much data a lot can be known. Patterns can then be forbidden from password fields so the website is less prone to dictionary attacks.


So what would you do here? Disallow "mickael" from the password? That's pretty user-hostile and almost completely pointless.


Is it pointless to reduce the attack vector against your website? And, no, for a banking system, it is not that user-hostile to say things like "we have found that using <pattern> in your password makes it easy for people to guess, please choose a more complicated password".


All possibly interesting questions (certainly not to me) but I fail to see how they would lead to any genuine advancements in authentication.


A list of 10 million passwords alone answers almost no questions. In fact, it's probably possible to programmatically predict, with a depressing level of accuracy, what a great deal of such a list will look like, given the already available research about the distribution of complexity, the parts of speech and numbers commonly used and in what patterns, etc.

So, the next interesting question is: given the already plaintext-available lists of usernames and passwords, just how much coverage is there in the known space? Are your passwords known? Are your users' and clients' passwords known?

This document is perfect for a true positive on the matter of needing to deprecate particular combinations of username and password, and, as an obvious corollary, presenting evidence for consultation advice about the same. (Of course, being only a sample, it doesn't say anything about a true negative.)


Before I go into the research aspect of it, there is no reason to hide the usernames from the passwords. They are already out there. The bad guys have them. So why not release them so that everyone can look at them?

Also I am sure there are some research aspects to the usernames. At the very least behavioral deductions that can be drawn based on these combinations.


Probably to find out how many people do stuff like type their username backwards as a password, and what kind of patterns they use. Whether that is useful enough information to warrant publishing data like this is debatable, yes.
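Counting patterns like that is only possible with the pairs intact. A small sketch of such a count follows. The category names are hypothetical, and the rules cover only the cases mentioned in this thread (same as the username, reversed, username plus digits as in the mickael / mickael69 example, or username embedded somewhere).

```python
import re
from collections import Counter


def classify(username, password):
    """Label how a password relates to its username (hypothetical categories)."""
    u, p = username.lower(), password.lower()
    if not u:
        return "unrelated"
    if p == u:
        return "same"
    if p == u[::-1]:
        return "reversed"
    if re.fullmatch(re.escape(u) + r"\d+", p):
        return "username+digits"
    if u in p:
        return "contains username"
    return "unrelated"


def pattern_counts(pairs):
    """Tally the categories over an iterable of (username, password) pairs."""
    return Counter(classify(u, p) for u, p in pairs)
```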


Also interesting, how features of a username might correlate with password strength. Who do you think uses a stronger password, someone with the username "carguy551978" or someone with the username "w1ntermute"?


carguy followed by the 24'th n such that 1 + n + n^13 is prime, followed by the 34'th such n? I would expect a very, very strong password from someone who picks their username like that.

(see https://oeis.org/search?q=__%2C+551%2C+__%2C+978&sort=&langu...)



I dunno if he should have said "released", because he's not releasing any new data. Everything he's posted is already available to anyone with a search engine and a bit of curiosity.

So if you're concerned that information which wasn't previously public is now public, you can be at ease -- all of this data was not only public already, but less "cleaned up".


I'm curious to see if any of my accounts/passwords have been compromised


Wouldn't be surprised if one of these sites already has it

https://breachalarm.com/ https://haveibeenpwned.com/

The author does not seem like the type of person who did the hacking himself to obtain these, but rather curated leaks into his database


exactly why I'm curious. haveibeenpwned listed a username I often use as being pwned in a "battlefield heroes" leak, but I couldn't find the "release" for it.


> I'm struggling to understand the intent here.

A desire for a particular type of attention his ego seems to need.

Which, combined with either a moronic lack of appreciation for the hassle and damage he's going to cause to end-users who've already been hosed once before, or an arrogance that makes him not care, makes him difficult to fit for a white hat.

FTA:

> This is completely absurd that I have to write an entire article justifying the release of this data out of fear of prosecution

What's absurd is his assumption that stripping domain names is somehow sufficient.

Edit: I'm getting downvoted like crazy here. Which is fine, but people seem to think it's ad hominem because I'm narrowing the reasons behind why someone would release a data set with a considerable price of collateral damage attached to it, while doing very little to mitigate that damage.

Just because the likely options for why someone would do such a thing don't speak favorably of the person, doesn't make it ad hominem. An ad hominem attack is seeking to undermine someone's argument by attacking their character.

I'm saying Mark Burnett made it difficult to assume good things about him after a stunt like that. If he actually made a real argument that what he did was sufficient, or that the harm he's going to cause is more than offset by the greater good it'll do (or some such argument), then we'd have something to try to undermine (whether legitimately or fallaciously), but as it stands, he hasn't even justified his actions.


>Ad hominem + ad hominem

Research requires data. If I want to do research on how best to implement my bank system, I would like to know what passwords are more likely to be contained in a dictionary attack. Usernames may have a high correlation with passwords and thus are useful. Considering all of these passwords can be obtained from obscure forums/websites and that the website where the IDs are used are not specified, I don't see why he could not release it to the public for researchers to use.


> Research requires data.

There's a lot of research that could be performed if we were willing to generate data without due regard for the inherent downsides.

Saying research requires data is just insufficient justification in this case.

> I don't see why he could not release it to the public for researchers to use.

Because the collateral damage doesn't justify it. That aspect of it seems to be little more than a side note to him.

He could quietly and securely give the data to established researchers.

Or, he could very publicly release a torrent for everyone's use, with almost no concern for how it'll be used.

There's a massive difference there and the likely potential reasons behind his decision to do the latter leave very little room for one to make favorable judgements about either his motives, or his ability to responsibly mitigate risk.

I'm sorry if you believe any of that to be ad hominem, but it just isn't.

> Usernames may have a high correlation with passwords and thus are useful.

And that's precisely why the likelihood of collateral damage stemming directly from his actions is much higher than it should reasonably be in this instance.

At some point what you're giving up to further research isn't worth the tradeoff. He's selling innocent bystanders up the river to further his own cause, with little evidence that he's done everything possible to limit collateral damage.

I don't understand why this line of thinking is a hard sell here.

When a government or corporation releases lightly-redacted, personally-identifying information about people, the outcry is (rightly) massive. White knight does it and, well, to question his motives is ad hominem?

Really?


> A desire for a particular type of attention his ego seems to need.

> moronic lack of appreciation

> or an arrogance

This is ad hominem.

Here's a reference: http://en.wikipedia.org/wiki/Ad_hominem


Sorry, nope. I'd have to be attacking the character of the person making the argument, and do so in an attempt to undermine their argument, for it to be ad hominem.

I'm questioning the motives of someone who just released a data set that's going to cause very real harm to very real people, who've done nothing to deserve it.

For the record, given his credentials, it's highly unlikely that he didn't fully appreciate the ramifications of his actions. Which narrows down the other options on the table. (Did I mention he's selling books?)

Just because I'm not blowing sunshine at the guy, doesn't make it ad hominem.


Yeah, I wish people would quit using "ad hominem", it's turning into a tell for "people who spend too much time online and still don't know how to disagree".

Still, I think you're really overstating the risk here. The data set doesn't have email addresses and it doesn't list the specific services involved. How would you propose causing real harm to these real people using the data here, in a way that hasn't already been done or tried?

It sounds like he did put a lot of thought into his decision. You seem to be arguing that he thought about it, and then decided to do it anyway to help his book sales, which would make him a pretty indecent person. Do you really want your opinion to boil down to, "I think this guy is greedy and bad"?

As far as the value of research goes ... well, we don't really know yet. This particular dump, yeah, probably won't add much value to the current body of research. (I personally have much larger dumps, and don't consider myself a researcher ... so it's not like there's a shortage of data available.)

That's the thing about research though. You start off by investigating something and seeing where it leads. Maybe this will be the dump that would encourage developers to start maintaining password blacklists ("Please do not use this password, it is too common"), that would be valuable. Maybe this will just be another straw on the camel's back that eventually leads to everybody giving up on the idea of passwords entirely.

Who knows? It might be valuable, it might not, but it's not dangerous.


Do you think it might cause harm if the domain names were retained?


I'm not sure.

Given what the author says about the data (it's all gathered from public sources, a lot of it is very old), it shouldn't matter whether the domain names or service names were there or not.

But then the data would go from being mostly anonymous to somewhat personal, and I couldn't defend that as much. Practically speaking, the risk of harm should still be really really low, but it just seems like a bad practice to distribute information that might be used to identify someone that's had their password leaked somewhere.


There is an annual 'Passwords' conference [1], which I attended in 2012, and was blown away by quite how much researchers are able to do with these password lists.

Unfortunately, I was equally impressed with what attackers are able to do with them as well. An important point is that attackers tend to have better lists, because they are the ones stealing and cracking them, and these lists make them increasingly better at cracking passwords. Defenders use the lists for all sorts of analysis on how exactly users pick passwords.

For example, "complex password policies" have become increasingly popular. But do they actually increase the entropy of the chosen passwords? Surprisingly little, since users will "defeat" the policy by applying easy to guess "munging rules". Humans being human and such. The thieves have the lists, and learn to apply the munging rules and defeat the policies. Researchers need these lists so they can discover the same weakness and try to react.
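To make "munging rules" concrete, here is a toy sketch of the kind of expansion an attacker might apply to a single dictionary word. The substitution table and suffix list are illustrative assumptions, not taken from any real cracking rule set.

```python
# Common look-alike substitutions (an assumed, deliberately tiny table).
LEET = str.maketrans({"a": "@", "e": "3", "i": "1", "o": "0", "s": "$"})


def munge(word):
    """Return a set of guesses derived from one dictionary word."""
    bases = {
        word,
        word.capitalize(),
        word.translate(LEET),
        word.capitalize().translate(LEET),
    }
    suffixes = ["", "1", "123", "!", "1!"]
    return {base + suffix for base in bases for suffix in suffixes}


print(len(munge("password")))      # number of candidates from one word
print("P@$$w0rd1!" in munge("password"))
```

"P@$$w0rd1!" satisfies a typical "upper, lower, digit, symbol, 10+ characters" policy, yet it is one of only a handful of candidates generated from the most common password in the list. That is the sense in which the policy adds surprisingly little entropy.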

More recent research looks at things like how effective the password strength indicators are at actually helping users choose stronger passwords. We also learn about how users choose different strength passwords based on the sites they visit and such. This is absolutely fertile ground for research which can improve how we perform authentication.

Yet another good use of the lists is in defending against online attacks. E.g. Failed attempts that follow the general probability distribution of the lists are easier to identify as bots.
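A much simplified sketch of that idea: instead of comparing full distributions, just measure what share of an account's (or an IP's) failed attempts used a password from a known common-password list. The function names and thresholds are assumptions for illustration. It also assumes the server inspects the attempted passwords in memory at login time and never stores or logs them.

```python
def common_guess_ratio(failed_passwords, common_passwords):
    """Share of failed attempts that used a password from the common list."""
    failed = list(failed_passwords)
    if not failed:
        return 0.0
    hits = sum(1 for guess in failed if guess in common_passwords)
    return hits / len(failed)


def looks_automated(failed_passwords, common_passwords, threshold=0.8, min_attempts=10):
    """Flag a run of failures that mostly walks down a common-password list."""
    failed = list(failed_passwords)
    if len(failed) < min_attempts:
        return False
    return common_guess_ratio(failed, common_passwords) >= threshold
```

A human who forgot their password tends to try variations of their own password. A bot working from a leaked list tends to try the list.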

[1] - I think all the talks are posted, although I'm not sure there's a central archive, each conference is identified as Passwords^[Year], e.g. Passwords^14 https://passwordscon.org/


These lists were released by attackers in the first place. Attackers are always going to have the lists, and the only choice defenders can take is whether to use and distribute to the defender community, or not.


I'd be curious what researchers were able to do with such a list (genuine, practical advances). It doesn't strike me as particularly useful.



Forgive me for doing so, but allow me to ask some possibly ignorant questions and perhaps play the devil's advocate for a moment. What about this release will help? What are the compelling research problems in the space?

We know users pick bad passwords. It seems to me the most compelling "problem" is hardly a research question -- isn't it about finding ways to encourage users to pick strong passwords, not share them between sites, and not put them on sticky notes on their monitors?

Ok, putting my charitable hat on again... My best guess is that researchers would like some idea about how long it takes to crack some percentage of accounts; e.g. with rainbow tables or other techniques?

The author mentioned "Analysis of usernames with passwords is an area that has been greatly neglected and can provide as much insight as studying passwords alone." What directions might a researcher take this?


The main reason I have always included usernames and passwords in my research is because it allows me to analyze frequency data across multiple sites. Although I could have anonymized the usernames, I thought it would be best to keep them in. There is good value there. For example, there is quite a bit of overlap between usernames and passwords. Also, how many users include all or part of their usernames in their passwords. Plus, what usernames might hackers be most likely to try out?

The main goal here is to put the data out there and let other researchers find the value in it.


So how would you utilize such knowledge in the real world?


You could use it to create a password strength meter for your website, and enforce a certain strength.

Let's say it is common to include a subset of the username in passwords. Doing so would decrease the password strength and be disallowed.

Also, you could look at certain usernames and compute likelihood of certain dictionary words, and disallow them. For example, a user named Bob might be unlikely to use Spanish words in a password, but a user named Jose might be more likely.

Being aware of methods/info used by crackers when designing secure systems will lead to stronger systems.
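The "subset of the username in the password" rule could be sketched like this. The helper names, the maximum overlap of 3 characters and the minimum length of 10 are arbitrary assumptions; the point is only to show the check, not to propose thresholds.

```python
from difflib import SequenceMatcher


def username_overlap(username, password):
    """Length of the longest run of characters shared by username and password."""
    u, p = username.lower(), password.lower()
    matcher = SequenceMatcher(None, u, p, autojunk=False)
    return matcher.find_longest_match(0, len(u), 0, len(p)).size


def acceptable(username, password, max_overlap=3, min_length=10):
    """Reject short passwords and passwords that reuse a chunk of the username."""
    return len(password) >= min_length and username_overlap(username, password) <= max_overlap
```

With these settings `acceptable("mickael", "xxmickael69")` is rejected because seven consecutive characters come from the username.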


I just can't see how any of that is realistic or very useful. More energy needs to be spent on preventing breaches, not silly password requirements.


> More energy needs to be spent on preventing breaches

Hard to argue against that.

> not silly password requirements

You don't think that password requirements help prevent breaches?

Try this: hook up a server to the internet that's open to ssh. If you look at the ssh login attempt logs, you'll notice that you constantly have people banging against it, trying to log in as root. Yes, password requirements are a small part of overall security, but they are very helpful.
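To see it for yourself, something like the following sketch can tally the failed attempts. It assumes OpenSSH-style "Failed password for ... from ... port ..." log lines. The exact wording and the log location (for example `/var/log/auth.log` on some distributions) vary between systems, so treat both as assumptions.

```python
import re
from collections import Counter

# Matches both "Failed password for root from ..." and
# "Failed password for invalid user admin from ...".
FAILED = re.compile(r"Failed password for (?:invalid user )?(\S+) from (\S+) port \d+")


def failed_logins(lines):
    """Count failed password attempts per (username, source address)."""
    counts = Counter()
    for line in lines:
        match = FAILED.search(line)
        if match:
            counts[match.groups()] += 1
    return counts
```

Passing an open log file to `failed_logins()` and calling `.most_common(10)` on the result shows which usernames are being hammered and from where.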


Brute force attacks are too easy to mitigate. I'd like to see the energy go to defaulting against brute-force attacks.


The main issue is that attackers already have this data. They have a giant head start when guessing passwords because just by looking at the username they can vastly reduce the search space. Whitehats and the public need to know how blackhats are reducing that search space. By making good faith publication and research on passwords risky (legally unattractive) we actively weaken security. I find it amusing that people find sharing password/username pairs questionable yet we don't seem to hold companies accountable when they lose millions of the things at once. Talk about a double standard. (RE: companies have lawyers and the little guy can get fucked for all anyone cares)


When I first got on the Internet in 1994 I used the same password for everything for the next decade before I became security conscious (now I have a random, strong, unique password for every service).

Anyways, that password is not in this list. I have found it in other password dumps before. So, I don't know what to think.


This isn't a comprehensive list of all leaked passwords. It's a random subset of 10 million for research purposes.


I don't think it is necessary to have one password for every single system, but three or four tiers of passwords.

And just keep in mind that there's one password to "rule them all". That is the password for the primary mail account. I use 2-factor authentication for that.


> three or four tiers of passwords

Can you elaborate? My first thought is tiered by category of the service. No, I don't want my financial institutions to all have the same password, even if it's from the most secure tier.


Sites require you to sign up but it won't matter much if someone gains access to your account on them. Those might as well share a password. Same with sites that share trust buckets like [goodreads, yelp], [facebook, twitter] etc.

In the real world though just memorize separate bank and email passes and use a password manager w/generated passwords for everything else.


This is 10 million out of 1 billion that he has.

So there is only a 1% chance of a leaked account getting in this list.
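That 1% is per leaked credential, and it assumes the 10 million were drawn uniformly at random from the 1 billion, which the thread suggests but does not confirm. Under that assumption, and treating draws as independent, someone with several leaked credentials has a somewhat higher chance of showing up at least once:

```python
def chance_in_sample(leaked_credentials, sample_size=10_000_000, population=1_000_000_000):
    """P(at least one of k leaked credentials is in the sample), draws treated as independent."""
    p_single = sample_size / population
    return 1 - (1 - p_single) ** leaked_credentials


for k in (1, 5, 20):
    print(k, round(chance_in_sample(k), 4))
```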


From the law quoted in the article, wouldn't it be illegal to simply make a course about computer security?

The teacher willfully (and knowingly) teaches the student about "possible means of access to a protected computer."

Note: According to http://www.law.cornell.edu/uscode/text/18/1029 teaching is defined as trafficking information ("the term “traffic” means transfer, or otherwise dispose of, to another, or obtain control of with intent to transfer or dispose of; ")


Even if this release has no implications for security, I think it may raise legitimate concerns for users' privacy. No doubt most users expect that their passwords will be known only to themselves. Many of the usernames contain real names, and many more could probably be traced to them. Ian Watkins was found to have "gloated" about his crimes in his password. With time and attention, I wonder whether such "dark secrets" could be found in this list.


Went ahead and performed a Levenshtein distance analysis from this list, and made a graph of it. Number 8 seems to be the sweet 'secure' spot that most people latch onto, though the distribution curve is interesting - or very human-like: http://pp19dd.com/2015/02/levenshtein-distance-10-million-us...
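For anyone who wants to reproduce something like this, a plain dynamic-programming Levenshtein distance is enough. The comment does not say which strings were compared, so the histogram helper below assumes username versus password. That is my guess, not something stated above.

```python
from collections import Counter


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def distance_histogram(pairs):
    """Count how many (username, password) pairs fall at each distance."""
    return Counter(levenshtein(u, p) for u, p in pairs)
```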


How are things like Twitter accounts hacked? Are they generally brute-forced with a list like this, or how do so many of them get compromised?


For the lazy:

  grep -i <password> 10-million-combos.txt


for the paranoïd lazy

    export HISTCONTROL=ignorespace
     grep -i <password> 10-million-combos.txt
(type a space before the command for it not to be logged in the history)


Just because you're paranoid doesn't mean the eyes above your ï aren't watching. Although, it might mean you're delusional.


And then history -c


... which will clear your entire history, which you probably don't want.

I don't know a shorter way, but to delete one line from history, do 'history', which shows the line numbers, then 'history -d LINE_NUM'.

Or, in bash, prepend the command with a space and it won't go into history.


Open new terminal -> unset HISTFILE -> do your grepping -> close terminal


Unless somebody did a ps while your grep was running...

Don't put sensitive stuff in CLI args!


Good point, how about

    grep -f - 10-million-combos.txt
    <password>
    ^D^D
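Another way to keep the password out of both the history and the process list is to prompt for it. A small sketch using Python's `getpass` follows. Note it does a case-insensitive fixed-string match, which is not quite what `grep -i` does (grep treats the argument as a regular expression). The function names are mine.

```python
import getpass


def count_matches(lines, needle):
    """Count lines containing needle, ignoring case (fixed string, not a regex)."""
    needle = needle.lower()
    return sum(1 for line in lines if needle in line.lower())


def main(path="10-million-combos.txt"):
    # getpass reads from the terminal without echoing, so the password never
    # shows up in argv (visible to ps) or in the shell history.
    needle = getpass.getpass("Password to look for: ")
    with open(path, encoding="utf-8", errors="replace") as handle:
        print(count_matches(handle, needle), "matching lines")
```

Calling `main()` at the end of a script file runs the prompt. The script's own name and arguments are still visible to `ps`, but the password is not among them.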


Depending on your system and configuration, couldn't you prepend a space to the command to prevent it from being saved into your history?

edit: Looks like vacri mentioned this in a peer comment an hour ago. Whoops!


That works if you are using bash, but if you are, for example, using zsh, you would first have to run "setopt histignorespace" which would enable hiding lines prepended with a space in the history (it's off by default).


For the lazier, -i means case insensitive.


This is great, but if you use a password manager, it's very difficult to determine which, if any, of your accounts would be compromised. For myself, this would just be doing a dump and looping a few greps. But for family and friends, does anyone have any ideas for a less technical audience?


If you're using a password manager and thus -- I hope -- using a different password for every service, it doesn't really matter if one service gets compromised. The compromised service in question will (hopefully) force password resets for all affected users, and the compromised password is useless elsewhere.


Instead of responding to breaches, I would recommend an annual (more frequent is better, obviously, but I think annual is fine) cycle of rotating passwords. Just pick a day and spend it replacing passwords. As a side effect, you get a mental update on exactly what identities you're managing and whether or not you want to modify or close them.

This should be fairly straightforward even for non-technical people, if they've got a grasp on actually using the password manager itself. The hard part is (1) getting the list of identities, which isn't too hard if you're hand-holding, and (2) actually remembering to do it. (Which is why annual is nice. You can peg it to a holiday you already celebrate, or substitute it for one you don't. Halloween, for instance, because breaches are scary? Or something.)

Bonus: if a breach happens that actually feels scary, just do the rotation ritual ahead of time. Not that big of a deal.


1password has a limited ability to warn you of compromised passwords. they maintain a database of breaches that they warn you about in their client. the warning, however, is much less prominent than it probably should be


http://security.stackexchange.com/questions/46625/is-it-lega...

I had exactly the same thought, motivated by the password strength meters out there. How can you actually tell whether a password is strong, or whether it is already known to attackers? Ideally you could ask privately (I was thinking along the lines of private information retrieval) and get back a probability rather than a yes/no, based on all the known stolen credentials out on the Internet (there are many GB of files you can download)...


Just a thought here. As far as I can tell, many bona fide security researchers seem to be independent consultants. Would they be less at risk of prosecution if they were handling sensitive data such as user names and passwords under the coverage of universities and/or similar accredited institutions operating under protocols as to who can and cannot access the data?

It would probably be more security theatre than actual security, but I'd imagine that it would at least keep the FBI happy.


I wish there was an origin with these. A username/password combo I use on a ton of sites I don't care about is on here. It would be nice to know which one leaked it.


What sorts of analyses are you guys planning? Maybe: -clustering of passwords. are aspects of the username biased towards certain clusters? -distribution of alphanumeric characters at each position of a password (e.g. 1 is a disproportionately common final character) -differences in password strength between usernames with male and female names
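
As a starting point for the second idea, here is a minimal sketch that counts only the final character of each password. It assumes tab-separated `username<TAB>password` lines, and the name `final_char_counts` is made up. Depending on the awk implementation and locale, `substr` may work on bytes rather than characters, so non-ASCII passwords could be miscounted.

```shell
# final_char_counts FILE: print "count<TAB>char" for the last character of each
# password, most frequent first. Hypothetical helper.
final_char_counts() {
    awk -F '\t' '
    length($2) > 0 { n[substr($2, length($2), 1)]++ }
    END { for (c in n) printf "%d\t%s\n", n[c], c }
    ' "$1" | LC_ALL=C sort -t "$(printf '\t')" -k1,1nr -k2,2
}
```

`final_char_counts 10-million-combos.txt | head` would then show the ten most common final characters.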


Man, I hope my password isn't in there.


What's your password? I could check the file to see if it's there. I found one of mine. Does anybody know where these passwords are from?


hunter2


For those not familiar: http://bash.org/?244321


Oh, the memories!


Public dumps mostly from the last 5 years, but some as old as ten years


>What's your password?


Actually three of my own passwords are on there, I left them in


Read the actual article. None of this data is new:

All data currently is or was at one time generally available to anyone and discoverable via search engines in a plaintext


My thoughts exactly. I'm amazed that I can download the file, but at least I get to see if any of my passwords are there.


I'm not!

#successkid


To save a moment of time, here's a quick check that won't save the password string to your command history:

read -r -s -p "Password: " password && grep -iF -- "$password" 10-million-combos.txt | wc -l; unset password


Woah you are REALLY optimistic about law enforcement agencies wanting to focus on real criminals.

But Barrett Brown is not the first or only example.

Aaron Swartz is the only example I need to understand what to expect from the various US law enforcement agencies.


Barrett Brown intentionally did everything in his power (including, but not limited to publicly threatening named FBI agents and their families) to get targeted by LE, and succeeded.

Swartz? Swartz knowingly did several obviously illegal things (breaking-and-entering?) and then acted shocked when he got charged.

His actions may have been morally defensible, but not legally. Law enforcement did their job there.


it's kind of hilarious that it takes a case as transparently self-serving as aaron swartz to calcify a population as privileged and inured to the justice system as programmers to go "woah hey this shit might be kind of fucked up!!!"


Aaron S.'s case was not at all the first time programmers recognized the problem and acted on it. Perhaps it was the first time that you became aware of the issue. But there were large-scale campaigns as far back as Robert Morris's worm in 1988. Even then, programmers were rightly concerned with unfair punishment for hacking and were outspoken about their concern. Similarly with the Randal Schwartz case in 1995, and many times since.


Could someone describe the dataset for me? Is it just two columns with one for usernames and another for passwords? Or is there any other info included? I'm on mobile right now or else I'd grab it myself.


The first column is username, followed by a tab, followed by the password.
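
Given that layout, a quick way to get a feel for the data is to count the password column. This is a sketch under the assumption that no field contains an embedded tab, and the name `top_passwords` is made up.

```shell
# top_passwords FILE [N]: print "count<TAB>password" for the N most common
# passwords (default 10). Hypothetical helper.
top_passwords() {
    awk -F '\t' '
    NF >= 2 { n[$2]++ }
    END { for (p in n) printf "%d\t%s\n", n[p], p }
    ' "$1" | LC_ALL=C sort -t "$(printf '\t')" -k1,1nr -k2,2 | head -n "${2:-10}"
}
```

For example, `top_passwords 10-million-combos.txt 25` lists the 25 most frequent passwords.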


Know what encoding it's in? Postgres is choking on UTF8 and Latin1
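
One way to narrow this down before importing, sketched on the assumption that `iconv` is installed: converting the file from UTF-8 to UTF-8 fails at the first byte sequence that is not valid UTF-8. The function name is made up, and this says nothing about other things an import can trip over.

```shell
# is_valid_utf8 FILE: succeed only if FILE decodes cleanly as UTF-8.
# Hypothetical helper; relies on iconv exiting non-zero on invalid input.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}
```

Usage: `is_valid_utf8 10-million-combos.txt || echo "not clean UTF-8"`.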


Are you sure it's not choking just because it's postgres?


yep, just 2 columns


Way to go buddy! This research is indeed necessary and releasing such a dataset will be beneficial. Maybe it will also bring light to how outdated password based authentication really is.


If anyone wants to check their username, I have a searchable DB up now. https://levlaz.org/passwords


> He was close to Anonymous and was in fact their spokesman.

Err, no he wasn't. He just managed to get a modest amount of attention.


Is there an http download link that would allow downloading from the browser (or with curl)?


There are services that will download a torrent for you. This one worked for me without registration http://www.direct-torrents.com/


It seems very useful for research and also practical uses, like how about a REST API with this dump? get <password> will not only return true if it exists but how common and how weak it is, or will return a false for unique. Is there such a service out there?


This seems a bit like testing if your parachute was packed properly by deploying it. Once I've sent my password at a 3rd party API, it doesn't much matter what the API says: my password is no longer secure.


Correct, but every site where you sign up does that, and I do not think anyone cares. Maybe such an API would not be for end users but for other apps to run signup forms against, to help users choose a better password. In any case, the whole password deal is broken. I now use my own offline password generator for the "important" sites, but I guess I am not the average Internet user.


What site out there is sending my plaintext passwords to a 3rd party service to validate their strength?


Hopefully none, and hopefully they are all following best practices to protect your password, but you trust them regardless. Besides, who said plain text? Such a service could use SSL.


I think he meant plaintext as opposed to a hash of the password.
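
To make the distinction concrete, here is a rough sketch of checking a digest instead of the plaintext, assuming `sha1sum` (or `shasum` on macOS) is available. The names `sha1_hex` and `hash_listed`, and the idea of a file with one lowercase hex digest per line, are made up for illustration; nothing in this thread says any service works this way. An unsalted hash of a common password is still easy to reverse, so this is not the same as keeping the password secret.

```shell
# sha1_hex: read one line from stdin and print its SHA-1 digest as lowercase hex.
sha1_hex() {
    IFS= read -r line
    if command -v sha1sum > /dev/null 2>&1; then
        printf '%s' "$line" | sha1sum | cut -d ' ' -f 1
    else
        printf '%s' "$line" | shasum -a 1 | cut -d ' ' -f 1
    fi
    unset line
}

# hash_listed HASHFILE: read a password from stdin and succeed if its digest
# appears in HASHFILE (assumed format: one lowercase hex digest per line).
hash_listed() {
    sha1_hex | grep -qxF -f - -- "$1"
}
```

Usage: `hash_listed hashes.txt && echo listed`, then type the password and press Enter; `hashes.txt` is a placeholder name.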


With all due respect, I think this is a horrible idea. Isn't it just better to simply download the dump and filter the information with the command line? Why would someone even want to write a program that connects to an API to get info like this? You don't really need to know too much to be able to filter values like those, and it's way more flexible.


How is that even slightly useful?


Nice idea. Working on a simple Rails API now that will return a JSON response. Will take a while to import all the passwords though.

Currently got it returning this JSON: {"found":true,"password":"test","count":117}


Go make it! :)


Hahaha sure, if I did not have enough side projects of the side projects of the side projects ;)


I'm on it.


>Many companies, such as Facebook, also monitor public data dumps to identify user accounts in their user base that may have been compromised and proactively notify users.

That is smart!


Is your password and username in that list?


Which one? I'm sure most of "hackers" here use different credentials for different purposes. Haven't found any of mine there though.


Yes, three of my accounts are on the list.


I could be relieved that my favourite password isn't in there but it's already been leaked by stupid, stupid engineers working for Riot (League of Legends video game) who stored it in plaintext and a hacker got it. It is a good practice to regularly change passwords anyways: If you're worried that your password is in there, you're doing it wrong in the first place.


You're doing it wrong if you have a favorite password. Use a password manager; there are more than a handful out there that are multiplatform and easy to set up. If that isn't your thing then there are plenty of techniques for generating unique, easy to remember passwords.


> As a final note, be aware that if your password is not on this list that means nothing. This is a random sampling of thousands of dumps consisting of upwards to a billion passwords. Please see the links in the article for a more thorough check to see if your password has been leaked. Or you could just google it.


I thank the post author for releasing this data. I found one of my accounts there and changed password to a more secure one.


awk '{ print $2 }' 10-million-combos.txt | grep 1234 | wc -l

only 180896 people have 1234 in their password, thought there would be more


Hunter6 is used as a password 9 times...


So this guy found a zero-day that works across different unzip binaries, or what ...!?


_ everyone frantically searches for their own usernames _


Everyone knows the whole email/password concept is broken. I believe that overall OAuth is needed, but it needs a much stronger consumer-facing view.


I'm not sure how OAuth can help. Does it allow you to choose whom to authenticate with, or does it tie you to one specific provider? I much prefer Persona, but Mozilla has abandoned it, and most resources around it are dead links. What a colossal shame.


I'm personally looking forward to something like SQRL.

https://www.grc.com/sqrl/sqrl.htm


That's also a nice protocol, but I think it requires too many extra things (mobile phone, net connection, etc). Plus, what if your key gets stolen?


It doesn't require a mobile phone. A client on your desktop can handle the authentication.

There's also a mechanism[1] to change your master key should it become compromised. Looks like a huge drawback is that it requires you to store an offline "Identity Unlock Key" somewhere.

[1] https://www.grc.com/sqrl/idlock.htm


A well-implemented OAuth implementation is wonderful. Sadly, many implementations are just crappy.


What's worse than crappy implementations is that every provider has their own version of implementation-specific crappiness that is inconsistent with everyone else's.


It was a mistake to release this today.

Everyone knows that legally questionable moves should always be made on a Friday. That allows everyone in government to cool down for a couple days. By the time the weekend is over all the news outlets have moved on to whatever war just started up. You don't want some hothead prosecutor tweeting out a threat, forcing himself to follow through later in the week. Nobody picks a fight when 15 minutes away from a weekend.

Watch the NSA/CIA/MIB admissions. They always stage their spying/torturing mea culpas on Friday afternoons.


In the show West Wing they called it taking out the trash.



