0. This is a file of SHA1 hashes of short strings (i.e. passwords).
1. There are 3,521,180 hashes that begin with 00000. I believe that these represent hashes that the hackers have already broken and they have marked them with 00000 to indicate that fact.
Evidence for this is that the SHA1 hash of 'password' does not appear in the list, but the same hash with the first five characters set to 0 is.
5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8 is not present
000001e4c9b93f3f0682250b6cf8331b7ee68fd8 is present
Same story for 'secret':
e5e9fa1ba31ecd1ae84f75caaa474f3a663f05f4 is not present
00000a1ba31ecd1ae84f75caaa474f3a663f05f4 is present
And for 'linkedin':
7728240c80b6bfd450849405e8500d6d207783b6 is not present
0000040c80b6bfd450849405e8500d6d207783b6 is present
2. There are 2,936,840 hashes that do not start with 00000 that can be attacked with JtR.
3. The implication of #1 is that if checking for your password and you have a simple password then you need to check for the truncated hash.
4. This may well actually be from LinkedIn. Using the partial hashes (above) I find the hashes for passwords linkedin, LinkedIn, L1nked1n, l1nked1n, L1nk3d1n, l1nk3d1n, linkedinsecret, linkedinpassword, ...
5. The file does not contain duplicates. LinkedIn claims a user base of 161m. This file contains 6.4m unique password hashes. That's 25 users per hash. Given the large amount of password reuse and poor password choices it is not improbable that this is the complete password file. Evidence against that thesis is that password of one person that I've asked is not in the list.
Prefix the whole command with a space to avoid dumping your password into your bash history:
" grep `echo -n yourpassword | shasum | cut -c6-40` SHA1.txt"
Ctrl+r history search? I'd tend to maintaining a complete history log so that when I've forgotten the one liner I used to rotate my videos 2 years ago I can easily recall it.
I thought 16k entries might be reasonable but that doesn't even last 3 weeks for me. I think there might have been some issue with slow disk seeks so at some point I restricted it to that many.
I guess it probably it would be better to regularly backup the history file to deal with possible some accidental truncations and issues when running multiple shells concurrently, but probably the overall effort to set up such a system would outweight the benefits.
If you want coverage, generate a few hundred thousand SHA1 hashes along with your password.
Actually, running a trickle query of random SHA1 hashes from your box might be a fun exercise, along with a trickle query of random word tuples (bonus points for using Markov chains to generate statistically probable tuples).
If you search for 'sha1 foo', that's being sent across the network to DDG's servers. And sure, if you're using SSL then it's not going across in plain text, but it's decrypted and handled on their servers in plain text; it'll probably even end up in logs and/or tracking databases somewhere. You're giving DDG your password.
At worst you're giving the attacker a hash target to try brunting. He still has to brute it, and that takes time. Select your plaintext from a large enough keyspace and it's astronomical time.
I'll need to review their policy more closely, but DDG claim fairly minimal tracking. At best someone might be able to correlate hash lookup with some IP space. That's a long way from handing over passwords. And as I already indicated, you could cradled the queries to make the search space much larger.
No, no, no. You're 100% completely misunderstanding this.
When you search for 'sha1 foo', that query ("sha1 foo") goes up to the server. They know your password is "foo" and that you're attempting to "sha1" it. They don't have a hash, they take that data and perform the hash, then send that down to you.
I guess I'm just too damned used to using systems that, you know, have useful tools installed locally (or can get them there really damned fast). Including SHA1 and MD5 hash generators.
And I was all worked up to tell you how wrong you were still being.
All because I couldn't fathom the possibility let alone reason anyone would need a third-party site to compute their hashes for them.
[xargs] allows you to pipe the output of one command as an argument to another command. By default it will show up at the tail end of the second command's arg list, but if you want to interleave it you can use -I flag:
No, he is trying to demonstrate how to use 'xargs node -e'.
Are you even reading this discussion properly or are you just searching for some shell snippets and ridicule them as soon as you get a chance? This is what it looks like from your history: http://news.ycombinator.com/threads?id=uselessuseof
The perl one liner was funny, the shell one liner was light hearted, but your node solution is just pure fanboyism and quite frankly not in line with the spirit of the two previous posts.
And.. the node.js solution doesn't do what either the Perl or shell one liners do. It doesn't tell you whether the password was found in the file. All it does is print out a SHA1 hash of a string.
I'm surprised at the backlash to what I thought was fun code golfing. No one called me names after I posted a simple Python solution that didn't check the file. For what it's worth I've changed my LI password and I haven't bothered downloading the actual hash file.
node has a neat API for quickly knocking out stuff like this; it's a useful tool for more than just server code. Calling that comment fanboyism is just displaying the opposite of fanboyism, prejudice against hyped-up tools that nevertheless are good tools.
My point still stands. There's funny and then theres blatent fanboyism. You're like a prepubescent teenager who doesn't understand the context of social situations so always says something stupid.
"Which brings us to the most important principle on HN: civility. Since long before the web, the anonymity of online conversation has lured people into being much ruder than they'd dare to be in person. So the principle here is not to say anything you wouldn't say face to face. This doesn't mean you can't disagree. But disagree without calling the other person names. If you're right, your argument will be more convincing without them."
Some people actually do call names to others when face to face.
Personally, while I don't, I do tend to get a little aggressive and then I'm often surprised with the backlash, because I get that way when I'm genuinely enjoying the conversation, not when I'm irritated.
Tone doesn't carry on the Internet, so no one knows you're enjoying it. Hence, it generally degrades the quality of the conversation, which is the opposite of what we want at HN.
No, I'm saying I do that face-to-face, and people still can't tell I'm enjoying it. So the tip to say nothing that you wouldn't say IRL is useless to me; I just can't help it.
The first one ramps up memory use like crazy (which I was trying to avoid) and the second one is much better with memory, but you need to move the sha1_hex into the BEGIN block or you're recomputing the hash for every line parsed, thrashing your CPU. Interesting use of 'shift' though, I didn't know you could modify the file argument to -n like that.
Yeah I'm aware of http://partmaps.org/era/unix/award.html#cat and choose to continue writing my scripts this way. My commands look more symmetric at the prompt, and are easier to manipulate.
dups is indeed a little helper of mine. Like uniq it only handles sorted input. Update: I see you edited your answer to include uniq -d. I wasn't aware of the option, thanks. Now I can simplify the implementation of dups. But I find the name valuable, and I think it's perverse to say uniq when you mean its opposite.
Each pipe stage reads from the left and writes to the right. The eye goes left to see the input and right to see the output if it's redirected to file.
The input file is reliably the second word, so C-A M-f gets me to it if I want to operate on a different file. !!:1 gets me the file if I want to use it in a new command.
I'm not sure what you're suggesting. I'm supposed to echo |cut ...? But I have a whole file, not just one line. So I have to cat ... |cut ... -- which is what I did. So what's your point?
I could keep the file first by saying:
$ < combo_not.txt cut -c7-40 |sort |dups |wc -l
To which I reply, "Yuck!"
Perhaps we should stop here. You seem to have made this account just a few hours ago for the express purpose of poking at people's code fragments in this thread. You're making stylistic nitpicks (they don't affect correctness, do they?) and you're making them in a tone that I'm not sure I would take from Randal Schwartz himself (you actually edited http://news.ycombinator.com/item?id=4076556 to be ruder than the original). It's a drag, man.
I disagree with #5, I had a few of my coworkers check their sha1 against the DB and most of them were not in the dump. I also checked for truncated hashed, none of which were found. I have the feeling this is a subset of the full database
So I have a funny wild theory...remember back when the Gawker database was compromised? And LinkedIn forced a password reset for users who (according to what I read) used email addresses that matched the Gawker leak?
What if they also (or actually) compared password hashes from their database to the ones released in the Gawker breach? In that case, they likely wouldn't have pulled data straight from the database but actually might have pulled passes from the db, output to text files, cut the text files up to parcel out for processing via Hadoop or something? And somehow one of those text files got loose somehow...or someone MiTMed the actual process (I'd vote for a floating text file just because it's been so long; the Gawker breach was in December 2010).
my fairly complex alphanumeric+symbol password IS in the dump, though not prepended truncated with 0's and the other one I found, which my coworker admitted was too short and alpha only, was in the dump with prepended 0's.
This could validate the fact that the truncated hashes are actually already cracked.
Same here - mine was all alpha characters, seven characters, and the hash with five 0's was in the file. Guess who just changed their LinkedIn password today? And included some numbers?
My password is in the dump. I use the Forget Passwords Chrome extension [1], which is based on pwdhash.com, and generate site-specific passwords based on a master password -- i.e. my password is only used on LinkedIn and it's unlikely that I share it with someone else.
I think I have changed to this password during the last year.
My password hash which was last rotated July 5, 2011
.,7^R8Cl}g1}Ze6f
Was _not_ found in the file (with/without 00000). I have, of course, changed it today. Strangely enough, the previous password is also not in the list.
Don't know if this adds anything, but both my old password (created eight years ago) and current password (changed six months ago) were on the list. Both were very unique - 20 characters mixed.
Need to get better at changing my PWs every three months. It's really not that hard, just a matter of discipline.
Hmm. My truncated password (for my now-deleted account) is not in the list of hashes -- so it's not just a uniq'd full DB. Also, the original forum thread where the file was first posted only managed to break around 600,491 passwords before it went offline ... so 3,521,180 broken passwords could mean that the original hacker has had access to some LinkedIn accounts for more than just a few minutes today.
Same here. My password is not in the list and I've had a LinkedIn account since 2003. I probably changed my password about 18 months ago. Neither that nor the previous one are on the list.
My password is not in the list, not idiotic but not super-hard . I doubt this is the full list. I hadn't changed mine in years, so maybe this is from a certain period of time?
This yielded success on some known passwords and a bunch of obvious passwords. Not mine, but I assume this dump is a list of the passwords they've cracked so far (i.e., even if your password isn't on this list - change it).
It does if you're trying to estimate the size of the corpus based on the number of users.
The arithmetic mean is specifically the value you'd want. n users times m users/password == total passwords (unduplicated) in the LinkedIn database.
Zipf distribution would suggest that the pattern of reuse among passwords isn't normal, and that the median and mode are probably higher than the arithmetic mean.
My password also doesn't appear to be in the list, so I doubt it is the complete/current file. I used this python to check, in case anyone else wants to use it:
from hashlib import sha1
f = "combo_not.txt"
hashes = [x[0:40] for x in open(f)] # [0:40] to stripe off \n
# From another comment
def check_pass(plaintext, offset=5):
hashed = sha1(plaintext).hexdigest()
return (hashed, '0' * offset + hashed[offset:])
print check_pass("linkedin")[0] in hashes # -> False
print check_pass("linkedin")[1] in hashes # -> True (sanity check)
myHash, myHashBroken = check_pass("plaintextoflinkedinpassword")
print myHash in hashes # -> False
print myHashBroken in hashes # -> False
Mine was not in the list. It's also possible this isn't the entire file. I was also able to recover 225129 other passwords with a wordfile and some Python based on truncated and full hashes.
A stock JtR 1.7.9-jumbo5, using the default rules, is finding quite a few of the non-zeroed ones pretty quickly. This surprises me; I would have expected them to have run the list through the JtR mill before passing it on to others.
Likewise, my password (MybXy836YCza), which wasn't used anywhere except my LinkedIn account created 29-Jan-2012, and has been stored securely at my end, wasn't on the list (either as a full SHA1 sum, or as part of the SHA1).
As you probably guessed from the fact that I posted my old password, I changed it just in case the list that was shared is only a partial list of what was obtained.
Do you remember when you first used this password at LinkedIn? It could help narrow the dates of the breach. Especially useful would be the presence of a strong password in the list that was subsequently changed. That might help determine its freshness, if the new password isn't present (although this may be an incomplete list from an ongoing breach).
I'm thinking this list is from closer to a year ago, I changed my password shortly after the MtGox hack last year and this hash is for my old password that was compromised during that time period.
It was about a year ago now. I checked the hashes for my previous password and it wasn't on the list... Mind you, as many have noticed, it seems to be very incomplete.
Unbelievable/insulting they used a general purpose, easily reversible hash like SHA1 in the first place. I would have thought everyone had seen the 'use bcrypt' page by now.
I couldn't find my password on the list and I've been using the same password for LinkedIn since I registered. I was trying to remember when was that. If someone know how to find out the last time you changed your pass or when you registered for linkedIn please let me know. I'd guess I use linkedIn for over 4 years at least.
A "member since" date is available on the "Account & Settings" page. Choose "settings" in the drop down that appears when you hover over your (account) name in the upper right corner of any LinkedIn page.
I agree, I've tried several passwords and they match. If you're a Math person, please shed some light on the chances that this list covers the full space.
I'm not a math person either, but here's some fodder for someone who is.
Mark Burnett's extensive password collection (which he acknowledges is skewed, because it's largely based on cracked passwords, he only harvests passwords between 3 and 30 chars, etc.). Here's how some of his stats shake out:
* Although my list contains about 6 million username/password combos, the list only contains about 1,300,000 unique passwords.
* Of those, approximately 300,000 of those passwords are used by more than one person; about 1,000,000 only appear once (and a good portion of those are obviously generated by a computer).
* The list of the top 20 passwords rarely changes and 1 out of every 50 people uses one of these passwords.
So it's conceivable that 6M unique passwords could cover a very significant portion of a 120M user namespace.
It's neat that the hashes are unique enough to serve as their own key. Obvious in retrospect, but still neat.
Curious why some of the hashes have been obscured with 00000 but not all. It means more than one possible password could generate the remaining characters, but what does that help or protect?
6.5 million? Off the top of my head, assuming that passwords are only letters and 5 characters long this still wouldn't cover the possible space.
[I think it's safe to ignore hash collisions]
Are you trying passwords you've used on other sites, or random ones? If it's the former, then LI might not be the only source for the file.
0. This is a file of SHA1 hashes of short strings (i.e. passwords).
1. There are 3,521,180 hashes that begin with 00000. I believe that these represent hashes that the hackers have already broken and they have marked them with 00000 to indicate that fact.
Evidence for this is that the SHA1 hash of 'password' does not appear in the list, but the same hash with the first five characters set to 0 is.
Same story for 'secret': And for 'linkedin': 2. There are 2,936,840 hashes that do not start with 00000 that can be attacked with JtR.3. The implication of #1 is that if checking for your password and you have a simple password then you need to check for the truncated hash.
4. This may well actually be from LinkedIn. Using the partial hashes (above) I find the hashes for passwords linkedin, LinkedIn, L1nked1n, l1nked1n, L1nk3d1n, l1nk3d1n, linkedinsecret, linkedinpassword, ...
5. The file does not contain duplicates. LinkedIn claims a user base of 161m. This file contains 6.4m unique password hashes. That's 25 users per hash. Given the large amount of password reuse and poor password choices it is not improbable that this is the complete password file. Evidence against that thesis is that password of one person that I've asked is not in the list.