0. This is a file of SHA1 hashes of short strings (i.e. passwords).
1. There are 3,521,180 hashes that begin with 00000. I believe that these represent hashes that the hackers have already broken and they have marked them with 00000 to indicate that fact.
Evidence for this is that the SHA1 hash of 'password' does not appear in the list, but the same hash with the first five characters set to 0 does.
5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8 is not present
000001e4c9b93f3f0682250b6cf8331b7ee68fd8 is present
Same story for 'secret':
e5e9fa1ba31ecd1ae84f75caaa474f3a663f05f4 is not present
00000a1ba31ecd1ae84f75caaa474f3a663f05f4 is present
And for 'linkedin':
7728240c80b6bfd450849405e8500d6d207783b6 is not present
0000040c80b6bfd450849405e8500d6d207783b6 is present
2. There are 2,936,840 hashes that do not start with 00000 that can be attacked with JtR.
3. The implication of #1 is that if you are checking for your own password and it is a simple one, you also need to check for the truncated hash (the one with the first five characters zeroed).
4. This may well actually be from LinkedIn. Using the partial hashes (above) I find the hashes for passwords linkedin, LinkedIn, L1nked1n, l1nked1n, L1nk3d1n, l1nk3d1n, linkedinsecret, linkedinpassword, ...
5. The file does not contain duplicates. LinkedIn claims a user base of 161m. This file contains 6.4m unique password hashes. That's 25 users per hash. Given the large amount of password reuse and poor password choices it is not improbable that this is the complete password file. Evidence against that thesis is that the password of one person I've asked is not in the list.
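Point 3 can be sketched in Python. This is only a minimal sketch, not anyone's actual script: the names `sha1_hex` and `suffix_in_dump` are made up for illustration, and it assumes the dump holds one 40-character hex hash per line. It compares only characters 6 to 40 of the hash, so it matches both intact entries and entries whose first five characters were replaced with 00000.

```python
import hashlib


def sha1_hex(password):
    """Return the SHA1 hex digest of a password string (UTF-8 encoded)."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


def suffix_in_dump(password, lines):
    """True if characters 6-40 of the password's SHA1 match the same span of any line.

    `lines` is any iterable of strings (an open file works), so the dump is
    streamed line by line instead of being loaded into memory.
    """
    suffix = sha1_hex(password)[5:40]
    return any(line[5:40] == suffix for line in lines)
```

Something like `suffix_in_dump(getpass.getpass(), open(path_to_dump))` would prompt for the password instead of taking it as a command argument; `path_to_dump` is a placeholder for wherever the file lives.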
Prefix the whole command with a space to avoid dumping your password into your bash history:
" grep `echo -n yourpassword | shasum | cut -c6-40` SHA1.txt"
Ctrl+r history search? I tend to maintain a complete history log so that when I've forgotten the one-liner I used to rotate my videos 2 years ago I can easily recall it.
I thought 16k entries might be reasonable but that doesn't even last 3 weeks for me. I think there might have been some issue with slow disk seeks so at some point I restricted it to that many.
I guess it would probably be better to regularly back up the history file to deal with accidental truncations and issues when running multiple shells concurrently, but the overall effort to set up such a system would probably outweigh the benefits.
If you want coverage, generate a few hundred thousand SHA1 hashes along with your password.
Actually, running a trickle query of random SHA1 hashes from your box might be a fun exercise, along with a trickle query of random word tuples (bonus points for using Markov chains to generate statistically probable tuples).
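A rough sketch of the decoy idea, assuming hashes (not plaintext) are what leave your machine. The function name `decoy_batch` and the default batch size are invented for illustration. Twenty random bytes rendered as hex are indistinguishable in form from a SHA1 digest, so the real hash is just one entry in the batch.

```python
import hashlib
import os
import random


def decoy_batch(password, n_decoys=1000):
    """Return the real SHA1 hex digest hidden among n_decoys random digests."""
    real = hashlib.sha1(password.encode("utf-8")).hexdigest()
    # 20 random bytes as hex have the same shape as a SHA1 digest (40 hex characters).
    batch = [os.urandom(20).hex() for _ in range(n_decoys)]
    batch.append(real)
    # Shuffle so the real hash does not sit in a predictable position.
    random.SystemRandom().shuffle(batch)
    return batch
```

This only obscures which hash you care about. It does nothing if the plaintext itself is what gets sent to a remote service.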
If you search for 'sha1 foo', that's being sent across the network to DDG's servers. And sure, if you're using SSL then it's not going across in plain text, but it's decrypted and handled on their servers in plain text; it'll probably even end up in logs and/or tracking databases somewhere. You're giving DDG your password.
At worst you're giving the attacker a hash target to try brute-forcing. He still has to brute-force it, and that takes time. Select your plaintext from a large enough keyspace and it takes an astronomical amount of time.
I'll need to review their policy more closely, but DDG claim fairly minimal tracking. At best someone might be able to correlate a hash lookup with some IP space. That's a long way from handing over passwords. And as I already indicated, you could cradle the queries in decoys to make the search space much larger.
No, no, no. You're 100% completely misunderstanding this.
When you search for 'sha1 foo', that query ("sha1 foo") goes up to the server. They know your password is "foo" and that you're attempting to "sha1" it. They don't have a hash, they take that data and perform the hash, then send that down to you.
I guess I'm just too damned used to using systems that, you know, have useful tools installed locally (or can get them there really damned fast). Including SHA1 and MD5 hash generators.
And I was all worked up to tell you how wrong you were still being.
All because I couldn't fathom the possibility let alone reason anyone would need a third-party site to compute their hashes for them.
[xargs] allows you to pipe the output of one command as an argument to another command. By default it will show up at the tail end of the second command's arg list, but if you want to interleave it you can use the -I flag:
No, he is trying to demonstrate how to use 'xargs node -e'.
Are you even reading this discussion properly or are you just searching for some shell snippets and ridicule them as soon as you get a chance? This is what it looks like from your history: http://news.ycombinator.com/threads?id=uselessuseof
The perl one liner was funny, the shell one liner was light hearted, but your node solution is just pure fanboyism and quite frankly not in line with the spirit of the two previous posts.
And.. the node.js solution doesn't do what either the Perl or shell one liners do. It doesn't tell you whether the password was found in the file. All it does is print out a SHA1 hash of a string.
I'm surprised at the backlash to what I thought was fun code golfing. No one called me names after I posted a simple Python solution that didn't check the file. For what it's worth I've changed my LI password and I haven't bothered downloading the actual hash file.
node has a neat API for quickly knocking out stuff like this; it's a useful tool for more than just server code. Calling that comment fanboyism is just displaying the opposite of fanboyism, prejudice against hyped-up tools that nevertheless are good tools.
My point still stands. There's funny and then there's blatant fanboyism. You're like a prepubescent teenager who doesn't understand the context of social situations so always says something stupid.
"Which brings us to the most important principle on HN: civility. Since long before the web, the anonymity of online conversation has lured people into being much ruder than they'd dare to be in person. So the principle here is not to say anything you wouldn't say face to face. This doesn't mean you can't disagree. But disagree without calling the other person names. If you're right, your argument will be more convincing without them."
Some people actually do call others names when face to face.
Personally, while I don't, I do tend to get a little aggressive and then I'm often surprised with the backlash, because I get that way when I'm genuinely enjoying the conversation, not when I'm irritated.
Tone doesn't carry on the Internet, so no one knows you're enjoying it. Hence, it generally degrades the quality of the conversation, which is the opposite of what we want at HN.
No, I'm saying I do that face-to-face, and people still can't tell I'm enjoying it. So the tip to say nothing that you wouldn't say IRL is useless to me; I just can't help it.
The first one ramps up memory use like crazy (which I was trying to avoid) and the second one is much better with memory, but you need to move the sha1_hex into the BEGIN block or you're recomputing the hash for every line parsed, thrashing your CPU. Interesting use of 'shift' though, I didn't know you could modify the file argument to -n like that.
Yeah I'm aware of http://partmaps.org/era/unix/award.html#cat and choose to continue writing my scripts this way. My commands look more symmetric at the prompt, and are easier to manipulate.
dups is indeed a little helper of mine. Like uniq it only handles sorted input. Update: I see you edited your answer to include uniq -d. I wasn't aware of the option, thanks. Now I can simplify the implementation of dups. But I find the name valuable, and I think it's perverse to say uniq when you mean its opposite.
Each pipe stage reads from the left and writes to the right. The eye goes left to see the input and right to see the output if it's redirected to file.
The input file is reliably the second word, so C-A M-f gets me to it if I want to operate on a different file. !!:1 gets me the file if I want to use it in a new command.
I'm not sure what you're suggesting. I'm supposed to echo |cut ...? But I have a whole file, not just one line. So I have to cat ... |cut ... -- which is what I did. So what's your point?
I could keep the file first by saying:
$ < combo_not.txt cut -c7-40 |sort |dups |wc -l
To which I reply, "Yuck!"
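For comparison, roughly the same count can be sketched in Python without any pipeline at all. This assumes `dups` behaves like `uniq -d` (one output line per duplicated value), as described above; the function name is made up.

```python
from collections import Counter


def count_duplicated_suffixes(lines, start=6, end=40):
    """Count distinct values of line[start:end] that occur on more than one line.

    With the defaults this mirrors `cut -c7-40 | sort | dups | wc -l`:
    cut is 1-indexed and inclusive, Python slices are 0-indexed and half-open.
    """
    counts = Counter(line.rstrip("\n")[start:end] for line in lines)
    return sum(1 for n in counts.values() if n > 1)
```

No sort is needed because the Counter groups equal values regardless of order, at the cost of holding every distinct suffix in memory.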
Perhaps we should stop here. You seem to have made this account just a few hours ago for the express purpose of poking at people's code fragments in this thread. You're making stylistic nitpicks (they don't affect correctness, do they?) and you're making them in a tone that I'm not sure I would take from Randal Schwartz himself (you actually edited http://news.ycombinator.com/item?id=4076556 to be ruder than the original). It's a drag, man.
I disagree with #5. I had a few of my coworkers check their SHA1 hashes against the DB and most of them were not in the dump. I also checked for truncated hashes, none of which were found. I have the feeling this is a subset of the full database.
So I have a funny wild theory...remember back when the Gawker database was compromised? And LinkedIn forced a password reset for users who (according to what I read) used email addresses that matched the Gawker leak?
What if they also (or actually) compared password hashes from their database to the ones released in the Gawker breach? In that case, they likely wouldn't have queried the database directly; they might have pulled the passwords from the db, written them out to text files, and cut the text files up to parcel out for processing via Hadoop or something. And one of those text files somehow got loose... or someone MiTMed the actual process (I'd vote for a floating text file just because it's been so long; the Gawker breach was in December 2010).
My fairly complex alphanumeric+symbol password IS in the dump, though not in the truncated form with prepended 0's. The other one I found, which my coworker admitted was too short and alpha only, was in the dump with prepended 0's.
This supports the theory that the truncated hashes have already been cracked.
Same here - mine was all alpha characters, seven characters, and the hash with five 0's was in the file. Guess who just changed their LinkedIn password today? And included some numbers?
My password is in the dump. I use the Forget Passwords Chrome extension [1], which is based on pwdhash.com, and generate site-specific passwords based on a master password -- i.e. my password is only used on LinkedIn and it's unlikely that I share it with someone else.
I think I changed to this password within the last year.
My password hash which was last rotated July 5, 2011
.,7^R8Cl}g1}Ze6f
Was _not_ found in the file (with/without 00000). I have, of course, changed it today. Strangely enough, the previous password is also not in the list.
Don't know if this adds anything, but both my old password (created eight years ago) and current password (changed six months ago) were on the list. Both were very unique - 20 characters mixed.
Need to get better at changing my PWs every three months. It's really not that hard, just a matter of discipline.
Hmm. My truncated password (for my now-deleted account) is not in the list of hashes -- so it's not just a uniq'd full DB. Also, the original forum thread where the file was first posted only managed to break around 600,491 passwords before it went offline ... so 3,521,180 broken passwords could mean that the original hacker has had access to some LinkedIn accounts for more than just a few minutes today.
Same here. My password is not in the list and I've had a LinkedIn account since 2003. I probably changed my password about 18 months ago. Neither that nor the previous one are on the list.
My password is not in the list; it's not idiotic but not super-hard either. I doubt this is the full list. I hadn't changed mine in years, so maybe this is from a certain period of time?
This yielded success on some known passwords and a bunch of obvious passwords. Not mine, but I assume this dump is a list of the passwords they've cracked so far (i.e., even if your password isn't on this list - change it).
It does if you're trying to estimate the size of the corpus based on the number of users.
The arithmetic mean is specifically the value you'd want. n users times m users/password == total passwords (unduplicated) in the LinkedIn database.
A Zipf distribution would suggest that the pattern of reuse among passwords isn't normal: a handful of passwords are shared by huge numbers of users while most are used only once, so the median and mode are probably lower than the arithmetic mean.
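A toy calculation can make the mean/median/mode question concrete. The model here is an assumption, not data from the dump: the password of rank r is taken to be shared by about 1000/r users, with a floor of one user. The function name and the sizes are arbitrary.

```python
import statistics


def zipf_like_counts(n_passwords=1000, top_count=1000):
    """Users per password when the rank-r password is shared by about top_count / r users."""
    return [max(1, top_count // rank) for rank in range(1, n_passwords + 1)]


counts = zipf_like_counts()
mean = statistics.mean(counts)      # total users divided by unique passwords
median = statistics.median(counts)
mode = statistics.mode(counts)
print(f"mean={mean:.2f} median={median} mode={mode}")
```

In this toy model the mode is 1 and the median sits well below the mean, because a few very popular passwords pull the mean up. The mean is still the figure to use for sizing, since mean times unique passwords gives back the total user count.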
My password also doesn't appear to be in the list, so I doubt it is the complete/current file. I used this python to check, in case anyone else wants to use it:
from hashlib import sha1

f = "combo_not.txt"
# [0:40] strips off the trailing \n; a set makes each lookup fast
hashes = set(x[0:40] for x in open(f))

# From another comment
def check_pass(plaintext, offset=5):
    # Returns the full SHA1 hex digest and the variant with the first `offset` characters zeroed
    hashed = sha1(plaintext.encode("utf-8")).hexdigest()
    return (hashed, '0' * offset + hashed[offset:])

print(check_pass("linkedin")[0] in hashes)  # -> False
print(check_pass("linkedin")[1] in hashes)  # -> True (sanity check)

myHash, myHashBroken = check_pass("plaintextoflinkedinpassword")
print(myHash in hashes)        # -> False
print(myHashBroken in hashes)  # -> False
Mine was not in the list. It's also possible this isn't the entire file. I was also able to recover 225129 other passwords with a wordfile and some Python based on truncated and full hashes.
A stock JtR 1.7.9-jumbo5, using the default rules, is finding quite a few of the non-zeroed ones pretty quickly. This surprises me; I would have expected them to have run the list through the JtR mill before passing it on to others.
Likewise, my password (MybXy836YCza), which wasn't used anywhere except my LinkedIn account created 29-Jan-2012, and has been stored securely at my end, wasn't on the list (either as a full SHA1 sum, or as part of the SHA1).
As you probably guessed from the fact that I posted my old password, I changed it just in case the list that was shared is only a partial list of what was obtained.
Do you remember when you first used this password at LinkedIn? It could help narrow the dates of the breach. Especially useful would be the presence of a strong password in the list that was subsequently changed. That might help determine its freshness, if the new password isn't present (although this may be an incomplete list from an ongoing breach).
I'm thinking this list is from closer to a year ago. I changed my password shortly after the MtGox hack last year, and this hash is for my old password that was compromised during that time period.
It was about a year ago now. I checked the hashes for my previous password and it wasn't on the list... Mind you, as many have noticed, it seems to be very incomplete.
Unbelievable/insulting that they used a fast, general-purpose hash like SHA1, which is easy to brute-force, in the first place. I would have thought everyone had seen the 'use bcrypt' page by now.
I couldn't find my password on the list and I've been using the same password for LinkedIn since I registered. I was trying to remember when that was. If someone knows how to find out the last time you changed your password or when you registered for LinkedIn, please let me know. I'd guess I've used LinkedIn for over 4 years at least.
A "member since" date is available on the "Account & Settings" page. Choose "settings" in the drop down that appears when you hover over your (account) name in the upper right corner of any LinkedIn page.
I agree, I've tried several passwords and they match. If you're a Math person, please shed some light on the chances that this list covers the full space.
I'm not a math person either, but here's some fodder for someone who is.
Consider Mark Burnett's extensive password collection (which he acknowledges is skewed, because it's largely based on cracked passwords, he only harvests passwords between 3 and 30 chars, etc.). Here's how some of his stats shake out:
* Although my list contains about 6 million username/password combos, the list only contains about 1,300,000 unique passwords.
* Of those, approximately 300,000 of those passwords are used by more than one person; about 1,000,000 only appear once (and a good portion of those are obviously generated by a computer).
* The list of the top 20 passwords rarely changes and 1 out of every 50 people uses one of these passwords.
So it's conceivable that 6M unique passwords could cover a very significant portion of a 120M user namespace.
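The arithmetic behind that comparison, as a small sketch. The Burnett figures are the ones quoted above; the LinkedIn figures (161m users, 6.4m hashes) are the ones from the top comment. The function name is made up.

```python
def users_per_password(total_accounts, unique_passwords):
    """Average number of accounts sharing each unique password."""
    return total_accounts / unique_passwords


burnett = users_per_password(6_000_000, 1_300_000)     # Burnett's combos vs. unique passwords
linkedin = users_per_password(161_000_000, 6_400_000)  # claimed user base vs. hashes in the file
print(f"Burnett: {burnett:.1f} users/password, LinkedIn if the file were complete: {linkedin:.1f}")
```

That is about 4.6 versus about 25.2. The two populations differ in size and in how they were collected, so this is fodder rather than proof either way.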
It's neat that the hashes are unique enough to serve as their own key. Obvious in retrospect, but still neat.
Curious why some of the hashes have been obscured with 00000 but not all. It means more than one possible password could generate the remaining characters, but what does that help or protect?
6.5 million? Off the top of my head, assuming that passwords are only letters and 5 characters long this still wouldn't cover the possible space.
[I think it's safe to ignore hash collisions]
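The back-of-the-envelope figure is easy to check. A minimal sketch, with a made-up helper name, counting strings of exactly five letters:

```python
def keyspace(alphabet_size, length):
    """Number of distinct strings of exactly `length` symbols from the alphabet."""
    return alphabet_size ** length


lowercase_5 = keyspace(26, 5)   # 11,881,376
mixed_case_5 = keyspace(52, 5)  # 380,204,032
print(lowercase_5, mixed_case_5, lowercase_5 > 6_500_000)
```

So even lowercase-only, exactly-five-character passwords outnumber the 6.5 million entries by nearly two to one.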
Are you trying passwords you've used on other sites, or random ones? If it's the former, then LI might not be the only source for the file.
"We were curious what would happen to our share price if our company did something incredibly stupid"
The above comment might seem incredibly harsh, but really, there's no good excuse for a site this prominent to not have a salted, secure password hashing system. Even if they started with an unsalted password system, users can be migrated to the newer more secure system on next login.
The only way I could regain respect for LinkedIn is if we find that these unsalted hashes were from users who never logged in to LinkedIn after the security upgrade. From the replies of other HN users who have found their password hashes in the leaked list, this doesn't seem to be the case though.
I can understand database leaks. Bad things happen. Not being prepared for such an event however is where I draw the line. These leaks impact users far beyond just the site at fault.
It's not enough to say users should use LastPass. They don't, and that's the world we live in, for better or worse.
If computer security doesn't take into account problematic users, then it's flawed computer security.
Surely just hashing the username|password would massively reduce the effectiveness of leaks like this? Sure, a hacker would know what the "salt" is, but since it now varies between users you would expend the same amount of effort breaking one person's login as you previously would spend breaking everyone's (on average).
(Not recommending it, just wondering if my reasoning is correct.)
I hear this commonly, so it is a good idea to clear it up.
Usernames have lower entropy than a random salt and are predictable in many cases. People re-use usernames and some usernames are common. If your password system became common on the web, or if I knew the workings of your password system (i.e. open source / leaked codebase / Kerckhoffs's principle[1]), I could generate a rainbow table for either common or targeted users. This means I could generate a rainbow table for "Jabbles", gain access to your password and compromise your account before the website is likely even aware of a breach or has time to warn you. Salts only act to slow down, not prevent, compromising leaked password hashes (as you can always brute force which is quite practical with MD5/SHA1). Thus, using a username defeats one of the stated purposes of salting.
It's also said ad nauseam (with good reason) but rolling your own in security is a bad idea, especially when libraries exist that do exactly what you'd intend to do just as easily. Algorithms such as bcrypt and scrypt exist and are well vetted. bcrypt is easy to integrate with many languages and provides a trivial interface and sane defaults for iterations/rounds [brute force] and salts [rainbow table]. bcrypt can also handle increasing the security of your system over time as the metadata is stored as part of the hash.
tl;dr Using a username for salting means a targeted attack against a single or small number of users would be damn near impossible to stop as the second they have the password hashes they also have the passwords.
Often people say "Don't roll your own security" but the reality is that developers aren't trying to roll their own. They are trying to solve a problem, and if a quick google doesn't turn up a good library then they'll try and figure it out. Googling for password security implementations is likely to be fraught with horrible horrible advice.
I guess what I'm saying is that it's not enough to say don't do it, instead the defaults need to be there (and very visible).
I think we've reached a point with bcrypt that a good secure password system is within reach and comes with sane defaults and ease of use as features for most programming languages.
If it's just an issue of getting the word out there, then I'm hopeful things can improve.
You need more than just bcrypt. You've hinted at other things, but a few random things popping in to my mind:
* Preventing password logging (many web frameworks log parameters)
* Secure password recovery
* New alternative attack vectors (eg. Facebook, Twitter auth)
* XSS and CSRF
There are so, so many simple-to-make security errors, and worse, many of them are inter-related so that forgetting one will make another vulnerable. This is why you need safe defaults and more security education.
A strong password hash doesn't gate on any of those things, so, while you do indeed need to pay attention to them, you don't need to pay attention to them before you deploy a strong password hash.
You should deploy a strong password hash immediately.
True point and this is probably off topic, but out of curiosity, what is the recommended approach for his point about logging messages/requests?
On previous projects, we've gone through all sorts of machinations to detect a password in our SOAP logging. This usually involves XML parsing (slow, ineffective on malformed messages) and Regexes (ineffective on malformed or "unusual" messages).
I can't think of anything better, short of "you can't leak what you don't log" which is nice in theory but not always practical.
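One option, sketched below, is to redact at the point where the parameters have already been parsed into a structure, rather than pattern-matching the raw message. This is an assumption about where logging happens, and it does not help with malformed messages that never parse. The key list, the function name, and the logger name are all illustrative.

```python
import logging

SENSITIVE_KEYS = {"password", "passwd", "pwd", "secret", "token"}  # illustrative, not exhaustive


def redact(params):
    """Return a copy of a parameter dict with sensitive values masked, recursing into nested dicts."""
    clean = {}
    for key, value in params.items():
        if str(key).lower() in SENSITIVE_KEYS:
            clean[key] = "[REDACTED]"
        elif isinstance(value, dict):
            clean[key] = redact(value)
        else:
            clean[key] = value
    return clean


logging.basicConfig(level=logging.INFO)
# Only the redacted copy is ever handed to the logger.
logging.getLogger("app.login").info("login attempt: %s", redact({"user": "alice", "Password": "hunter2"}))
```

The original dict is left untouched, so the application can still use the real value after logging the masked copy.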
There are defaults: bcrypt and PBKDF2. There is no excuse for anyone to do anything less than salted hashes, even if they decide not to follow bcrypt or PBKDF2.
Having a password salted with the username fairly easily balloons out the complexity of building and searching a rainbow table by a factor of the number of usernames you want it to be useful for. This factor is larger than you'd expect, given the sheer quantity and variety of usernames in various systems.
For a targeted attack it really doesn't matter, as the time complexity to produce the rainbow table is equivalent to that of simply brute forcing the hash, i.e., you can't say "well, assume the rainbow table contains only some small number of usernames"...
It also is entirely unlike the WPA2 rainbow tables in that you don't have millions of users all sharing the same username (ie. factory default SSIDs).
Overall it's more secure than it seems at first glance but you still have to ask yourself why you'd use that over a random salt.
The targeted attack does matter though, for the reason I pointed out above.
I can produce a rainbow table offline before I compromise the targeted system as I know the username of my target. This is not possible if the salt is random. This means I can crack a targeted user's password hash _instantly_ upon gaining access to the system.
With a random salt, you can only perform the brute force attack on that targeted user _after_ you've gained access to the system and likely alerted them to a compromise.
If the response time of the compromised system and team is a factor, this means using a username as a salt compromises your security greatly.
tl;dr Using a username for salting means a targeted attack against a single or small number of users would be damn near impossible to stop as the second they have the password hashes they also have the passwords.
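The timing argument can be illustrated with a toy example. Assumptions, stated loudly: a plain lookup table stands in for a rainbow table, the "salt|password" SHA1 construction is only the scheme under discussion (not a recommendation), and the candidate list and helper names are invented.

```python
import hashlib
import os


def salted_sha1(salt, password):
    """Toy salted hash: SHA1 over salt|password. Far too fast for real password storage."""
    return hashlib.sha1(f"{salt}|{password}".encode("utf-8")).hexdigest()


def precompute(salt, candidates):
    """Attacker's lookup table for one known salt: hash -> candidate password."""
    return {salted_sha1(salt, pw): pw for pw in candidates}


candidates = ["password", "secret", "linkedin"]

# Username as salt: the table can be built before any breach, because the salt is guessable.
table = precompute("Jabbles", candidates)
leaked = salted_sha1("Jabbles", "linkedin")  # the hash as it would appear in a dump
print(table.get(leaked))                     # found by lookup, no cracking time needed

# Random salt: unknown until the dump is in hand, so the precomputed table finds nothing.
random_salt = os.urandom(8).hex()
print(table.get(salted_sha1(random_salt, "linkedin")))
```

With the random salt the attacker can still brute-force the hash, but only after obtaining the dump, which is the window the comment above is concerned with.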
1) You know the hash function beforehand
2) You know that they are salting in exactly this way
3) You know how they are doing their salting (HMAC vs., vs.)
4) You have enough time to create this new rainbow table
5) You have only just enough access to the system to dump the hashes (ie. the easier routes are blocked off from you)
That would in fact, with some probability (based upon the complexity of your rainbow table and the complexity of the users password), give you the passwords for a particular set of users.
I did say that it was more secure than it seems, not that it was perfectly secure :)
While not entirely random, would a "date based" salt work as well? Say, the date that the entry was added? This would still negate rainbow tables as a specific user entry needs to be targeted.
It would probably work well enough, but... why not just add a proper random salt field that isn't tied to anything an attacker could guess? Is something like 8 bytes per user too expensive?
Remember salts don't need to be secret to do their job. The goal is to change the algorithm slightly (by adding additional input) for each user. That means you can't mass-precompute (rainbow tables), and just look up what matches, you have to break each user individually.
Your reasoning about how salts work is correct.
There's also something called a pepper which is another additional bit of input data, that is only stored in the app code (fixed for entire app). So an attacker who only manages to get a database dump would need to guess yet another chunk of data (making it near impossible). So a well-seasoned hash would be SLOW_HASH(pepper+salt+password).
Security is all about layers. Each layer protects a bit more, or prevents things from being easy for the attacker.
Edit: Don't do this yourself. Know it for the theory part - but then just use a well-vetted library to do it.
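For the theory part only, here is a minimal sketch of the salt-plus-pepper idea using PBKDF2 from Python's standard library as the slow hash. Everything named here is an assumption for illustration: the `PEPPER` value, the iteration count, the 8-byte salt size, and the choice to prepend the pepper to the password. A vetted library remains the better route.

```python
import hashlib
import hmac
import os

PEPPER = b"app-wide-secret-kept-out-of-the-database"  # hypothetical; would live in app config
ITERATIONS = 100_000  # illustrative work factor, to be tuned for the hardware


def hash_password(password, salt=None):
    """Return (salt, digest): PBKDF2 over pepper + password, keyed by a per-user salt."""
    if salt is None:
        salt = os.urandom(8)  # random per-user salt, stored next to the digest
    digest = hashlib.pbkdf2_hmac("sha256", PEPPER + password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected):
    """Recompute with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

The salt is stored in the database with the digest; the pepper is not, so a database dump alone is missing one input.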
Please refer to my comment above. You can precompute a rainbow table if you know the username (trivial) and the method of hashing[1]. Whilst usernames as salts would increase security over no salt, it results in a potential exploit / vulnerability that would not exist if the salt was truly random. Hence, suggesting the use of usernames as salts is not wise.
I read cschneid's comment twice, and nowhere do I see where he or she specifically recommends using the username as a salt; he or she simply recapitulates the logic behind using a unique salt value for each stored hash, and describes using an additional non-unique value which is not stored with the passwords ("pepper"), which is a new and interesting idea, at least to me.
Re: pepper - The devise plugin for Rails uses it. The idea is that the attacker must now steal both the app code AND database, which are often on separate servers.
It would make it a lot easier for LinkedIn to identify whose hashes were leaked because with a salt, all passwords would be unique. It would also make rainbow tables useless.
But in this day and age, the bigger problem is how fast you can compute the hashes, salt or no. With GPUs you can calculate a few hundred million (depending on the hashing algorithm) per second, making the algorithm used the real vulnerability.
Best practice involves increasing the calculation time of your algorithm. Theoretically, you could just rehash a few thousand times in a loop, throwing in a salt here and there, but practically, you should just use bcrypt or scrypt.
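To put numbers on why speed is the problem, here is a rough estimate of exhaustive search time. The rate of 200 million guesses per second is an assumed stand-in for "a few hundred million", and the two example password shapes are arbitrary.

```python
def seconds_to_exhaust(alphabet_size, length, guesses_per_second):
    """Worst-case seconds to try every string of exactly `length` symbols."""
    return alphabet_size ** length / guesses_per_second


RATE = 200_000_000           # assumed SHA1 guesses per second on a GPU
SECONDS_PER_YEAR = 31_557_600

for label, alphabet, length in [
    ("7 lowercase letters", 26, 7),
    ("12 mixed-case letters and digits", 62, 12),
]:
    secs = seconds_to_exhaust(alphabet, length, RATE)
    print(f"{label}: {secs:.3g} s (~{secs / SECONDS_PER_YEAR:.3g} years)")
```

At that rate seven lowercase letters fall in roughly 40 seconds, while the longer password takes hundreds of thousands of years. A slow hash divides the attacker's rate by its work factor, which stretches both figures by the same multiple.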
In a password hashing scheme with a salt, you're supposed to consider everything except the cleartext to be public, for the purposes of analysis. The password should be unrecoverable even if the attacker knows the algorithm and any salts.
It's true that that would be an improvement, however we try to avoid discussing things like that seriously because of the risk that someone new to the game will actually try to do it. The easy answer is to use an out-of-the-box secure password strategy, anything else is adolescent.
Regarding requiring users to log in; wouldn't it be better to run their current hash through another password hashing scheme (while we're at it bcrypt, scrypt, PBKDF, etc)? Then, the next time they log in, verify them by running their password through the old algorithm, and the result through the new one.
That could be a good transition strategy if you're worried about being compromised before all your users have logged in again, but you would still want to move them over to using just the new system when they do. It probably would be fine, but when it comes to crypto you don't take chances when you don't have to.
>> Even if they started with an unsalted password system, users can be migrated to the newer more secure system on next login.
In thinking about this, I wonder if in that scenario you'd even have to wait until next login. You could just use the weak hash as the input to your salted hash function and keep a flag of whether or not you need to 'pre-hash' the password before using your v2.0 salted hash. As users log in you could slowly replace the double hashed entries with single salted hash versions and flip the flag.
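A sketch of that wrap-then-upgrade flow, under stated assumptions: PBKDF2 stands in for the v2.0 salted hash (bcrypt or scrypt would fill the same role), records are plain dicts, and the field names `sha1`, `salt`, `hash` and `prehash` are invented for illustration.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor


def legacy_sha1(password):
    """The old scheme: unsalted SHA1 hex digest."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


def slow_hash(data, salt):
    """Stand-in for the v2.0 scheme: PBKDF2-HMAC-SHA256 with a per-user salt."""
    return hashlib.pbkdf2_hmac("sha256", data.encode("utf-8"), salt, ITERATIONS)


def wrap_legacy(record):
    """Offline step: replace a stored SHA1 with slow_hash(SHA1) and set the pre-hash flag."""
    salt = os.urandom(8)
    return {"salt": salt, "hash": slow_hash(record["sha1"], salt), "prehash": True}


def login(record, password):
    """Verify a password; on success, upgrade a wrapped record to a direct slow hash."""
    data = legacy_sha1(password) if record["prehash"] else password
    if not hmac.compare_digest(slow_hash(data, record["salt"]), record["hash"]):
        return False
    if record["prehash"]:
        # The plaintext is available now, so drop the SHA1 layer and flip the flag.
        record["salt"] = os.urandom(8)
        record["hash"] = slow_hash(password, record["salt"])
        record["prehash"] = False
    return True
```

The point of the offline step is that no bare SHA1 remains in storage once `wrap_legacy` has run over every record, even for users who never log in again.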
What do you recommend users do instead? Unfortunately there will probably always be websites storing passwords in insecure ways. I mean I'd certainly rather not have to deal with the hassle (however small) of using LastPass, but as you said, that's the world we live in. Hoping for competence by the writers/maintainers of websites is also flawed computer security, is it not?
Hoping for competence is indeed flawed from both sides.
I would hope users use distinct, random passwords for each site they visit and that developers store those passwords in a safe secure way. I also assume both sides won't listen to logic however :)
The reason I'm annoyed with this particularly is that larger sites are more likely targets due simply to their size. Larger sites generally have the developer resources to provide a good solution to the problem from their end but commonly don't.
This makes them look bad and means their users are left in more danger than before. No-one wins.
And a follow-up: "Our team continues to investigate, but at this time, we're still unable to confirm that any security breach has occurred. Stay tuned here."
What's kept me away from such solutions are these questions: How can you trust one service with all your passwords? What if their configuration has a vulnerability?
KeePass works well too - open source, offline solution that has an "Autotype" function.
I actually only run into passwords that are a pain on mobile devices. Now that my Android phone has no keyboard but tons of power, that's becoming more and more significant.
I use keepass too. I keep my database in dropbox and use the android dropbox and keepass clients on my android. Logging into an app or website involves opening dropbox, clicking on the database[1], entering my password, choosing the site, and clicking on "copy password to clipboard." It's a few extra steps, but it's not that much of a hassle.
[1] I find this easier than opening keepass and selecting the database from dropbox for some reason that might be as simple as dropbox having an easier to spot icon.
You can also use the favorite feature on Dropbox to keep a fresh copy of the database on your phone and have KeePassDroid remember that location. Then your flow is 1) open KeePassDroid 2) enter password 3) select site 4) copy/paste
The enter (long alphanumeric and symbols) password/copy/paste/switch window was a little clunky in Android 2.2. Little better in ICS, so need to get back to using this.
One more KeePass user here (actually KeePassX). But I only use it for passwords that aren't my own: ones provided by others and so on.
For my personal ones I keep a few algorithms in my head. I use the resource type (website/some server/device) and name (e.g. domain/model) as variables, and after a few steps in my head I always have a different password for each kind of service.
Use open-source tools such as SHA1-Pass. The passwords it generates can be recreated with openssl and any other standard crypto library.
Edit: I wrote SHA1-Pass, so I'm biased, but I know what you mean about having trust issues with closed-source password tools. That's one of the reasons I wrote it.
I use open source tools such as "pwgen", "emacs" and "gpg". Open up the encrypted file in the editor, type your pass phrase if you haven't this session, cut and paste, close file. The built-in keyboard navigability makes this faster than everything but the in-browser form filling.
You might consider renaming it. I've been looking for several minutes and can't find it via that name.
Is it this: http://manpages.ubuntu.com/manpages/natty/man1/sha1pass.1.ht... I don't see how you would use this the same way you'd use the other tools mentioned here. I can imagine a way, but it's nowhere near as convenient and still has its own major usability problems.
Nothing, really. However, I trust the LastPass guys to keep their shit secure as much as I trust myself to keep my own system secure.
After all, if my own system is compromised, I just get a lot of hassle. If LastPass ever gets hacked and leaks their passwords, they lose their business overnight. That's pretty good motivation for them to keep on top of their stuff.
I used to use 1Password, which stored the passwords in a local file, and that could be said to be marginally more secure, except that it generally uses something like iCloud or Dropbox to sync the passwords, so there's still a single point of failure... The main reason I moved away from 1Password was that they gave me a shitty response when I asked them if they were going to support Chrome. I decided at that point that I didn't want to give them my money anymore, and so I didn't upgrade to 1Password 3.
The big difference between "hosted service" and "encrypted file in the cloud" is that the hosted service has, by definition, to store the key next to the lock to be practical.
The key for your encrypted file stays in your head (and/or in your wallet), so even in a full-on total breach of Dropbox/iCloud, your key is safe, and 8 million rounds of 256-bit AES and a good password (my current KeePass settings) is still unbreakable[1].
1: Unless (perhaps) you have the attention of certain governments. And they always have the option of using a $5 wrench on you, anyway.
As far as I know, LastPass does not "store the key next to the lock."[1] The browser extension encrypts/decrypts locally. If you use your password file through the web site you're still downloading your encrypted DB from them and encrypting/decrypting locally (whether with the extension, or I believe they also have a pure JS implementation).
[1] Or so they say. I've never MITMed their SSL, and their software is not open source AFAIK. This is not to say someone couldn't e.g. distribute a trojaned version of their browser extensions. If you poke around, the developer(s) have at least revealed the encryption method for your DB so you can verify how it is encrypted for yourself, which is a good sign if nothing else.
Why can't the hosted service use an "encrypted file in the cloud" as its implementation? As long as it requires client-side code to do the decryption, the key stays in your head alone.
> except that it generally uses something like iCloud or Dropbox to sync the passwords, so there's still a single point of failure
No. This is the strength of two-factor authentication, something you know, and something you have. If someone gets your 1Password keyfile, it's useless without your decrypting password.
I use 1Password, rather than LastPass. On that system, your password file is stored locally by default, so there isn't a centralized password store to attack. If you do syncing of passwords between machines, you keep an encrypted password file in your Dropbox account.
I think it's a risk with a solution like this, but much less of a risk of having to remember all these passwords myself (a practice which tends to devolve to re-using passwords).
This is why I use 1Password and not LastPass - the encrypted password file is stored locally - optionally in Dropbox, which is what enables mobile and remote (http online through Dropbox) to work.
LastPass encrypts your passwords using your master password as (at least part of) the key. This means that they do decryption of passwords client-side as well. The entire password file is not stored locally but they had an intrusion of some sort a number of months back which demonstrated that they have a pretty good system set up along with quite a bit of monitoring. Truecrypt in dropbox is obviously a good choice if you're super paranoid but after seeing LastPass respond to security really well and it having an overall pretty simple UX, I don't have any reason to not recommend it.
I use KeePass right now synced with Dropbox - what keeps me up at night is the fact that if the bad guys got my password file today, there could turn out to be a vulnerability in it discovered years from now that could allow them to get my password.
You're free to hit "delete" on linkedin, but there's a very high likelihood that it will only mean "hide my profile". Anyone who got your user/pass would probably be able to reinstantiate your account and do anything to it they wanted.
I took the step of markedly decreasing the information on my current legit profile. It includes my name and general title, but no job history. Public disclosures of connections, etc., are highly limited.
Having a fictional LinkedIn account can be amusing.
I'm worried in a few years LastPass could become a target, and now instead of someone having a password that 'could' be shared among your multiple accounts, you have now given the complete keys to the city by listing all of your logons great and small in a central repository.
This central repository then becomes a very appealing target.
I say this as a LastPass user, as I think it is the best of the current offerings, but I'm uncertain how to shield this huge central list. I wish it had multiple logon PW so that you could at least segment the risk and reduce the time the high PW is used to when you really need it.
It saddens me that every, single, time this topic comes up, HackerNews, of all places, displays an immense lack of knowledge of current password storage applications, how they work and what value they bring.
I think it's really humorous that people feel safe putting an encrypted file in something like Dropbox, but don't trust LastPass (who are doing the exact same thing, everything is local, client side encryption). Especially when you're missing out on all of the benefits of browser integration.
Please, take a whole 3 minutes and do a tiny bit of research. Your future self will thank you when people like swombat and myself get to laugh at LinkedIn, change our passwords and never think about it again.
I think the difference you're missing is that LastPass offers the OnlineVault option.
I much prefer the security of being in control of my file, and having its online option controlled by someone else (Dropbox); and logging into Dropbox to then see my passwords 'online' on the go.
If Lastpass.com is compromised, the attacker can MitM compromise my credentials.
If 1Password.com is compromised, that is not the case. (Yes, if Dropbox is compromised, they could capture my dropbox credentials, but it would be more difficult for them to then capture my 1password credentials)
>I much prefer the security of being in control of my file, and having its online option controlled by someone else (Dropbox); and logging into Dropbox to then see my passwords 'online' on the go.
You can't even do that. You have to install a local client. Download the file, open it in your new client, edit it, manually reupload it. If you don't want to use the on-web LastPass vault, then don't, but it's still doing local decryption and you can still use the signed Chrome extensions to carry out ops if you don't trust LastPass.com proper.
>If Lastpass.com is compromised, the attacker can MitM compromise my credentials.
Which part of "local, client-side encryption" is confusing?
edit: 1PassAnywhere is the exact same thing as what LastPass is doing with its LastPass.com-served Vault.
edit2: There's even multifactor auth available for it and the Online Vault feature.
I apologise for my immense lack of knowledge of current password storage applications (i'm not a programmer and come here for the other stuff), but what is the benefit of these services (lastpass etc)? This is a genuine question.
It seems to me that instead of having several passwords in my head (i can remember random long strings of characters pretty well, and have a hierarchy of randomness/longness depending on what I care about), I only have to remember one. But if that one's compromised, aren't all the rest then available?
Reminds me of the bit in hitchhikers guide to the galaxy (life the universe and everything i think) where passwords and biometrics etc had become really difficult and secure, so a datacube thing was created to store them all. Which was then found by a character before hilarity ensued.
1. Your physical machine, or the LastPass/Dropbox server.
2. Your master password
3. (optionally) a second-factor auth source
Then yes, they have access to all your passwords. But this is vastly superior to having one password that, if compromised on its own, grants access to all of your accounts, right?
I mean, the most secure way imaginable would be perfect biometric signatures, or humans smart enough that they could perform asymmetric encryption in their heads to sign challenges in a verifiable manner. Outside of that, this is decentish.
You could use a text file in a Truecrypt volume with keys that are stored on separate jumpdrives (but what if someone compromises a machine that you plug those drives into), etc, etc.
To expand on that, to store passwords don't just use salt+sha1, or try to do your own nested sha1, just use bcrypt: http://en.wikipedia.org/wiki/Bcrypt
I just had to make this choice a few days ago and bcrypt seemed like the best option with working PHP implementations. And I sure as hell am not going to try to roll my own.
Please stop stirring up drama about this issue. While you are technically incorrect (PBKDF2-SHA1 is faster than and thus inferior to bcrypt), it's irrelevant: all three of [scrypt, bcrypt, PBKDF2] are just fine, and you can safely pick one at random.
If a database of bcrypted passwords from LNKD had been leaked, we'd be having a totally different conversation right now. (Same, of course, with scrypt etc.)
Am not a cryptographer by any means, so please correct me if I'm wrong:
If you use any reasonable cost for bcrypt, you're talking hundreds of milliseconds per attempt on a modern CPU. For each 6-character password (since you can't generate a rainbow table) at 100ms per pop, you're talking about something on the order of 2+ years per password divided by the number of CPUs. With something like 900 CPUs running continuously, you could expect to recover one 6-char password every day if the passwords were randomly distributed in the 6-char alphanumeric space. So, pretty feasible, assuming a 100ms cost. Short passwords do hurt you; I agree.
Now for 8-char alphanumeric passwords, you'd have to run ~1 million CPUs continuously to expect to recover one per day at a 100ms-per-pop cost. This is more of a stretch, assuming you're trying to do this with, e.g., botnets. It seems that someone asking for help cracking a password list on a forum would probably not be able to assemble this much computing power.
Or 1 billion CPUs continuously to expect to recover one 10-char alphanumeric password per day.
Of course, the assumption of random alphanumerics is wrong, both because many people will use common passwords and because others will use non-alphanumeric character substitution.
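A rough sketch of the arithmetic behind the estimates above. The 36-symbol alphabet (case-insensitive letters plus digits), the fixed 100ms per guess, and the assumption that a hit comes on average halfway through the keyspace are all assumptions made here for illustration; the comment does not say which alphabet it used.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def cpus_for_one_crack_per_day(length, alphabet_size=36, seconds_per_guess=0.1):
    """CPUs needed to recover, on average, one random password per day."""
    keyspace = alphabet_size ** length
    expected_guesses = keyspace / 2  # assumed: the hit comes halfway through
    cpu_seconds = expected_guesses * seconds_per_guess
    return cpu_seconds / SECONDS_PER_DAY

for length in (6, 8, 10):
    print(length, round(cpus_for_one_crack_per_day(length)))
```

Under those assumptions this comes out to roughly 1.3 thousand, 1.6 million and 2.1 billion CPUs, the same order of magnitude as the figures quoted. A 62-symbol alphabet would push every number up considerably.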
At any rate, it seems to me that leaking non-salted SHA1 hashes is virtually the worst case disaster scenario, short of plaintext passwords.
But suppose tomorrow it takes 10ms. Also, tomorrow, available spaces will increase, so the likelihood of a space vs time tradeoff (even partial) increases
WEP was considered "good enough" at first (even though it had obvious problems at first like key size), WPA was considered unbreakable at first, today it's feasible with cloud computing or GPUs.
And then we'll be complaining on HN that they didn't use xyzcrypt or something instead of bcrypt.
" it seems to me that leaking non-salted SHA1 hashes is virtually the worst case disaster scenario"
The time bcrypt takes is configurable, so in the future you can adjust the amount of work per password -- this is literally a one-character change in your code -- and be alright again. Ditto for the rest of the decent password hashing schemes.
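A minimal sketch of what an adjustable work factor looks like in practice. It uses PBKDF2 from Python's standard library as a stand-in for bcrypt (a swapped-in technique, chosen only so the example has no third-party dependency); the record format and the `CURRENT_ITERATIONS` value are hypothetical.

```python
import hashlib
import hmac
import os

CURRENT_ITERATIONS = 100_000  # hypothetical target; raise it as hardware gets faster

def hash_password(password, iterations=CURRENT_ITERATIONS):
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    # Store the cost next to the salt and digest so old records stay verifiable.
    return f"{iterations}${salt.hex()}${digest.hex()}"

def verify_and_upgrade(password, stored):
    """Return (ok, new_record); new_record is set when the stored cost is outdated."""
    iterations_s, salt_hex, digest_hex = stored.split("$")
    iterations = int(iterations_s)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), iterations
    )
    if not hmac.compare_digest(digest.hex(), digest_hex):
        return False, None
    if iterations < CURRENT_ITERATIONS:
        return True, hash_password(password)  # re-hash at the current cost
    return True, None
```

Because the cost is stored with each record, raising the constant only affects new hashes; existing users are upgraded the next time they log in successfully.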
I think you are propagating the myth that a scheme can be secure forever.
It's ok if WPA is breakable with cloud computing, because the whole point was to secure it for the next X years so that it takes more than Y dollars to break it. You only need to protect million dollar data enough that it costs 10 million dollars to get it.
If the data is valuable enough and protected heavily enough with crypto, the cheapest way to get it is through a meatspace attack (break-in, abduction, etc).
> WEP was considered "good enough"
Not by security professionals once they saw the effective size of the key. It's the downgrading of what looked like a 64bit key into a 48bit key that was the biggest problem.
The math doesn't sound right. Google allows any ASCII character for their passwords, which is 95 chars. I calculate 2330 years to crack each password. Did I get something wrong?
(95^6 * 0.1 sec per hash) / (60 sec * 60 min * 24 hrs * 365 days)
The key difference is bcrypt does ~10 hash/sec. A GPU-enabled password cracking machine can do over 500 million hashes per second. That generates a rainbow table in ~30 minutes.
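The two rates quoted above can be compared directly. This sketch only redoes the arithmetic for a six-character password over 95 printable ASCII characters; the rates are the ones stated in the comments, not measurements.

```python
KEYSPACE = 95 ** 6  # six printable-ASCII characters
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

def exhaustive_search_seconds(keyspace, hashes_per_second):
    """Time to try every candidate once at a fixed hash rate."""
    return keyspace / hashes_per_second

bcrypt_years = exhaustive_search_seconds(KEYSPACE, 10) / SECONDS_PER_YEAR
gpu_minutes = exhaustive_search_seconds(KEYSPACE, 500_000_000) / 60

print(f"bcrypt at 10 hashes/s: {bcrypt_years:.0f} years")
print(f"SHA1 at 500M hashes/s: {gpu_minutes:.1f} minutes")
```

That gives roughly 2,330 years for the slow hash against about 24.5 minutes for the fast one, over the same keyspace.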
These hashes were posted on a forum as a plea for help: the guy did not have enough computational power to crack them all on his own. Had they been salted bcrypt hashes, it might have actually discouraged him to the point of not even trying.
So yeah, the weakest passwords will always fall, but good solutions will go to great length to protect even the most clueless of users.
I wonder, why do people saying "just use bcrypt" never, ever bother to elaborate on what benefits it has, and which of them are relevant to the subject of the conversation? Believing in some function without understanding implications of its use does very little for real security.
Bcrypt does not require your understanding. The most important thing is that you use a strong password hashing method -- of which bcrypt is the best-known, and an excellent choice. For a basic level of understanding, here's a slightly exasperated blog post that a lot of people link to:
It's not an in-depth answer. It does not say, for example, why bcrypt is more secure than nested SHA1. (I believe it has to do with the possibility to efficiently implement SHA algorithms in GPUs.)
People are using unsalted SHA1, because someone told them in the past "just use sha1". Now someone else tells them "just use BCrypt". Without understanding why, it's nearly impossible to decide which security policy is sensible. There are many different types of advice competing for attention, and not all of them are good.
Somebody once said fire was composed of phlogistons. Later, different people said that fire was instead a process of decomposing fuel molecules and a release of visible light due to the energy of the chemical chain reactions taking place inside the flame.
The guy who said "phlogistons" was wrong. So was "just use SHA1" guy.
I wonder why people who make this complaint never ever bother to google: "why use bcrypt". It's like they somehow forget they have the best magical oracle to answer questions at their fingertips, which can answer the question better than most people who understand bcrypt could.
stef25, this is known as key stretching, as others have already explained elsewhere in this thread. Essentially the idea is to make computing the final hash of the password slower by iterating the hash function many times.
This additional slowdown is unlikely to be noticed by a user during an interactive login (hashing the password may take 1ms instead of 1us -- an imperceptible difference to a human) but it dramatically reduces the speed at which an attacker can compute hashes to try and recover the password for a leaked hash. It also increases the amount of storage space required for (a naive implementation of) a rainbow table since the attacker would need to store the output for 1, 2, ..., n iterations of the hash function.
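The idea can be sketched in a few lines. This is an illustration of iterating a hash, not a recommended construction: the function name and the way the salt is combined are made up for the example, and a real system would use PBKDF2, bcrypt or scrypt rather than a hand-rolled loop.

```python
import hashlib

def stretched_sha1(password, salt, iterations):
    """Feed the digest back into SHA-1 `iterations` times (illustration only)."""
    digest = salt + password.encode()
    for _ in range(iterations):
        digest = hashlib.sha1(digest).digest()
    return digest.hex()
```

With `iterations=1` and an empty salt this is plain SHA-1; each extra iteration multiplies the attacker's cost per guess by the same factor it adds to a legitimate login.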
I'm not familiar with iterations, anybody care to clue me in? I would have thought salted sha-1 would be decent for password hashing, though not the most solid possible, but at least not laughable. Is that not the case?
It is not. Sha1 is designed to be fast. You want your password hash function to be slow, so that an attacker has to spend as much resources as possible to brute force it.
Of course, it does not mean you should take a slow implementation of a fast hash. You need a hash that, when implemented to be as fast as possible, still is pretty slow.
I've just downloaded the database linked and it only contains the hashed passwords, not the account usernames / e-mail addresses.
I wonder if someone has the account details to match up otherwise you've no idea which password belongs to who, and you'd hope that LinkedIn would have lockout functionality.
Keep in mind that whoever leaked the hashes is probably keeping the usernames / emails for themselves. The forum in question doesn't allow posting of user-identifiable information according to the forum guidelines.
The leaked hashes seem to be SHA-1. I've also confirmed that the hash of my own (semi-complex) LinkedIn password is in the list.
Coincidentally this is the same password as I had for HN and that I've now changed (phew! THAT'd been bad! :-)
It would still take a moderate amount of time for a single password if it's long and complex -- you're essentially generating the rainbow table. You might as well just download a sha1 rainbow table and just perform an O(1) lookup. You could reverse all the 6.5M password hashes in mere seconds.
Actually, for a large enough list of unsalted password hashes, bruteforcing is faster than rainbow tables:
- a rainbow table may require a constant amount of time to reverse 1 hash, but it has to be repeated N times for N passwords.
- when bruteforcing, a password candidate can be checked against N hashes in a constant amount of time (look up the candidate hash in a hash table)
For example if it takes 10 minutes to look up a hash in a very large rainbow table (such as the A5/1 GSM tables published a few years ago), it would take 123 years to attempt to reverse these 6.5M hashes. On the other hand, millions of the leaked SHA1 hashes can be cracked in mere hours on a GPU with oclhashcat which tests billions of candidate hashes per second.
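A small sketch of the second point: each candidate is hashed once and checked against the whole list with a single set lookup, so the cost per candidate does not grow with the number of leaked hashes. The alphabet and length limit are toy values chosen so the example runs instantly.

```python
import hashlib
import itertools
import string

def crack(leaked_hashes, alphabet=string.ascii_lowercase, max_length=4):
    """Hash every candidate once; each set lookup is O(1) regardless of list size."""
    remaining = set(leaked_hashes)
    recovered = {}
    for length in range(1, max_length + 1):
        for chars in itertools.product(alphabet, repeat=length):
            candidate = "".join(chars)
            digest = hashlib.sha1(candidate.encode()).hexdigest()
            if digest in remaining:
                recovered[digest] = candidate
                remaining.discard(digest)
                if not remaining:
                    return recovered
    return recovered
```

A real cracker would use wordlists and rules on a GPU rather than a plain enumeration, but the structure (one hash, one lookup against all targets) is the same.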
true, for extremely large rainbow tables. SHA1 tables are around 20-60GB depending on how large your base character set is. If you shoved all this data into a giant database, query speed is still under a few milliseconds. In general, rainbow tables can be sharded fairly easily, so if your data set is a few hundred terabytes, just split it across a few machines and you'll retain the millisecond query times. Storing and querying easily partitioned data will usually be faster than a brute force calculation.
Calculating it is like saying you want to find the fibonacci number for any given N, and you have a really fast processor to calculate it to that N, but if you just persisted pre-calculated values up to C, you'd only need to calculate N-C hashes. So even if you are bruteforcing the password, it is still faster to have rainbow tables up to a certain length.
What I say is true for any size of rainbow table. It seems you forget that RT lookups require CPU resources in addition to mere I/O resources. There is always a number of hashes beyond which brute forcing them is faster than RTs. Sometimes this number is very high (billions of hashes), sometimes it is lower (thousands of hashes). It depends on many factors: RT chain length, speed of the H() and R() functions, speed of the brute forcing implementation, etc.
To take your example of a small SHA1 rainbow table of 20GB, assuming it has a chain length of 40k, looking up a hash in it will require on average 200M calls to the SHA1 compression function (assuming a successful lookup). A modern CPU core can do about 5M calls per second. Therefore looking up one hash will take at least 40 sec, and looking up these 6.5M LinkedIn hashes would take 8.2 years! (This is just counting CPU time, I assume the RT is loaded in RAM for a negligible I/O access time to its data.) A RT of this size would cover a password space of about 2^44. For comparison a decent GPU can brute force this many hashes concurrently at a speed of roughly 500M per second (see oclhashcat perf numbers on an HD 7970). Covering the same password space would take only 9.8 hours. Compare 8.2 years vs. 9.8 hours: obviously the LinkedIn hashes that have been cracked so far have been brute forced, not looked up in RTs!
And even if you leveraged GPUs to perform RT lookups, they would speed up the computations by roughly a factor 100x, reducing the 8.2 years down to 30 days, still unable to match the short 9.8-hour brute forcing session. (My friend Bitweasil is doing research on GPU-accelerated rainbow tables, see cryptohaze.com)
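The figures in the two paragraphs above can be re-derived as follows. Every constant is taken from the comment itself (they are the commenter's estimates, not measurements), and the 100x GPU factor is the one assumed there.

```python
HASHES = 6_500_000
SHA1_CALLS_PER_LOOKUP = 200_000_000  # quoted for a chain length of 40k
CPU_CALLS_PER_SECOND = 5_000_000
GPU_HASHES_PER_SECOND = 500_000_000
KEYSPACE = 2 ** 44
SECONDS_PER_YEAR = 3600 * 24 * 365

seconds_per_lookup = SHA1_CALLS_PER_LOOKUP / CPU_CALLS_PER_SECOND
rt_years = HASHES * seconds_per_lookup / SECONDS_PER_YEAR
rt_gpu_days = rt_years * 365 / 100  # assumed 100x speedup from GPUs
brute_hours = KEYSPACE / GPU_HASHES_PER_SECOND / 3600

print(f"rainbow table on CPU: {rt_years:.1f} years")
print(f"rainbow table on GPU: {rt_gpu_days:.0f} days")
print(f"brute force on GPU:   {brute_hours:.1f} hours")
```

This reproduces the 40 seconds per lookup, about 8.2 years, about 30 days and about 9.8 hours stated above.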
As a more general question: why is it not an industry standard to salt with the username/email in addition to the random key? (i.e. Sha1($salt + $email + $password)). Even if the random salt were excluded, I would think that this is much more secure. Existing rainbow tables would not be anywhere near as helpful, and attempts to generate a rainbow table for a specific salted database would be ineffective because the salt changes on a per-user basis.
The solution is to use a better method of storing passwords. Hashes like SHA1 are designed to be really fast (great for hashing data but also great if you want to brute force).
Then the password has to be updated whenever your email changes. I believe Amazon does it like that, literally "forking" whenever you change password; at one point it was possible to simply log on with the old password and live an "alternate reality" where all changes you'd done after changing pwd had not been applied. Don't know if it's still the case today.
Why would you use the email? Mostly when passwords/usernames are stolen the email is there too. For my site I have an unique 128-bit token for every user. I also have a 128-bit site_key (which is in the application, not db) and mix those with the password and then hash.
The rar with ~100k cracked passwords in it. If you tried to find your own, perhaps you're one of the ~144 million accounts that weren't published?
Edit: I'm not sure I understand what you mean - there were 100k passwords in one file, already cracked, and another with all 6.5M hashes. I found my hash in the hashes file.
Which should be done, but which doesn't help those users where it matters most; the real value of this database is that some people (~everyone) reuse passwords across sites.
You can perform this check even if they were salted.
Otherwise how could linkedin check if you correctly entered your password?
The salt is stored in cleartext as part of the hashed password, so that you can repeat the hashing of the secret and match the two hashes.
The salt improves the security because:
1. even if two users use the same password, you cannot tell that by simply comparing the hashes
2. makes brute force checks much slower because you have to recompute the hash for every hashed password entry rather than once for every dictionary entry
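A minimal sketch of the scheme described above, with the salt kept in cleartext next to the digest. The `salt$digest` record format is an assumption for the example, and SHA-1 is used only to match the thread; a slow password hash would be the better choice in practice.

```python
import hashlib
import hmac
import os

def hash_password(password):
    salt = os.urandom(16)  # fresh random salt per user
    digest = hashlib.sha1(salt + password.encode()).hexdigest()
    return f"{salt.hex()}${digest}"  # the salt is stored in cleartext

def check_password(password, stored):
    salt_hex, digest = stored.split("$")
    # Repeat the hashing with the stored salt and compare the results.
    candidate = hashlib.sha1(bytes.fromhex(salt_hex) + password.encode()).hexdigest()
    return hmac.compare_digest(candidate, digest)
```

Two users with the same password end up with different records, and an attacker has to redo the hash for every (candidate, salt) pair instead of once per candidate.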
To get a sense of it, I downloaded it from a link here. Below is the structure of the first few lines. Caveat: it's garbage/useless data below -- I intentionally changed around the actual numbers to give a sense of the structure, only:
The pattern 000000a9 is just in presentation - I counted the occurrences of different bytes in that position (also misled by the apparent pattern, where many lines in a row would have the same 4th byte), and each possible value is present more or less equally often.
It seems like it's just sha1.
EDIT: however, 3.5 million hashes start with 5 zeroes, which is way too many for just coincidence. Possibly they used multiple hash functions?
LinkedIn allows you to sign in using any of your verified email addresses, so it seems likely that the usernames are at least stored in a different table.
MD5 isn't the issue - it's the lack of salting. Without a salt, almost any hash can be cracked with a rainbow table. With a salt, you'd need to know the salt for each hash, and then generate a new rainbow table, in order to recover the original password.
This isn't really the issue. The real issue is that MD5 hashes (and these hashes are SHA1, which has the same problem) are too easily computed; they are practically brute-forceable. I don't need a rainbow table to compute hashes when I can slam out millions in short order using a GPU. You have a good point about needing to know the salt, but getting the salt is generally easy because it's usually stored in the same place as the hashes (and this practice is fine, because hiding the salts doesn't improve security significantly on its own).
The difference is that if it's salted you need to work to get a specific password. Without salting you can test a generated hash (rainbow table) against all 6.5 million hashes at the same time.
Not defending the choice - bcrypt is obviously a much better way to go.
The thing is, though, that it's trivial to slam through that set of salted passwords. It's like unsecured Wi-Fi versus WEP: "door unlocked" versus "'No Trespassing' sign."
What prevents developers from adding a large DB-wide salt (in addition to normal salt) to every password? Wouldn't that prevent bruteforce attacks regardless of the hashing algorithm?
Random nonces have very little to do with what makes SHA1 insecure and bcrypt secure. Developers have a very weird and totally misplaced faith in the ability of random "salts" to secure passwords.
We're speaking about a very specific attack here: bruteforce. And I'm speaking about a very specific type of "salt" (which could probably be called something else, since it's not the same as normal unique-per-password salt): large, database-wide string of random bytes.
If every password is padded with such a string before hashing, computing the hash would be slower. Obviously, it would be slower because you would have to process more data. An interesting question is whether this would also make it less parallelizable by the virtue of having more information than would fit into GPU cache.
None of this makes much sense to me, sorry. Brute-force password cracking has worked on salted passwords since Alec Muffett released Crack in the early '90s. The amount of extra computational power required to hash a password and a salt is negligible.
The only thing "salts" do is prevent rainbow table precomputation, but it's just a quirk of the late '90s and early '00s that "rainbow tables" ever became a mainstream attack method: one bad Microsoft password hash and a series of bad web applications. Long before the MD4 LANMAN hash was ever released, people were breaking salted Unix passwords with off-the-shelf tools, on much, much slower computers than we have now.
Computing a hash on 1MB of data is slower than computing a hash of 6-8 bytes of data. Brute-force attacks are based on trying different passwords and seeing that after being salted they generate the same hash as in the database. Therefore, adding a large string to the password before hashing would force the attacker to hash that string. The question is, can this be pre-computed once or efficiently parallelized?
You're advocating creating a 1MB "salt" string to slow down hashes? That's the same as simply iterating your hash function enough times to invoke the block function repeatedly.
Just use bcrypt, scrypt, or PBKDF2. People have already figured this problem out.
First, I do not advocate anything here. I asked a question.
Second, working with a large string of bits is the same as recursive hashing only if you can pre-compute some small intermediate state of the hash function for that string independently from the password you're trying to guess. If you can't, you would have to work with the entire string for every new password tried.
1MB of data will have 16384 SHA-256 blocks. So that's roughly the slowdown I would expect, minus the time it takes to initialize the algorithm for a particular message.
That's not that interesting by itself, but it is interesting to think about how this would affect computing the hashes on GPUs.
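The precomputation question can be tried out with Python's `hashlib`, whose hash objects can be copied mid-stream. This sketch assumes the large string is placed before the password; `BIG_SALT`, `slow_hash` and `fast_hash` are names invented for the example.

```python
import hashlib
import os

BIG_SALT = os.urandom(1024 * 1024)  # hypothetical 1 MiB database-wide string
print(len(BIG_SALT) // 64)          # number of 64-byte SHA-256 blocks in it

def slow_hash(password):
    """Hash the whole string plus the password from scratch."""
    return hashlib.sha256(BIG_SALT + password.encode()).hexdigest()

# The state after absorbing the fixed prefix is computed once...
PREFIX_STATE = hashlib.sha256(BIG_SALT)

def fast_hash(password):
    """...and reused for each guess, so only the password is processed."""
    h = PREFIX_STATE.copy()
    h.update(password.encode())
    return h.hexdigest()
```

Both functions return the same digest, so with this layout an attacker pays for the megabyte once rather than per guess. If the password comes before the string, this particular shortcut no longer applies.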
128 bytes is not "large". I was thinking more along the lines of megabyte+. There is no question that it will slow down hash computations, because you would need to process more data. The question is, can you efficiently parallelize this on commodity hardware (GPUs)?
To be clear, MD5 (or SHA1 as these apparently are) is a problem. Passwords should be stored using a cryptographic hash function that is designed to hash passwords (read: be slow), not a generic cryptographic hash function (which are designed to be fast). This is exactly the problem that bcrypt was created to solve (among others).
I think people are missing the point that SHA2 is light years ahead of MD5. MD5 has had known security flaws for years.
>Do not use the MD5 algorithm
Software developers, Certification Authorities, website owners, and users should avoid using the MD5 algorithm in any capacity.
The security differences between SHA2 and MD5 are irrelevant to the matter at hand. If they were MD5 hashes they'd be broken approximately as quickly and in exactly the same way.
Still, it doesn't matter. As long as one can generate a rainbow table for the hash function, then password lookups will be an O(1) operation. The rainbow table for md5 is moderately small, sha1 is bigger, and I'm sure sha2 is even bigger than the sha1 table.
Good Guy Startup Founder would cross reference this password list with their own password system and force those that match to reauthenticate and change their passwords.
This wouldn't be difficult to do and your users would appreciate it.
It's possible to test this when your user re-authenticates, assuming you're not using a challenge-response authentication mechanism (as sadly most sites do not).
That's easy to do if you have the email addresses, but impossible to do if you only have the SHA-1 hash, as in this case (unless you're also using unsalted SHA-1 hashes, which is a much bigger issue by itself).
You'd do it at login time. User enters user/pwd -> hash with unsalted sha-1, check if in list -> if yes, alert to change / if no, proceed with normal hashing.
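A sketch of that login-time check. It also tests the variant with the first five hex characters zeroed, since earlier in the thread it was observed that already-cracked entries in the file appear that way. The function names are hypothetical, and loading the list from an iterable of lines is an assumption about how the file would be read.

```python
import hashlib

def load_leaked_hashes(lines):
    """Build a set of hex digests from an iterable of text lines."""
    return {line.strip().lower() for line in lines if line.strip()}

def is_in_leak(password, leaked):
    digest = hashlib.sha1(password.encode()).hexdigest()
    masked = "00000" + digest[5:]  # form reported for already-cracked entries
    return digest in leaked or masked in leaked
```

The check runs on the plaintext the user just submitted, so it works no matter how the site stores its own password hashes. A match on the masked form is slightly weaker evidence, since five hex characters of the digest are not compared.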
Easy, just convert all the hashes into passwords using a rainbow table. Should only take a few seconds to convert all 6.5M passwords -- O(n) operation here. Then run all the passwords through each user's password algorithm, this is an O(n^2) operation. Essentially you're making 6.5M password attempts for each of your users. It could be slightly faster because I'm sure there are quite a few duplicates in 6.5M passwords.
What's wrong? They exist... they're bigger than md5 tables, but not significantly larger. If you don't have 50GB of free disk space, you could get a table with lower complexity for around 20GB or so.
A cross-reference is only feasible in very bad situations:
- no-salt or same-salt and same hashing
- trivial/common passwords (password1 etc)
- password(hashed/unhashed) and email are paired.
A cross-reference could be accomplished for all known cracked linkedin passwords, but this would be no different than you running a dictionary attack of known passwords against your own users... This seems very bad. Enforcing strong but sane password strength rules should mitigate this need.
Cross reference only has value if both the hash and email pairs are leaked.
The bitcoin leak fell into one of these very bad situations:
- [<email>, <hash>] were leaked together
- poor hashing (just sha1, no salt if memory serves)
- unfortunate number of people reuse passwords
The released passwords are hashed with SHA1. Assuming you use the same algorithm and linkedin does not use a salt (they probably do), then you could just compare the hashes.
LinkedIn passwords are not salted. You can only make comparisons if your database contains unsalted passwords. And if both databases used salted-passwords, then you still can't compare unless you all shared the same salting key.
You can't compare the hashes unless you have access to the clear passwords of your users. Unless you mean to do the comparison just as they log in. Seems like a lot of hassle for not much though.
you'd compare the hashes in your database with those from the file. The users with a hash contained in the file would be notified.
Because the passwords aren't salted (stupid), you might get multiple hits for the same hash (for example, for the good old "1234" password), meaning you might end up contacting more users than actually affected. Better safe than sorry.
i agree, but think about the backlash this would create amongst the userbase. the majority of the users will probably never even realize / read that their passwords have been stolen and thus linkedin probably does best in keeping a low profile about this (and start from now on using a better encryption). this is obviously not in the interest of the users, but it is in the interest of linkedin.
Or, they could take the Zappos route and just force everybody to reset their passwords. This route would make adopting a different (e.g. salted) password system quite straightforward.
I've found '1234678', 'password', 'qwerty', 'linkedin' and few other common phrases (already 00000'd, obviously), so it doesn't look like a list of just the hard ones.
Interesting, I tried this with a bunch of different passwords (though using php's sha1 function, which obviously gives the same output as ruby's), and found no matches. You're using the "combo_not.txt" file from the zip file in the ggp, right?
The dump is not complete -- my password is also missing. As other people said, that file contains about 6.5 million hashes, while LinkedIn has 30 times more users.
Considering how usernames weren't leaked, there's a big chance that the intruder is just sitting on them and the other passwords.
My password is missing too (if i've done right the hash generation as illustrated above). It's strange that only hashes starting with "000000a9" are present, someone said here that it's just presentation but my hashed password is 40char long as those leaked including the 000000a9
Either you don't have a complete file or you haven't scrolled through it. Only the first 277 hashes start with that string (and some others scattered throughout).
I was talking about hashes starting with 0000 (I just looked at the beginning and the end of the file). jgrahamc's post is useful: if I ignore this 0000 prefix (that could be a sign of "ok, we've decrypted it"), I can find my hash (the password was not very difficult)...
I wonder, what if this list wasn't leaked from LinkedIn databases, but rather from some third-party service using the "enter your password" anti-pattern? A flaky service like that would likely not be very good at safely storing passwords.
Unfortunately, LinkedIn keeping mum on the subject makes it easy to speculate that it was actually coming from them. Otherwise it'd be easy to deny (and even spin: "How dare you! We never store unsalted hashes, we follow state-of-the-art practices here!!"). Also, their security track record is... embarrassing as it is.
I wonder how many LinkedIn users use the same passwords for all their accounts. The article talks about identity theft and "confidential contacts" but I think the real danger is that people tend to use the same password everywhere. It's their other accounts that might have real value.
EDIT - As I think about it, e-mail accounts would be especially valuable as most of your other sites could be compromised using the "recover my password via e-mail" feature if the hacker could read the resulting mail.
Me. Admittedly, it's stupid as hell, but has generally been too much of a pain to do anything else (for things outside of banking, email). I've started to get serious about KeePass lately, but I bet a significant percentage of users take the lazy approach.
Having to type in my Apple password on iOS once every few hours inevitably means I have to use something memorizable and quick to type. There are certain trade-offs with different passwords.
I've developed a system (kept only in my head) where every password I use is based off on the name of the service. This means that with just one of my passwords, you're most likely not getting anywhere. With two, you have a bigger chance of figuring out the differences and thus the system, but it works fine for me at the moment.
Don't underestimate me. It contains many numbers extracted from the letters according to various rules (order in alphabet, backwards, etc), along with special characters.
I take things a step further -- I have no idea what my password is on sites like HN or reddit. If the cookie is ever gone, my account is gone.
I don't like the idea of identity permanence.
Instead of shitty passwords though, why not use something like 1Password to store the logins? I use that (or an old fashioned piece of paper in a secure location) for meaningful security tokens.
Ha. I'm in the same boat. This is my second account after the first one got ghost banned (for a single comment and the followups attempting to explain).
I generally use the same password for what I feel are non-critical sites like LinkedIn, twitter and Facebook. Another password for testing new services/apps etc. As a rule any site that may contain my credit card data or sensitive information I use a separate password. I feel this is the best compromise to having complex passwords for each account.
I used this in the past as well. But then I started thinking about what non-critical is. As an "internet professional", even my Facebook account being compromised would have a negative impact on my image; on LinkedIn doubly so due to its professional character. So I basically decided that I'm not going to distinguish at all (slippery slope) and just have randomly generated passwords for all sites (not for my Mac though, too much hassle/attack vectors are different).
Safe >> Sorry
EDIT: Just checked, and my randomly generated password is in the leaked list of hashed passwords. I'm not using that same password anywhere else, so the source MUST be LinkedIn through whatever means (or it's some Mac/PC based attack vector, and these folks only leaked LinkedIn accounts, which sounds very implausible).
Whatever manager it was that tasked some junior programmer (particularly one that didn't know that unsalted SHA1 is a terrible idea) with implementing the password system at LinkedIn needs to be fired. Making the programming mistake means that you don't know much about web security, and while not a great thing, that's forgivable; putting someone that's utterly unqualified for code with security implications on such an important task is not. Nor is letting the code get deployed without having someone that knows what to look for review it. Nor is letting such a bad decision remain live for...what is it now, almost 10 years?
But let's not stop there. There are probably a dozen other people at the company whose job it is to avoid blunders like this, all the way up to the top technical staff. After all, LinkedIn is not, and has not been for some time now, some tiny underfunded startup. It's a goddamn public company, and even before that it was a super-team Silicon Valley darling that was getting money thrown at it since even before tech became cool to invest in again, and it's been valued at over a billion dollars for almost five years now. There is absolutely no excuse for this, they should have been doing regular security audits for years, and no audit worth its salt would miss something this simple. I absolutely refuse to believe that this problem was unknown, that nobody ever commented or filed a bug report about this code - no, this was deprioritized, because it wasn't considered a high enough value problem. And now it's bitten them in the ass and become a problem, probably because some other security vulnerability was similarly deprioritized instead of fixed.
I expect this from some shady Bitcoin market that a high school kid runs off of a server in his bedroom. I do not expect this type of amateurism from a 10 billion dollar company with hundreds of engineers, many of whom have specifically looked over that code, some of whom have probably complained about it, and all of whom should know better than to let it fester...
"I expect this from some shady Bitcoin market that a high school kid runs off of a server in his bedroom. I do not expect this type of amateurism from a 10 billion dollar company with hundreds of engineers.."
Think you might be expecting too much from large companies =/
Cracking the passwords from the hashes is not just fast, it's ridiculously fast. I can't believe a site like LinkedIn stores their passwords this way in 2012.
That's plain old john the ripper running on the cheapest 13" 2010 mbp. John is not even using the GPU, and non-trivial 8-character passwords are scrolling by in my terminal, too fast to read.
What puzzles me though is, how come 6.5 million?
LinkedIn has what, 150M users?
Did they not post the entire load (and are in fact sitting on _all_ the hashes?)
Is the dump an old backup or breach from when they had fewer accounts?
Is it just one DB partition / file that's been lost, an archive?
Given that these hashes are not salted, running a 'uniq' on the list of all users' password hashes would probably already cut it by half, if not more. Then you eliminate all the easy ones from wordlists, and post the remains on the internet for people with excess computing power to bruteforce.
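To illustrate why unsalted hashes collapse under deduplication, here is a small sketch. The sample passwords are made up for the example; the point is only that identical passwords give identical unsalted SHA-1 hashes, so a set removes them:

```python
import hashlib

def sha1_hex(password):
    """Unsalted SHA-1 of the password, as lowercase hex."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()

def unique_hashes(all_hashes):
    """Equivalent of `sort -u`: one sorted entry per distinct hash."""
    return sorted(set(all_hashes))

# Hypothetical per-user password column with heavy reuse.
passwords = ["password", "1234", "password", "qwerty", "1234", "password"]
hashes = [sha1_hex(p) for p in passwords]
print(len(hashes), "user hashes ->", len(unique_hashes(hashes)), "unique hashes")
```

With per-user salts the same exercise would leave all six entries distinct, which is why a deduplicated list is itself a hint that no salt was used.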
I assume in the first line you meant to pipe it through uniq after the sort? Otherwise the only thing you've demonstrated is that sorting a file doesn't change its line count. :)
I can confirm that my long-lived randomly generated single-use 12-character password hash is in the file, but not 00000-prefixed (apparently not broken).
A more recent 20 character single-use randomly generated password was not, but the file doesn't comprise the full 6.5 million hashes noted in stories.
I've since changed both for rather longer randomly generated single-use passwords.
For anyone trying: it's not a direct link, but a download page (JS required) which lets you d/l "combo_not.zip". Which has 6458020 lines of "00000"-prefixed hashes, apparently sorted.
My old password was in the password file, and it was flagged as cracked.
If you're a Windows user and you want to check if your password is in the file:
(1) download the passwords file from http://www.mediafire.com/?n307hutksjstow3
(2) the download is a RAR file, so you'll need to have WinRAR installed to extract it.
(3) to get the sha1 version of your password, go to duckduckgo.com and type:
sha1 yourpassword
(4) copy the result, except for the first 6 or so characters
(5) open a DOS command prompt (WindowsKey+R and type CMD)
(6) type (quotes required where indicated): find "sha1hash" sha1.txt
(note: to paste into the command prompt, right-click)
Example:
The sha1 hash of the password 'password' is: 5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8
Remove first six characters: e4c9b93f3f0682250b6cf8331b7ee68fd8
enter at command prompt: find "e4c9b93f3f0682250b6cf8331b7ee68fd8" sha1.txt
result:
---------- SHA1.TXT
000001e4c9b93f3f0682250b6cf8331b7ee68fd8
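For anyone who would rather not paste a password into a search engine, the same lookup can be sketched locally in Python. This is only an illustration: the function name is hypothetical, and it drops the first five hex characters (the masked prefix noted elsewhere in the thread) rather than six:

```python
import hashlib

def find_in_dump(password, lines):
    """Return dump entries whose last 35 hex characters match the password's SHA-1.

    `lines` is any iterable of text lines, e.g. an open file object.
    Matching on the suffix catches both intact and 00000-masked entries.
    """
    suffix = hashlib.sha1(password.encode("utf-8")).hexdigest()[5:]
    return [line.strip() for line in lines if line.strip().endswith(suffix)]

# Tiny in-memory stand-in for the real file.
sample = [
    "000001e4c9b93f3f0682250b6cf8331b7ee68fd8\n",
    "6475590bc1407aa98c8b022230292cce3d8528b3\n",
]
print(find_in_dump("password", sample))
```

Against the real file, the call would look something like `find_in_dump(pw, open("SHA1.txt"))`, assuming that is the name of the extracted file.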
I can confirm that my password was in there. I have changed it. My password was "98mnja6z" which hashes to 6475590bc1407aa98c8b022230292cce3d8528b3. I used this for no other sites, so I'm not concerned about it leaking.
It is inexcusable that LinkedIn hasn't alerted their users yet.
I'm starting to think it might be wise, if you intend to reuse your password on multiple sites, to salt it yourself. By using a form like "<site name><user name><reused password>", you protect yourself from rainbow tables without making your password harder to remember.
And yes, yes, I know you shouldn't be reusing your password across different sites, or using a dictionary word anyway. And teenagers also shouldn't be drinking, doing drugs and having sex. It doesn't help anything to pretend that people are going to behave optimally.
Of course, the preposterous restrictions that websites put on passwords, like maximum password length, will make this idea harder to put into practice.
I've been doing this myself and it has worked out pretty well so far. My password is in the list of passwords released, but is uncracked and I can rest assured knowing that I did not use the same password on any other website.
A couple things to keep in mind:
1) The salt you generate should be put at the front in case the website is silently truncating the password to a certain length
2) The salt can be something more complicated than site name. I mentally calculate a fixed length salt based on the site name
3) You may want to still keep two separate "base" passwords, one for high value sites (banks, email) and one for low value sites (everything else).
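As a rough illustration of the scheme in the points above, here is a sketch with a made-up salt rule (first three letters of the site name plus its length as two digits). The rule and the function names are purely hypothetical; the actual rule is meant to live in your head and is not described in this thread:

```python
def site_salt(site):
    """Hypothetical fixed-length salt: 3 letters + 2-digit length of the site name."""
    name = site.lower()
    return name[:3] + "{:02d}".format(len(name))

def site_password(site, base, username=""):
    """Salt goes at the front so silent truncation by the site cuts the base, not the salt."""
    return site_salt(site) + username + base

print(site_password("linkedin", "hunter2"))
print(site_password("twitter", "hunter2"))
```

Two sites then never share the same unsalted hash, even though the memorized base is the same.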
This makes me wonder. I've been relying on Django's built in user authentication lately. Does anyone know if that's pretty safe? Is it doing the right thing for hashing passwords?
I sincerely mean no offense but this statement came directly out of your butt. Read the table on page 14 of Colin Percival's Usenix paper "Stronger Key Derivation Via Sequential Memory-Hard Functions" (which you could have found by Googling [scrypt paper]); PBKDF2 is ~5x faster (ie: costs ~5x less to break) than bcrypt; PBKDF2 and scrypt aren't even in the same ballpark.
From exactly where did you derive the idea that PBKDF2 is "extremely good"?
The reality is that all three of PBKDF2, bcrypt, and scrypt are just fine. But PBKDF2 and scrypt have drastically poorer library support than bcrypt; nobody should delay using a strong password hash so that they can optimize which one they use.
All three are extremely good for this use case, when the competition is SHA-1. Beyond that, I don't know enough to compare the three. So yeah, it came out of my butt.
If Colin has a paper on it then I trust his comparison. What I really meant to say is what you said: all three are just fine.
Also, I thought I remembered my comment's parent saying something stronger, either it was edited later, or I was drunk when I decided it was worth commenting on.
Eh? PBKDF2 has configurable complexity and has found many more applications than bcrypt, from WPA2 to disk encryption. The crypto research behind PBKDF2 is much more rigorous.
Please cite one academic cryptography paper that presents an analysis of PBKDF2, other than Colin's paper which damns it.
There is virtually no "rigorous" research into KDFs of any sort, let alone password KDFs. Most academic crypto research simply presumes passwords are taken from cryptographically secure random number generators and stored securely.
And with that said I want to remind you that I just cited a source, accepted at Usenix, that measured PBKDF2, bcrypt, and scrypt and found PBKDF2 inferior to bcrypt. You seem to want to pretend otherwise.
Django has chosen a fine default and for the next several years it's probably unnecessary to second-guess it. Over time, GPU and (more importantly) FPGA-assisted hash cracking may or may not become more common, at which point you'd want to transition to something like scrypt.
You could literally flip a coin to decide between bcrypt and PBKDF2 and it wouldn't matter which side came up.
I'm not an authority on this, but django_bcrypt is generally considered a best-practice in the Django community. Scrypt may replace that in the future, once implementations are widely available and battle-tested.
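For readers wondering what "the right thing" roughly looks like, here is a minimal sketch of salted, iterated hashing with PBKDF2 from Python's standard library. It is not Django's implementation; the iteration count, salt size, and function names are assumptions chosen for illustration:

```python
import hashlib
import hmac
import os

def hash_password(password, iterations=100_000):
    """Return (salt, iterations, digest) using PBKDF2-HMAC-SHA256 with a random salt."""
    salt = os.urandom(16)  # per-user random salt (size is an assumption)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, digest

def verify_password(password, stored):
    """Recompute the digest with the stored salt and compare in constant time."""
    salt, iterations, expected = stored
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, expected)
```

The per-user salt means identical passwords no longer share a hash, and the iteration count makes each guess far more expensive than a single SHA-1 call.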
Is it unique enough that you can be sure it's your password, and not someone else's? I ask because the cracked passwords seem to be the simple/obvious ones that are likely to be used by multiple people. If it is strong/unique though, it would effectively confine the hack time to the last 2 days.
Obviously the list was filtered to eliminate duplicates. It contains only what the hackers wanted it to contain. So why does nobody mention that it is HIGHLY LIKELY that the user names associated with the passwords (which are actually mainly e-mail addresses for LinkedIn) are also in the possession of the hackers?
So, if I would be the hacker - strip usernames, strip duplicate hashes, post list of unique hashes to let others do the CPU intensive cracking, retrieve cracked passwords, match with usernames (e-mail address), check same password on other accounts (first on the e-mail account, then google the e-mail address on forums or try on the services that interests me and say "forgot password, send it again to this e-mail address - thank you telling me that this e-mail has indeed an account with you..."), monetize somehow the data.
As a user that implies - IMMEDIATELY change your password for the e-mail address used to login at LinkedIn (if it was the same password); verify if settings of this e-mail account have changed (like an additional unknown address added to allow retrieval of the password, DUH),
try to remember where you use the same address either as login or to recover credentials, try to remember where you used the same password, google your e-mail address to help you remember; change passwords; consider abandoning the e-mail address if it is not your primary one,...
Also - did the amount of SPAM that you receive on the e-mail address used to login at LinkedIn suddenly increase, while SPAM remained constant on a similar mail account not connected to LinkedIn? Maybe someone just sold your e-mail address, so the LinkedIn break may affect you even if the password is not in the list.
Bottom line is - LinkedIn's approach appears to be: We have no proof that this particular account was hacked since the password hash is not in the list - let's not overreact and let's assume it is not hacked even if we don't have a clue what was actually hacked. I'm not to judge if it is the best approach for the business, but sure as hell I don't like this approach as a user.
I didn't see it in the post, but does anyone know if these were current passwords (as of this post)? I use a unique password for linked-in, but some number of months ago I used a password I shared with another site. Wondering if I need to change that one too. Guess I might as well.
First rule of software design: users are lazy.
Second rule of software design: users are stupid.
"Use your own lock" is fine for us Übergeeks, but for the vast majority of the populace, they just want the provider to put a system in place so they don't have to worry about it.
I know a lot of companies just keep your account, including your password, in their database even after you have removed your account.
Can I be sure my account was totally removed when I removed my LinkedIn account? Because the "please change your password as soon as possible" won't help me much.
Can we please start using BrowserID or some other standard so we can secure that one provider and do away with all this? I'd like it if we could authenticate with Google using 2-factor authentication and be less worried about my password getting hacked.
By centralizing authentication, you make that central provider an even bigger target and you risk losing access to other services as you lose your main account (Google is known to sometimes terminate accounts with no recourse).
Finally, when that central provider gets hacked, all your dependent services are now also compromised.
And as we know from the CloudFlare story over the weekend, not even Google with their 2 factor authentication is devoid of issues.
No. Centralizing your login to one third party is as bad as the current practice of reusing your password for every service you have an account with. The only way that is reasonably safe is to use different random credentials for every service and store these credentials somewhere under your (and only your) control (i.e. a password manager or a piece of paper).
Browserid is not a centralized authentication protocol. Although currently all implementations I know of rely on browserid.org, this is not required by its design.
There's also the fully decentralized openid, you know. I'd 100% rather be able to use openid for sites like Linkedin and this one than rely on every site implementing sane password management.
There is no reason why we should centralize password management and put the world's authentication into one giant pinata for black hats to take a swing at.
A single point of failure sounds dangerous.
People should just avoid using the same password for different websites. (That's what KeePass is for..) Perhaps a clever extension / browser feature could ensure that. (e.g. "Warning: You are probably using the same password for facebook.com")
Wow. Not only is every single reply to StavrosK completely wrong about how BrowserID works, they're actually doubly wrong. Not only is it NOT centralized, it also can be used with:
- 2 factor auth
- asymmetric encryption (aka, a challenge/response ala PGP)
- whatever security mechanism you want, frankly. It's up to the browserid provider.
My rationale is that it's much easier to secure one provider (the attack surface is much smaller), and you can also run one yourself, making you responsible for all your authentication needs.
OpenID was great in that you could choose any provider you wanted, and nobody could attack them all (not that they'd have to). It just seems like a good solution to use someone whose only job is to provide secure authentication.
It seems we will never get rid of bad programming like this. I hit the 'forgot my password' link on the T-Mobile website yesterday and the pop-up requested my T-Mobile phone number. Ten seconds later I received an SMS with my actual password in it.
Guys, this all doesn't parse for me. My password on LinkedIn was 13 characters long, and included symbols (!@#$%^&&*()), numbers, and alphabet characters. A 13-character password like this would imply a search space of (26 + 26 + 10 + 20) ^ 13 = BIG. If a GPU can check 11 billion passwords per second, this implies that someone ran 2.4 x 10^7 GPUs for a month.
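The arithmetic can be redone in a few lines. This sketch takes the inputs stated above at face value (82 possible characters, length 13, 11 billion guesses per second per GPU) and assumes a 30-day month and an exhaustive search of the whole space:

```python
ALPHABET = 26 + 26 + 10 + 20       # lower + upper + digits + symbols
LENGTH = 13                        # password length
RATE = 11e9                        # guesses per second per GPU (figure from the comment)
SECONDS_PER_MONTH = 30 * 24 * 3600 # assumption: 30-day month

search_space = ALPHABET ** LENGTH
gpu_seconds = search_space / RATE
gpus_for_a_month = gpu_seconds / SECONDS_PER_MONTH

print("{:.2e} candidate passwords".format(search_space))
print("{:.1e} GPUs running for a month".format(gpus_for_a_month))
```

With these inputs the result lands in the hundreds of millions of GPU-months, somewhat higher than the figure quoted above (which may have used different inputs). Either way the conclusion is the same: a random 13-character password was not found by plain brute force.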
We're either looking at someone with a seriously ridiculous password cracking computer (i.e. ASIC-based -- not even FPGAs), a compromise for SHA-1 (very unlikely), or a keylogger/proxy/trojan/etc... I vote for keylogger.
If your password is in this database, I don't think it's because your password was brute-forced.
My old password is not on the list. However, it seems like somebody tried to log on to windows live with the e-mail address and password I was registered on linkedin with. This is one of my oldest passwords from when I still only had one or two passwords.
I noticed this as Windows Live kept sending another of my e-mail accounts a code needed to log in from an unrecognised computer.
Now it could all be a coincidence, but I wouldn't be surprised if there was a connection, as the e-mail address and the password were identical to the ones used on Linkedin. If that's the case there would be a more complete list with my password/hash as well as the associated e-mail address.
How on earth were they not salting? There are so many open source auth systems now that get all the basics right. Someone who works at a big company like this and has any insight, please comment. How is this even possible in these days?
There's still an unbelievable amount of ignorance out there about how to properly store hashed passwords. There are countless articles explaining that you need to hash the passwords, and telling you how to use md5("salt" + password), and then the blog comments are full of helpful people saying that you should use SHA256, or "no u also gots to add pepper", or exhorting the author to use a large unique salt from /dev/random (not /dev/urandom, it's not random enough) and then encrypt the salts in the database with 2048-bit RSA. I sometimes google around for these articles when I want some morbid fascination -- it's the intellectual equivalent of those YouTube videos where one car crashes into another, and then a third car crashes into the wreckage, and then another car tries to ramp over it and fails, and then everything explodes, and then the people staggering out of the destroyed cars start shouting bad advice about hash functions.
Putting people's personal details on the open web, giving anyone access, including malicious hackers... This design used by LinkedIn, as well as Facebook, was a bad idea from the beginning. Don't think they are not aware of the risks. How much spam and other annoyances do people get as a result? These companies are killing privacy just to make a quick buck. Maybe they'll be sued.
Direct link to SHA1 file on mediafire (117MB) to avoid javascript, captchas, popups, etc.
No sign of my password in there http://www.mediafire.com/?n307hutksjstow3, or my wife's. I checked both the full and the '00000' truncated hash for each. Neither of us had changed it for the last couple of years.
So I guess it is only a subset of all the linkedin passwords?
I have now changed my passwords anyway.
By the way, the press say both the username and password were hacked, has anyone seen the list of usernames? They also say 6.4m passwords were hacked but this file only has 6.14m.
A salt may not have been enough to protect the passwords: if it is not complex enough, the presence of common passwords like "password" or "123456" makes a brute-force attack on the salt itself possible in some cases. I have performed a benchmark on that point in particular, and was able to retrieve a salt in five days, without strong optimization. It's a bit long to give all the numbers and code here, so the ref is http://gouigoux.com/blog/?p=46
I cross-referenced the leaked hashes against hashes of the 10,000 most common passwords and found that 93% of the passwords at least 6 characters long appear in the leak.
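A cross-reference like this can be sketched as follows. The common-password list and the tiny leak set are placeholders, not the data behind the 93% figure, and counting 00000-masked entries as hits is an assumption based on the masking described earlier in the thread:

```python
import hashlib

def sha1_hex(password):
    """Unsalted SHA-1 of the password, as lowercase hex."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()

def in_leak(password, leaked_hashes):
    """Match either the full hash or its 00000-masked form."""
    full = sha1_hex(password)
    return full in leaked_hashes or ("00000" + full[5:]) in leaked_hashes

def leak_coverage(common_passwords, leaked_hashes, min_length=6):
    """Fraction of common passwords (at least min_length long) found in the leak."""
    candidates = [p for p in common_passwords if len(p) >= min_length]
    if not candidates:
        return 0.0
    found = sum(1 for p in candidates if in_leak(p, leaked_hashes))
    return found / len(candidates)

# Placeholder data for illustration only.
common = ["password", "qwerty", "abc", "zzzzzzzz"]
leak = {"00000" + sha1_hex("password")[5:], sha1_hex("qwerty")}
print(leak_coverage(common, leak))
```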
It's always surprising that people are so lackadaisical about their passwords. I've had people tell me their passwords in casual conversation multiple times, just for the sake of discussion.
According to jgrahamc's investigation, this will probably check if your password is there and is cracked already. To check if the hash is there, although uncracked yet, you should probably remove the sed call from pipeline.
The good thing is: every time this happens to a high-profile site, storing sensitive data, more people get more acquainted with the concepts of "you really should not use a simple password" and "you really should not use the same password across all sites". I know it works for me: this was the last straw that forced me to abandon a good ol' password I've been using since 1998. From now on I'll just rely on password managers (currently DataVault, but I know people who swear by LastPass).
My password isn't in the file, and yes I checked for a 0'd version as well. My password is 9 characters of lower case, upper case, numbers, and a symbol. I'm wondering if this is incomplete, or fake. Either way...if it is a vulnerability I suppose LinkedIn hasn't fixed it yet, or at least I haven't heard mention of this - thus even changing your password won't help much if they can just re-download the database. Thus making a long, complex password is the best course of action.
My password hash was in the file and it was cracked. It was a combination of 8 upper and lower case letters, digits and special characters. This is the case where size does matter and apparently passwords like my old one can be broken on GPU in minutes or hours nowadays.
Quick sample from persons I polled: 2 password hashes were not in the file, 1 was there and cracked, 1 was there and not cracked yet.
As bad as it is, this can be a great case to raise the awareness of good password management.
My belief is that the hackers might get the username password combos, but they grouped the hashes to only have unique (sort -u ?) passwords hashes and therefore ease the process of dictionary cracking them as they do not have salts.
The 00000 prefix might be an indication of this. I bet there is an automated script taking care of a dict attack and the file was released during execution.
I deleted my account over 6 months ago but my password hash (strong unique password) is in the file. Either (a) the file retains passwords of deleted accounts, (b) the file was stolen over 6 months ago and LinkedIn didn't know about it, or (c) the file was stolen over 6 months ago and LinkedIn DID know about it and were hoping it wouldn't show up online.
I've come to the conclusion that this list is genuine. While some people have said that they couldn't find their passwords in the list, I think this only points to the most probable reason: that this is a partial list.
How does this benefit someone who is trying to access an account? There are no account names tied to these hashes. So even if you managed to find the clear text of each of these you would still be in a position where you have a list of over 6,000,000 passwords to work through in order to brute force your way in.
Have any legitimate sources confirmed that the usernames were also stolen along with these hashes? Or were only the hashes stolen?
Could this just be an elaborate hoax where someone generated 6.5M SHA1 hashes and said that they hacked linkedin? Maybe someone shorted LNKD and then leaked this, hoping for a monetary gain?
http://pastebin.com/JmtNxcnB - 20k++ sample cracked passwords from LinkedIn hash dump released on June 6, 2012. They do appear legit and strong too. It's unfortunate that LinkedIn hashed them using unsalted SHA-1.
As much as I hate lawsuits, I'd love to see one or two major Internet companies sued in a class action lawsuit for negligence to serve as an example and a warning to the rest. This kind of behavior from a top tier internet presence is inexcusable!
A password I used many months ago (maybe almost a year now?) was in the list, but the password I use currently for many months was not on the list interestingly enough. This list is possibly pretty old, which means it happened quite a while ago.
I'm not an expert in the field but from what I know, SHA1 is a one-way function. When a hashed password is cracked, YES, the hackers know that specific password. They brute forced it by guessing the password, running it through SHA1, and comparing the output to the hash. If they are the same, then they guessed the right password.
They do not know any other passwords and if "salt" was used, they would have to brute force each password. I think salt wasn't used in this case so once they crack someone's password, they know every other user who used the same password. So if you and I used the same password, and they brute forced yours already, they will know that I have the same password.
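The guess, hash, compare loop described above can be sketched in a few lines. The user names and word list are invented for the example; the point is that with unsalted hashes one successful guess exposes every account sharing that hash:

```python
import hashlib

def sha1_hex(text):
    """Unsalted SHA-1 of the text, as lowercase hex."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

def crack(user_hashes, wordlist):
    """Hash each guess once and report every user whose stored hash matches it."""
    users_by_hash = {}
    for user, stored in user_hashes.items():
        users_by_hash.setdefault(stored, []).append(user)
    cracked = {}
    for guess in wordlist:
        for user in users_by_hash.get(sha1_hex(guess), []):
            cracked[user] = guess
    return cracked

# Hypothetical accounts: alice and bob picked the same weak password.
users = {
    "alice": sha1_hex("password"),
    "bob": sha1_hex("password"),
    "carol": sha1_hex("Tr0ub4dor&3"),
}
print(crack(users, ["123456", "password"]))
```

With a per-user salt, the attacker would have to rehash every guess separately for each user, so cracking alice's password would say nothing about bob's.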
"Cracking" in this sense is brute forcing. SHA1 is fast, and people use bad passwords. The combination means that you can run through lots and lots of bad passwords very quickly. I checked my linked in password I have stored in 1password, and it is 20+ chars with special characters and numbers. That won't be "cracked" in any meaningful sense, so I don't even worry about it.
You are correct that there's currently no way to go from a hash to a value that hashes to it in SHA1 (AFAIK, IANYNSA [I am not your NSA]).
I just looked into the file (combo_not.txt). There are only hashes. Who decided that the hashes posted in the forum are related to linkedin in any way ? Thank you.
Those paranoid tinfoil-hat wearing lunatics that generate absurdly long unique random passwords for every site are wringing their hands with glee because they found the hash of their LinkedIn password in the file. You're welcome.
My LinkedIn password hash (as of the leak; I changed the password once the news broke) was not listed, and it was a relatively weak password (8 characters, just lowercase letters and numbers). I doubt this is LinkedIn's password dump.
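One caveat when checking: elsewhere in this thread it is pointed out that many entries in the file have their first five hex characters replaced with 00000, so a search for the full hash alone can miss a password that is actually there. A rough Python sketch that checks both forms follows. The function names are hypothetical, and the sample lines are the hashes quoted in that comment rather than the real file.

```python
import hashlib


def candidate_hashes(password: str):
    """Return (full SHA1 hex, same hash with the first 5 chars set to 0)."""
    full = hashlib.sha1(password.encode("utf-8")).hexdigest()
    masked = "00000" + full[5:]
    return full, masked


def find_in_dump(password: str, dump_lines):
    """Return whichever of the two candidate hashes appear in dump_lines."""
    present = {line.strip().lower() for line in dump_lines}
    return [h for h in candidate_hashes(password) if h in present]


# Sample lines taken from the hashes quoted in this thread.
sample_dump = [
    "000001e4c9b93f3f0682250b6cf8331b7ee68fd8\n",
    "00000a1ba31ecd1ae84f75caaa474f3a663f05f4\n",
    "0000040c80b6bfd450849405e8500d6d207783b6\n",
]

print(find_in_dump("password", sample_dump))
```

To run it against the real file, pass an open file object instead of `sample_dump`, for example `open("SHA1.txt")` if that is what you named your copy.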
Even after securing our own passwords, we are all still vulnerable to attacks where the attackers simulate members of our networks to discover private information like our connections, job history, etc.
I checked a few 6-character alphanumeric passwords and some were not present. So they don't seem to be brute forcing them serially; maybe they are just checking against other known tables.
The worst part is that I can't remember which password I used, and I don't want to just reset it, because I want to find out whether it's one I also used somewhere else.
Assuming LinkedIn used unsalted SHA1 hashes and will continue to do so, and given that many of us do not want to delete our LinkedIn accounts, what is the minimum number of characters we should use in a new password? 15? 20? 100? (I know, 100 is probably more than they allow.)
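As a rough way to reason about the length question, the size of the brute-force search space can be worked out directly. The Python sketch below is back-of-the-envelope arithmetic only. The guess rate is a purely hypothetical figure, not a measurement of any real hardware, and the 94-character alphabet assumes printable ASCII without the space.

```python
import math


def keyspace(alphabet_size: int, length: int) -> int:
    """Number of distinct passwords of exactly this length."""
    return alphabet_size ** length


def entropy_bits(alphabet_size: int, length: int) -> float:
    """Bits of entropy, if every character is chosen uniformly at random."""
    return length * math.log2(alphabet_size)


def years_to_exhaust(alphabet_size: int, length: int, guesses_per_second: float) -> float:
    """Years needed to try every password of this length at the given rate."""
    seconds = keyspace(alphabet_size, length) / guesses_per_second
    return seconds / (365 * 24 * 3600)


# ASSUMPTION: a made-up attacker speed, used only to compare lengths.
ASSUMED_GUESSES_PER_SECOND = 1e9

for length in (8, 12, 15, 20):
    bits = entropy_bits(94, length)
    years = years_to_exhaust(94, length, ASSUMED_GUESSES_PER_SECOND)
    print(f"length {length}: {bits:.1f} bits, {years:.3g} years to exhaust")
```

Each extra random character multiplies the work by the alphabet size. That only holds for randomly generated passwords, though: a long password built from dictionary words or reused from another site can fall to a wordlist long before the full space is searched.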
Of my own accord and not my employer's, I'd like to invite developers to check out mojoLive as your career management tool. Our goals and vision are light years ahead of what LinkedIn has slowly become. Also, I dislike recruiters and spam.
When Twitter recently had accounts and passwords leaked, many were attached to spam accounts or duplicate records. Most had obvious passwords (like 1234).
Are these legitimate active accounts? Can you do anything with the hashed passwords alone?
In fairness to Twitter, it was never actually known if the accounts/passwords came from Twitter.com (proper) or (more likely) leaked from some 3rd-party Twitter-integrating app that had pre-OAuth integration.
I just changed my password. As a test, I entered only lowercase letters and numbers. LinkedIn accepted it even though the site says "should have upper case etc.".