You can increase the number of rounds of hashing as well, without rewriting your code.
I can't argue with your last point, simply because I don't understand it. How exactly does this "turn something that has a low number of collisions into a high number of collisions?"
In my mind, what we're doing is hashing "mypasswordmysalt" n times, and storing the resulting hash, the salt and n in a user table. If the user table is leaked, and n is 3 (for simplicity's sake, it would normally be _much_ higher), can you explain how this could be worse than doing one round of hashing?
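For concreteness, here's a minimal sketch of what I mean (SHA-256, the hex-digest chaining, and the exact record format are my own assumptions for illustration, not anything prescribed):

    import hashlib
    import os

    def iterated_hash(password: str, salt: str, n: int) -> str:
        # Round 1 hashes "mypasswordmysalt"-style input; every later
        # round hashes the hex digest of the previous round.
        digest = hashlib.sha256((password + salt).encode()).hexdigest()
        for _ in range(n - 1):
            digest = hashlib.sha256(digest.encode()).hexdigest()
        return digest

    def make_record(password: str, n: int = 3) -> dict:
        # Store the final hash, the salt and n in the user table;
        # n can then be raised later without rewriting any code.
        salt = os.urandom(16).hex()
        return {"hash": iterated_hash(password, salt, n), "salt": salt, "n": n}

    def verify(password: str, record: dict) -> bool:
        # On login, recompute with the stored salt and n and compare.
        return iterated_hash(password, record["salt"], record["n"]) == record["hash"]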
Every hash function has a probability of collisions. For illustration, let's imagine a really bad one where every output has another input that results in the same value.
With a single round of hashing, there are two possible inputs A1 and A2 that can produce the final output O. With a sufficiently large number of potential inputs, it will take you a while to brute force and enumerate all possible inputs before you hit on either A1 or A2.
With two rounds of hashing, there are two possible inputs A1 and A2 that can produce the final output O, and two possible inputs B1 and B2 that can produce the intermediate hash A1, and two possible inputs B3 and B4 that can produce the intermediate hash A2.
With three rounds of hashing you end up with something like this:

    O
    |- A1
    |   |- B1 <- C1, C2
    |   `- B2 <- C3, C4
    `- A2
        |- B3 <- C5, C6
        `- B4 <- C7, C8

That's eight possible inputs (C1 through C8) that all end up at O after three rounds. So with each round of hashing you are increasing the number of collisions, meaning you're likely to brute force an input that will hash to O much more quickly.
[Edit]
Of course, with each round you're also increasing the amount of time it takes to compute O, but given that most hashing algorithms are designed to be fast, I'd say it's probably not enough to counter it. Not sure though; I've not actually looked at the maths.
"No, there is an infinite number of inputs that produce the final output O."
Only with an infinite number of inputs. If we restrict our domain to things that are likely to be passwords, say strings under 1000 characters in length, then we're increasing the number of inputs in our domain that can produce O.
"you have to find something that generates O after exactly n rounds of hashing"
I was using an increasing number of iterations to demonstrate how each iteration potentially increases the number of collisions. Taking n=3, you don't need to know any of the intermediate states A1, B1, B2, B3 or B4 to take advantage of the fact that in our domain of strings under 1000 characters we only have to find one of 8 possible inputs rather than one of 2.
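A toy experiment makes this countable. The 16-bit truncated hash and the numeric "password" domain below are deliberate simplifications of mine so that collisions are actually reachable; with a full-width hash you could never enumerate them like this:

    import hashlib

    def toy_hash(s: str) -> str:
        # SHA-256 truncated to 16 bits, so preimages are easy to find.
        return hashlib.sha256(s.encode()).hexdigest()[:4]

    def rounds(s: str, n: int) -> str:
        # Apply the toy hash n times.
        for _ in range(n):
            s = toy_hash(s)
        return s

    # A stand-in for "strings that are likely to be passwords".
    domain = [str(i) for i in range(200_000)]

    for n in (1, 2, 3):
        target = rounds("mypasswordmysalt", n)
        hits = sum(1 for s in domain if rounds(s, n) == target)
        print(f"n={n}: {hits} candidate inputs reach the same final hash")

On runs like this the hit count tends to grow with n (it's noisy for any single target), which is the tree above in miniature.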
"what I said will be true for all relevant hashing functions"
All relevant hashing functions have a probability of collisions in any useful input domain. Okay, the tree won't be as dense as the one illustrated, but you're still increasing that probability by repeated hashing.
You need to find some way to mitigate the collisions, at which point you've basically got bcrypt.
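A sketch of one such mitigation (roughly what PBKDF2-style constructions do; bcrypt itself is built on the Blowfish key schedule, but the idea is similar): mix the original password and salt back into every round, so the input domain never collapses to the previous digest alone. Same illustrative assumptions as before:

    import hashlib

    def iterated_hash_mitigated(password: str, salt: str, n: int) -> str:
        # Each round hashes previous-digest || password || salt, so an
        # attacker can't work from intermediate digests alone.
        digest = b""
        for _ in range(n):
            digest = hashlib.sha256(digest + (password + salt).encode()).digest()
        return digest.hex()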
Thank you, now I actually do see your point. I would still not think of it as a considerable weakness. To find such a collision would take more time than brute-forcing any likely password.
There doesn't yet exist a single example of any SHA1 or SHA2 collision, and if we use SHA256 as an example, we probably could not find one within the next few years by brute force, even using all the world's current computing power and storage.
Edit: Actually, that whole argument falls to pieces, because if we can search through enough possibilities to find any collision, the output size of the hashing function is too small for the hashing function to be secure.
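(For scale: a generic birthday search for a collision in an n-bit hash needs on the order of 2^(n/2) evaluations, i.e. around 2^128 for SHA256, which dwarfs brute-forcing any plausible password list.)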
Indeed. This is where I agree with you that the article is a bit weak. It overstates the problem of repeated hashing and doesn't explain how bcrypt solves that problem at all. It makes it sound like a completely different and magical solution rather than repeated hashing with collision mitigation.
It's more a case of "hey, here's a potential problem you might not have thought of, here's an algorithm that addresses it."
The only advantage I know of with bcrypt over multiple rounds of SHA2 is that GPUs are very bad at it compared to most hashing functions, so the CPU cost (on my server) and the GPU cost (the crackers' cost) are not too different. (Anyone, please correct me if I'm wrong.)
Off-topic: This exponential reply delay is really annoying.
"With a single round of hashing, there are two possible inputs A1 and A2 that can produce the final output O."
No, there is an infinite number of inputs that produce the final output O. And you have to find something that produces O after exactly n rounds of hashing; it doesn't help to find something that produces O after one or two rounds.
Edit: Sorry, didn't see your assumption when I first posted, but I guess what I said will be true for all relevant hashing functions.