> Also you can work around that by hashing the entire user's password (with say SHA-512) prior to inputting into bcrypt.
I wonder why people are so eager to combine different hash algorithms, especially a strong one with a weaker one. If this isn't a well-established anti-pattern, it should become one.[1] Why? Because in some sense it combines the weaknesses of both algorithms.
Assume your password hash is:
h(p) = bcrypt(sha512(p))
Assume in the future somebody generates a sha512() collision with p and q, while bcrypt() holds water:
sha512(p) = sha512(q)
It follows that:
bcrypt(sha512(p)) = bcrypt(sha512(q))
and hence:
h(p) = h(q)
So any collision in sha512() translates directly to h(); bcrypt's strength is unable to protect you from that.
Had you just used bcrypt(), you would not have been affected.
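To make the argument above concrete, here is a minimal Python sketch. It is not real bcrypt: bcrypt isn't in the standard library, so `slow_outer` uses PBKDF2 as a stand-in, and `weak_inner` is a hypothetical, deliberately crippled inner hash (SHA-512 cut down to 2 bytes) so that a collision can actually be found. The fixed salt is also an assumption, mirroring verification against a single stored hash.

```python
import hashlib
from itertools import count

# Assumption: one fixed salt, as when checking candidates against one stored hash.
SALT = b"fixed-demo-salt"

def weak_inner(p: bytes) -> bytes:
    # Hypothetical "broken" inner hash: only 2 bytes of SHA-512 survive.
    return hashlib.sha512(p).digest()[:2]

def slow_outer(data: bytes) -> bytes:
    # Stand-in for bcrypt (assumption): any deterministic slow hash shows the point.
    return hashlib.pbkdf2_hmac("sha256", data, SALT, 1000)

def h(p: bytes) -> bytes:
    # The combined construction h(p) = outer(inner(p)).
    return slow_outer(weak_inner(p))

def find_inner_collision():
    # Birthday search: remember each inner digest until one repeats.
    seen = {}
    for i in count():
        p = f"password{i}".encode()
        d = weak_inner(p)
        if d in seen:
            return seen[d], p
        seen[d] = p

p, q = find_inner_collision()
print(p != q, weak_inner(p) == weak_inner(q), h(p) == h(q))
```

Whatever pair the search returns, the outer function never sees the difference between p and q, so it cannot tell them apart.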
Admittedly, this is about collisions and not about preimage attacks (the latter being more relevant for password cracking), but the former is usually a first step towards the latter. Also, this example demonstrates that combining a weaker and a stronger hash algorithm usually does not combine their strengths, but their weaknesses.
[1] It is almost like inventing your own crypto primitives, which actually is a well-established anti-pattern.
While the number of collisions would increase, the increase is not statistically significant enough to cause an issue in practice. Plus, the idea here isn't to increase the strength of the overall construct. It's to ensure that every character the user entered makes some contribution to the final product.
What I'd consider a much worse issue is silently truncating the input, so that the following are all treated as the same password:
- some really long password ... that ends with foo
- some really long password ... that ends with bar
- some really long password ... that ends with baz
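A rough sketch of what that truncation does, assuming the 72-byte limit discussed in this thread. `effective_input` is a hypothetical helper that only models which bytes would reach the hash; it does not call bcrypt, and the filler in the passwords is made up for illustration.

```python
BCRYPT_MAX_BYTES = 72  # limit as stated in this thread

def effective_input(password: str) -> bytes:
    # Hypothetical model of silent truncation: bytes past the limit are dropped.
    return password.encode("utf-8")[:BCRYPT_MAX_BYTES]

# Made-up long passwords that differ only in their last three characters.
prefix = "some really long password " + "x" * 60 + " that ends with "
candidates = [prefix + suffix for suffix in ("foo", "bar", "baz")]

# All three collapse to the same truncated input.
distinct_inputs = {effective_input(c) for c in candidates}
print(len(candidates), len(distinct_inputs))  # prints: 3 1
```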
The only acceptable alternatives when using something like bcrypt are:
- Restrict user passwords to 72 bytes (not chars!)
- Hash with something like SHA-512 prior to passing them to bcrypt.
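Both options can be sketched in a few lines of Python. The helper names `fits_in_bcrypt` and `prehash` are made up for illustration, and the result of `prehash` would still have to be handed to an actual bcrypt implementation, which is not shown. Whether to pass the raw digest or an encoded form is one of the subtle details not settled in this thread; the sketch simply returns the raw 64 bytes.

```python
import hashlib

BCRYPT_MAX_BYTES = 72  # limit as stated in this thread

def fits_in_bcrypt(password: str) -> bool:
    # Option 1: enforce the limit on encoded bytes, not on characters.
    return len(password.encode("utf-8")) <= BCRYPT_MAX_BYTES

def prehash(password: str) -> bytes:
    # Option 2: SHA-512 yields 64 bytes regardless of input length,
    # so every character of the password influences what bcrypt sees.
    return hashlib.sha512(password.encode("utf-8")).digest()

# 40 characters, but each is 2 bytes in UTF-8, so 80 bytes in total.
accented = "\u00e9" * 40
print(len(accented), len(accented.encode("utf-8")), fits_in_bcrypt(accented))
# prints: 40 80 False

long_a = "x" * 100 + "foo"
long_b = "x" * 100 + "bar"
print(len(prehash(long_a)), prehash(long_a) != prehash(long_b))
# prints: 64 True
```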
People are frequently eager to combine different hash algorithms because they want to hedge against weaknesses in one of the algorithms. Or, in this case, because they want to shore up an existing 'weakness' in one of the algorithms (for bcrypt, its maximum input size).
You are correct that doing this ad-hoc should be an 'anti-pattern'. There are subtle details (see the papers linked above...). However, the idea itself is sound, if handled properly.
One thing to note is that if an attacker manages to get access to just the hashes, they won't have sha512(p), so they wouldn't be able to figure out sha512(q). So while you are correct that there is (technically) a weakness there, it's not one that's actually exploitable.
Can you explain how a collision helps? I mean, there are trivial collisions with the truncation that would be used instead. That doesn't mean that bcrypt(f(x)) is any weaker just because there may be some other x' such that f(x) = f(x').