Can collision attacks provide the same size of data? I suspect it would be drama...

chias · on May 15, 2019

I included this comment in a response to a child comment, but reproducing it here to make sure you see it:

If your output includes the size of the input, then what you've got is no longer a hash function. A hash function is defined as taking an arbitrary input and returning an n-bit output for some fixed value of n. By including the size of the input in your output, the size of your output grows logarithmically with the size of your input. This may seem pedantic, but fixed-size-output and arbitrary-size-input is extremely important for general usage of a hash function.

justincormack · on May 15, 2019

Sha1 and sha256 already hash the message length into the hash function and it is limited to 2^64 bits so they do not support arbitrary input size.

orblivion · on May 14, 2019

For what it's worth, Shattered specifically altered a small part of an existing file. Perhaps with chosen-prefix the story is different.

https://shattered.io (check out the two pdfs)

orblivion · on May 15, 2019

I forgot to specifically state - I think the two pdfs are the same size.

jwilk · on May 15, 2019

(This is chosen-prefix attack; there is no "the original".)

In all the attacks on MD5 or SHA-1 I'm aware of, the generated colliding strings had the same length.

orblivion · on May 16, 2019

So Shattered was "give me two different documents with the same sha1, I don't care what they are". I guess according to Wikipedia that's a "classical" collision.

Then there's "here's two different documents. Append to both of them until the results have the same hash". This is "chosen prefix" collision.

My question is, is there a name for an attack that says "here's a document. give me another document with the same hash."? (in which case, "the original" would have meaning).

jwilk · on May 20, 2019

It's called "second preimage attack".

jlgaddis · on May 14, 2019

Or you could just switch hash algorithms to something stronger.

a1a · on May 14, 2019

Yes, if I interpret your suggestion correctly. How would you know that the attacker have not manipulated the size parameter?

That's the best case. Worst case you end up with a memory vulernability (see heartbleed https://xkcd.com/1354/)

chias · on May 15, 2019

> How would you know that the attacker have not manipulated the size parameter?

This scenario doesn't make a lot of sense. Say I have a goodfile that is 512 bytes long and hashes to 3d8850e1, and someone else wants to produce badfile and convince you that it's my goodfile. GP's suggestion is that I publish a size-plus-hash value "512-3d8850e1" for you to check against. If the attacker is in a position to alter the size part, they're also in a position to alter the hash part, in which case why even bother with a collision? They can just change the hash to be whatever badfile hashes to.

The true answer to GP is that if you do this, it's no longer a hash function. A hash function is defined as taking an arbitrary input and returning an n-bit output for some fixed value of n. By including the size of the input in your output, the size of your output grows logarithmically with the size of your input. This may seem pedantic, but fixed-size-output and arbitrary-size-input is extremely important for general usage of a hash function.