Can collision attacks provide the same size of data? I suspect it would be dramatically more difficult to produce a collision of equal data-size as the original.
So perhaps, the easiest way to defend against a collision attack is to transmit the size of the data alongside the checksum. Like a checksum, it is extremely lightweight and easy to check.
In my data framework project (where I use sha1 to identify blocks), I've been looking for an excuse to do this, because knowing size of a block is incredibly useful for applying performance heuristics. I suspect it is also a significant defense against collision attacks. Am I wrong?
I included this comment in a response to a child comment, but reproducing it here to make sure you see it:
If your output includes the size of the input, then what you've got is no longer a hash function. A hash function is defined as taking an arbitrary input and returning an n-bit output for some fixed value of n. By including the size of the input in your output, the size of your output grows logarithmically with the size of your input. This may seem pedantic, but fixed-size-output and arbitrary-size-input is extremely important for general usage of a hash function.
So Shattered was "give me two different documents with the same sha1, I don't care what they are". I guess according to Wikipedia that's a "classical" collision.
Then there's "here's two different documents. Append to both of them until the results have the same hash". This is "chosen prefix" collision.
My question is, is there a name for an attack that says "here's a document. give me another document with the same hash."? (in which case, "the original" would have meaning).
> How would you know that the attacker have not manipulated the size parameter?
This scenario doesn't make a lot of sense. Say I have a goodfile that is 512 bytes long and hashes to 3d8850e1, and someone else wants to produce badfile and convince you that it's my goodfile. GP's suggestion is that I publish a size-plus-hash value "512-3d8850e1" for you to check against. If the attacker is in a position to alter the size part, they're also in a position to alter the hash part, in which case why even bother with a collision? They can just change the hash to be whatever badfile hashes to.
The true answer to GP is that if you do this, it's no longer a hash function. A hash function is defined as taking an arbitrary input and returning an n-bit output for some fixed value of n. By including the size of the input in your output, the size of your output grows logarithmically with the size of your input. This may seem pedantic, but fixed-size-output and arbitrary-size-input is extremely important for general usage of a hash function.
So perhaps, the easiest way to defend against a collision attack is to transmit the size of the data alongside the checksum. Like a checksum, it is extremely lightweight and easy to check.
In my data framework project (where I use sha1 to identify blocks), I've been looking for an excuse to do this, because knowing size of a block is incredibly useful for applying performance heuristics. I suspect it is also a significant defense against collision attacks. Am I wrong?