Hacker News new | past | comments | ask | show | jobs | submit login

Can collision attacks provide the same size of data? I suspect it would be dramatically more difficult to produce a collision of equal data-size as the original.

So perhaps, the easiest way to defend against a collision attack is to transmit the size of the data alongside the checksum. Like a checksum, it is extremely lightweight and easy to check.

In my data framework project (where I use sha1 to identify blocks), I've been looking for an excuse to do this, because knowing size of a block is incredibly useful for applying performance heuristics. I suspect it is also a significant defense against collision attacks. Am I wrong?




I included this comment in a response to a child comment, but reproducing it here to make sure you see it:

If your output includes the size of the input, then what you've got is no longer a hash function. A hash function is defined as taking an arbitrary input and returning an n-bit output for some fixed value of n. By including the size of the input in your output, the size of your output grows logarithmically with the size of your input. This may seem pedantic, but fixed-size-output and arbitrary-size-input is extremely important for general usage of a hash function.


Sha1 and sha256 already hash the message length into the hash function and it is limited to 2^64 bits so they do not support arbitrary input size.


For what it's worth, Shattered specifically altered a small part of an existing file. Perhaps with chosen-prefix the story is different.

https://shattered.io (check out the two pdfs)


I forgot to specifically state - I think the two pdfs are the same size.


(This is chosen-prefix attack; there is no "the original".)

In all the attacks on MD5 or SHA-1 I'm aware of, the generated colliding strings had the same length.


So Shattered was "give me two different documents with the same sha1, I don't care what they are". I guess according to Wikipedia that's a "classical" collision.

Then there's "here's two different documents. Append to both of them until the results have the same hash". This is "chosen prefix" collision.

My question is, is there a name for an attack that says "here's a document. give me another document with the same hash."? (in which case, "the original" would have meaning).


It's called "second preimage attack".


Or you could just switch hash algorithms to something stronger.


Yes, if I interpret your suggestion correctly. How would you know that the attacker have not manipulated the size parameter?

That's the best case. Worst case you end up with a memory vulernability (see heartbleed https://xkcd.com/1354/)


> How would you know that the attacker have not manipulated the size parameter?

This scenario doesn't make a lot of sense. Say I have a goodfile that is 512 bytes long and hashes to 3d8850e1, and someone else wants to produce badfile and convince you that it's my goodfile. GP's suggestion is that I publish a size-plus-hash value "512-3d8850e1" for you to check against. If the attacker is in a position to alter the size part, they're also in a position to alter the hash part, in which case why even bother with a collision? They can just change the hash to be whatever badfile hashes to.

The true answer to GP is that if you do this, it's no longer a hash function. A hash function is defined as taking an arbitrary input and returning an n-bit output for some fixed value of n. By including the size of the input in your output, the size of your output grows logarithmically with the size of your input. This may seem pedantic, but fixed-size-output and arbitrary-size-input is extremely important for general usage of a hash function.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: