> There is also the risk (which cannot really be made to go away) that the longer hashes used with SHA-256 may break tools developed outside of the Git project
Easy fix if that is really an issue: just truncate SHA-256. The length of the hash is not the issue that needs fixing (even if it's a nice side benefit).
> that is only the first step in the development of a successful attack. Finding a collision of any type is hard; finding one that is still working code, that has the functionality the attacker is after, and that looks reasonable to both humans and compilers is quite a bit harder — if it is possible at all.
I mean, if you have any sort of binary files in your repo, that's pretty doubtful.
The way you mostly do this is that the colliding part is a short binary blob which is embedded, and the file has code outside the colliding part that does different things depending on the value of the blob.
Yeah, getting that past human review with a source code file is going to be tricky. OTOH, if you have any sort of binary assets in your Git repo (this might even include images, depending on the attack goals, e.g. a goatse attack), this seems a lot more plausible.
P.S. To be clear, I agree that the SHA-1DC variant removes most of the urgency.
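The blob-branching pattern described above can be sketched roughly like this; the blob contents are placeholders rather than real colliding data, and `run_payload` is a hypothetical stand-in for whatever the file actually does:

```python
# Hypothetical sketch of the pattern described above: two versions of a
# file share identical surrounding code but embed different binary blobs
# that collide under the broken hash. Behavior branches on a blob byte.
# The blob values below are placeholders, not real colliding data.

def run_payload(blob: bytes) -> str:
    # Identical source in both file versions; only the blob differs.
    if blob[0] == 0:
        return "benign behavior"
    return "malicious behavior"

BENIGN_BLOB = b"\x00" + b"A" * 63     # stand-in for colliding block A
MALICIOUS_BLOB = b"\x01" + b"A" * 63  # stand-in for colliding block B

print(run_payload(BENIGN_BLOB))     # -> benign behavior
print(run_payload(MALICIOUS_BLOB))  # -> malicious behavior
```

Since reviewers see only one version of the file, the source looks identical either way; all the attacker-controlled difference lives in the opaque blob.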
>> that is only the first step in the development of a successful attack. Finding a collision of any type is hard; finding one that is still working code, that has the functionality the attacker is after, and that looks reasonable to both humans and compilers is quite a bit harder — if it is possible at all.
> I mean, if you have any sort of binary files in your repo, that's pretty doubtful.
> The way you mostly do this is that the colliding part is a short binary blob which is embedded, and the file has code outside the colliding part that does different things depending on the value of the blob.
> Yeah, getting that past human review with a source code file is going to be tricky. OTOH, if you have any sort of binary assets in your Git repo (this might even include images, depending on the attack goals, e.g. a goatse attack), this seems a lot more plausible.
My view is that if you find yourself rationalizing away potential cryptographic issues with "I bet this will be hard to successfully attack in practice", you're probably better off just fixing the problem if you can. Once you've moved from just relying on cryptographic security to non-cryptographic factors like human code reviews or constrained input formats, you've made it significantly more complicated to evaluate the security of your system and significantly increased the risk that an attacker comes up with an approach you haven't considered.
It's very tempting to conclude that a cryptographic attack isn't really an issue for your system and you don't have to change anything, but that conclusion is almost certainly not based on a real understanding of the risk you're accepting. Just using SHA-256 or something similar is almost always a better answer than coming up with some more complicated reason to keep using SHA-1.
Interesting note: there are already standard representations for truncated SHA-2, namely SHA-512/224 and SHA-512/256, but unfortunately none with the output length of SHA-1. Even more interesting, those truncated variants are more secure against length-extension attacks.
A proper solution would be to define a SHA-160, similarly to how SHA-224 is defined: use different initialization constants and truncate the output of the core SHA-256 algorithm. But I guess it would be a bit more difficult to implement than simply truncating the output of an existing SHA-256 implementation.
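The simple-truncation approach, as opposed to the distinct-IV SHA-160 sketched above, is trivial with any off-the-shelf SHA-256 implementation. A minimal Python sketch (the function name is made up for illustration):

```python
import hashlib

def truncated_sha256_160(data: bytes) -> str:
    """Naive truncation: first 160 bits (40 hex chars) of SHA-256.

    The "proper" SHA-160 described above would additionally use
    distinct initialization constants, which hashlib cannot express.
    """
    return hashlib.sha256(data).hexdigest()[:40]

digest = truncated_sha256_160(b"hello")
assert len(digest) == 40  # same length as a SHA-1 hex digest
```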
Truncating the SHA-256 hashes does sound like a reasonable intermediate step and should also enable interoperability (a guess on my side: if it is only about referencing objects, it probably does not matter how the keys were generated).
At some point one could then transition to the full hashes and make the truncated ones an option.
I'm wondering what tooling is heavily dependent on the length of the hashes. It could potentially matter if you want to keep the size of the transmitted data small (at work, we once considered Git as a versioned database for an IoT use case…).
I think the ancestor comment meant SHA1(SHA256(X)) instead. Not clear to me how that wouldn’t have collisions, too. Just that the underlying commits that generate the collisions would need to look different.
I don't follow. If the algorithm is SHA1(SHA256(X)) all an attacker can modify is X. Yes it's possible to find a SHA1() collision, but finding X where the SHA256() will generate a collision -- that is SHA1(SHA256(X)) == SHA1(SHA256(Y)) -- is still required.
The question is does the SHA1 step make this any easier?
Don't you still have to either break SHA256 (predicting the hash it will generate) or do this by brute force?
I'm OP. I meant SHA1(SHA256(X)), but I have no idea whether that makes a collision more difficult than SHA1(X), or what the other implications are. It was a way to reduce the hash length without truncation.
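For concreteness, the SHA1(SHA256(X)) construction under discussion would look something like this (a sketch, not a proposal for Git's actual object format):

```python
import hashlib

def sha1_of_sha256(data: bytes) -> str:
    # Feed the raw 32-byte SHA-256 digest into SHA-1, producing a
    # 160-bit (SHA-1-length) result. Note the attacker controls only
    # `data`, not the SHA-1 input, which is why a plain SHA-1 collision
    # does not directly translate into a collision here.
    inner = hashlib.sha256(data).digest()
    return hashlib.sha1(inner).hexdigest()

assert len(sha1_of_sha256(b"example")) == 40
```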
I'd imagine it's an easy error to make: just read a SHA-1's worth of characters from Git, or sprinkle validation into code that says "okay, this is not a SHA-1-length hash, there must be something wrong with the data".
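A hypothetical example of the kind of brittle, length-based validation such tools might contain (`looks_like_sha1` is made up for illustration):

```python
def looks_like_sha1(ref: str) -> bool:
    # Brittle validation of the kind described: it hard-codes the
    # 40-hex-character SHA-1 length, so a valid 64-character SHA-256
    # object id would be rejected as "something wrong with the data".
    return len(ref) == 40 and all(c in "0123456789abcdef" for c in ref)

print(looks_like_sha1("a" * 40))  # -> True
print(looks_like_sha1("a" * 64))  # -> False
```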
I don't know how the collision detector works, but in general, you don't even need binary files, do you? Just add a comment containing near-arbitrary data at the end of some line in a source file. Bonus points if it's preceded by enough whitespace to fool the reader into thinking there's nothing there.
I was just going by the fact that it's a lot harder to trick a human in a text format. Most collisions involve a bunch of binary data that isn't valid UTF-8, which looks very conspicuous in a text file.