SHA1 collisions make Git vulnerable to attacks

ifdefdebug · on Feb 27, 2017

From the same thread (Peter Gutmann, Fri Feb 24 00:42:36 EST 2017):

"After sitting through an endless flood of headless-chicken messages on multiple media about SHA-1 being fatally broken, I thought I'd do a quick writeup about what this actually means. In short:

Reports of SHA-1's demise are considerably exaggerated.

What CWI/Google have done is confirmed what we've known for a long time, that SHA-1 is shaky. Using a nation-state's worth of resources and a year of time (https://security.googleblog.com/2017/02/announcing-first-sha...), they've shown that, with a very carefully-crafted document, you can create a collision. Their presentation of the results is detailed and accurate, it's the panicked misinterpretation of those results that are the problem."

Continues here: http://www.metzdowd.com/pipermail/cryptography/2017-February...

[edit: typo]

simias · on Feb 27, 2017

110,000$ is not "a nation-state's worth of resources". I agree with the rest though, the sky is not falling but people shouldn't react to baseless alarmist claims with baseless overconfident claims.

semi-extrinsic · on Feb 27, 2017

The implied meaning might have been "a significant post on a nation-state's cyber attack budget"? I'm pretty sure they did not mean "the total budget of a nation-state" or anything of the sorts, since that's obviously wrong.

One has to agree, an entity willing to drop a cool $100k on finding a single SHA1 collision to try and attack your git repo is a lot closer to nation-state level than the for-the-lulz level.

mikeash · on Feb 27, 2017

The whole point of using the "nation-state" term is to discuss something so difficult that even large companies or organized crime couldn't do it.

There are plenty of non-state organizations for whom $110,000 is barely even pocket change.

chrisseaton · on Feb 27, 2017

Why do people always say 'nation-state' specifically in these cases, as well? Some of the richest states in the world aren't nation-states, like the UK.

keevie · on Feb 27, 2017

I imagine because what they actually mean (state) gets ambiguous and confusing because of the united states, which are not really states in the same sense.

anf · on Feb 27, 2017

Can you clarify? I thought "nation state" was a fancy way of saying "country". Does it have a more specific meaning?

Edit: wikipedia to the rescue! https://en.wikipedia.org/wiki/Nation_state#United_Kingdom

petertodd · on Feb 27, 2017

$100k isn't that much money, particularly since collisions can be reused for multiple attacks w/ length-extension. Heck Bitcoin has had (ineffective) spam attacks that have probably have cost around that much, and there's good reason to suspect they've been privately funded by angry trolls.

There's a lot of people for whom $100k is "fuck you" money.

phpnode · on Feb 27, 2017

"fuck you money" is something different - it's the amount of wealth you need (varying per individual) where you can comfortably say "fuck you" to a particular job or opportunity or proposal someone makes to you if you don't want to do it. I believe the term you're looking for is something like "chump change"

ominous · on Feb 27, 2017

I have seen it used in that (to mean the same as chump change) in linkedin articles by random recruiters, so I guess it will suffer the fate of literally vs. figuratively. Terrible.. but use dictates meaning, if it goes mainstream.

nommm-nommm · on Feb 27, 2017

Can you give an example? "Fuck you money" is pretty literal already, - the money required to be able to say "fuck you." I can't see how it can make any sense in any other context.

ominous · on Feb 28, 2017

Can't find it. The use was as if it was "fuck it money", "it" being the fact you have enough so you're not counting expenses.

petertodd · on Feb 27, 2017

I'm aware of that usage of the term; I've also seen it used the way I'm using it.

English is fun that way. :)

dmd · on Feb 27, 2017

That's not what "fuck you" money means. https://www.quora.com/What-is-fuck-you-money

grovegames · on Feb 27, 2017

To put that in perspective, that's roughly the loaded rate of a salaried ~80k employee; roughly. So we're talking a single hire in a nice city.

smcl · on Feb 27, 2017

For a single PDF document, once

syncsynchalt · on Feb 27, 2017

"For a single malicious C file in the linux kernel, once"

(My understanding of the method is it might be extendable to modifying a comment mid-file and then introducing later code, instead of modifying a JPG inside a PDF)

brockers · on Feb 27, 2017

It cannot. The Google implementation must effectively be done on a blob as the result would not be usable in a structure specific document. What is more, things that require a block chain (like git) are NOT covered with this current attack as both the source and resulting have to be worked on.

Currently the attack vector only works when you can get both documents to "work towards each other" to produce a valid identical SHA1 value.

tialaramex · on Feb 27, 2017

This was a fixed prefix collision attack. That means they can make two documents (P | A | anything) and (P | B | anything) for a fixed P, and they can find A, B, such that A and B are different but

SHA1(P | A | anything) = SHA1(P | B anything)

The Merkle-Damgård construction (used in MD4, MD5, SHA1 and SHA2 but not in SHA3 and some other modern hashes) invariably means length extension is possible, if you can collide two documents then you can add a suffix to both and also get a collision.

This is how there's already a web site where you feed it images and it makes a "different" colliding PDF, it's just using Google's result with a different suffix after the 128-byte collision near the start.

qdog · on Feb 27, 2017

I think the initial r&d to get to this point is more along the lines of a nation-state investment. Google paid much more than $110k to get this working. It's not clear exactly how much it would cost to "weaponize", either.

nthcolumn · on Feb 27, 2017

That was just SHA1. Linus mentioned the other day that there was another layer and that they weren't worried. It would take considerably more resources to crack that again. But it is rather jolly to speculate about such things and other users of SHA1 (Windows?) might not nearly be so immune?

dv_dt · on Feb 27, 2017

Is that the cost for just the compute resources assuming time from people with expertise is free? Or setting up the resources to have a stable of people with the right background... Once you have that, then yes maybe its 100k.

ifdefdebug · on Feb 27, 2017

It's worth noting that the figure of $110000 was not mentioned in the referenced message, so probably Peter Gutmann was thinking at a different scale when he wrote "a nation-state's worth of resources".

frik · on Feb 27, 2017

Exactly. 110k USD is like a penny for a top 5000 company and foreign state actors.

StreamBright · on Feb 27, 2017

Security is quite often about the amount of money you have to put in to get something or somebody hacked.

110,000 USD is in the ballpark of state level players when we are talking about forging documents to avoid any sort of tampering detection. It has practically zero use of small time hackers or script kiddies. Why would anybody invest 110K into a collision? What is the practical use of it?

michaelt · on Feb 27, 2017

  Why would anybody invest 110K into a collision?

The thing people fear is (1) A collision that lets you have good code pass review, then have evil code released to users; (2) That happening to Linux/Android/Firefox/Chrome; (3) The cost of creating a remote code execution exploit being lower than the market value of that exploit on the black market.
I don't know how /realistic/ this fear is. Certainly, if everyone PGP signs all their commits, it's a much-reduced risk - but how many projects mandate that?

or some less scrutinised but widely deployed package

fulafel · on Feb 27, 2017

There are many low cost ways of doing "$100k worth of AWS" computation. Eg botnets, distributed volunteering, moonlight use of employers idle servers etc etc.

emn13 · on Feb 27, 2017

Also: that cost is certain to drop, and it might drop quite quickly - simply due to software and hardware improvements. If anything algorithmic shows up, it could change dramatically. Let's not wait for that to happen.

ethbro · on Feb 27, 2017

You'd ideally want to do this with a binary blob (firmware or graphics driver, because you know there's one sitting in git somewhere). Then, how is anyone going to know the difference?

mcherm · on Feb 27, 2017

> Why would anybody invest 110K into a collision? What is the practical use of it?

Suppose you are on the verge of completing a major sale to some large, nervous purchaser -- perhaps a major world military. This is a decent-sized but not huge sale: $2 billion, with profits of around $200 million. The other major competitor for this contract is built around Linux and your offering relies on a custom operating system.

Your head of sales thinks that the the purchasing agent seems particularly concerned about security issues with the operating system -- keeps asking questions like "So, can you document that your system is less vulnerable than some 'open source' system?". The head of sales makes a rough guess that a news story about vulnerabilities in Linux might sway the chance of winning the contract by around 5%.

So: that's $10 million in value to your company that might created by generating publicity about the vulnerability of Git so long as that publicity is generated at the right moment in time. What's the chance that 1% of that can be "found" to make it happen?

The thing is: $110,000 is actually a very SMALL amount of money, relative to the amounts of money that many influential people manage on a daily basis. The use doesn't have to be very practical for it to be well worth it.

Chyzwar · on Feb 27, 2017

Pathes in Linux are reviewed by multiple people before merging. Even if you create a collision and submit patch you cannot really do much without write access to repo. It is even more difficult because person merging path will not fast forward in most cases.

This attack still do not allow for inserting a arbitrary data in arbitrary places to make attack on Linux possible. Finally SHA1 in git also take size into consideration and make this attack even more expensive[2].

People should really chill out. There are cheaper attack vectors that collisions.

[2] https://public-inbox.org/git/CA+55aFxJGDpJXqpcoPnwvzcn_fB-za...

mcherm · on March 2, 2017

Notice that the attack I described does not require actually merging in the patch, it only requires that news stories be written about how there might be such a vulnerability.

quizotic · on Feb 27, 2017

Interesting hypothetical ... even more interesting if you're implying it might not be so hypothetical

mcherm · on March 2, 2017

I am NOT implying that it might not be hypothetical. I have absolutely no reason to believe that anything like this has been attempted. I'm just trying to point out that for many out there, $100K is chump change.

simias · on Feb 27, 2017

If there's no practical use for it then even state level players won't bother with it.

If there's ever a practical use for it (i.e. money to be made) 110$k is totally accessible to the private sector. It's definitely not "a nation-state's worth of resources" which is the quote I was replying to.

Fortunately there doesn't appear to be a whole lot of practical use for these collisions for the time being.

userbinator · on Feb 27, 2017

In other words, SHA-1 is still nowhere near as insecure as MD5, the latter for which collisions can be generated in seconds on hardware everyone already has.

aaron695 · on Feb 27, 2017

I didn't realise this was true -

https://www.bishopfox.com/resources/tools/other-free-tools/m...

adrianN · on Feb 27, 2017

If you manage to get a vulnerability into a widely used codebase using a sha1 collision, that could very well be worth more than $100k.

StreamBright · on Feb 27, 2017

They managed it without any collision several times:

https://arstechnica.com/security/2016/08/cisco-confirms-nsa-...

https://www.theregister.co.uk/2010/12/15/openbsd_backdoor_cl...

tgummerer · on Feb 27, 2017

I feel like this still ignores most of what Linus said on why git isn't broken. In particular "it's fairly trivial to detect the fingerprints of using this attack" in his Google+ post. https://plus.google.com/+LinusTorvalds/posts/7tp2gYWQugL

And there are already patches on the mailing list for that.

strictfp · on Feb 27, 2017

I think the fingerprint argument is pretty weak actually. There is still a lot of unreadable content in git repos, including binary blobs in the kernel.

tytso · on Feb 27, 2017

You don't understand the fingerprint argument. For the specific SHA-1 attack, it's possible to detect, while calculating the SHA-1 hash of an object, whether the bit pattern indicative of this specific attack is present. This is done automatically, without needing any human intervention. This is one of the things which Google released immediately as part of their announcement.

The other thing which people seem to miss is that it requires 6,500 years of GPU computation for the _first_ phase of the SHA1 attack, and 110 years of the GPU compatation for the _second_ phase of the attack. You need to do both phases in order successfully carry out this attack. And even if you do, Google released code so that someone can easily tell if the object they were hashing was one created using this parituclar attach, which required 6,500 + 110 years of GPU computation.

But alas, it's a lot more fun to run around screaming that the sky is falling.....

strictfp · on Feb 27, 2017

Thanks, I was wrong when saying "fingerprinting". The fingerprinting technique is actually quite reassuring. I was thinking of that he says

"But if you use git for source control like in the kernel, the stuff you really care about is source code, which is very much a transparent medium. If somebody inserts random odd generated crud in the middle of your source code, you will absolutely notice. " , which I still think is a very weak argument.

It might or might not be true for any particular developer, and his argument does not refute the claim that the SHA1 integrity checks for that code is being rendered useless. I specifically recall that Linus previously described the hashed chain of commits as something which would prevent malicious insertion of code. And this has now, at least to some degree, been compromised.

He did provide some solid countermeasures and migration plans, but I think he could have been more acknowledging to all the people who predicted this attack. It would have been a good idea to prepare for changing hash function eventually.

xxs · on Feb 27, 2017

keep in mind you have to maintain/commit the initial blob and then later the malicious one (again and again, this is no pre-image attack - the initial blob has to have a well designed place with random jazz ready to be replaced)

You could just place a malicious one from the get go and no one would know (or they would know just as much -- blob do rely on virtually unconditional trust)

baby · on Feb 27, 2017

There is already danger in accepting unreadable content by itself.

strictfp · on Feb 27, 2017

True. But I thought that the point of the hashes was to ensure that something which you had already verified (through review or testing or whatever) could not be tampered with without the changes being brought to your attention. And this property does no longer hold.

baby · on Feb 27, 2017

Yeah, but in your case you would just get the binary, verify it and push it yourself.

If you're using some weird way of getting a binary that you have already verified, but that could somehow differ, and you're hopping that git will catch the difference, you're doing it wrong to begin with.

nebulous1 · on Feb 27, 2017

He doesn't mean manually recognising the fingerprint

Ajedi32 · on Feb 27, 2017

Correct. He's talking about the automated method used on shattered.io to detect files which use the attack. See: https://github.com/cr-marcstevens/sha1collisiondetection

They're basically building that into git so that if this specific collision attack is ever used, git will notice and throw a warning/error.

strictfp · on Feb 27, 2017

Thanks, I misread that. I meant that he says

"But if you use git for source control like in the kernel, the stuff you really care about is source code, which is very much a transparent medium. If somebody inserts random odd generated crud in the middle of your source code, you will absolutely notice. "

, which I think is a very weak argument.

moonshinefe · on Feb 27, 2017

Please PoC a Git exploit then?

goodplay · on Feb 27, 2017

Sure, just wire me 200k.

moonshinefe · on Feb 27, 2017

If you're working on such a massively important git repo with very poor security measures and trust levels at $200k break in status that are practical... yeah, maybe bigger problems.

strictfp · on Feb 27, 2017

I'g glad you're willing to discuss things so freely /s

RubyPinch · on Feb 27, 2017

Its possible to go in and replace the hash algorithm with something else, which none of these "git is going to ruin everything with not replacing SHA1 this instant!" people seem to bother with, to prove their points, instead of endless posturing.

http://stackoverflow.com/a/34599081 has actually gone about doing it, but it has been over a year since that, and as linus says, there has been multiple collision mitigations added as well, so tests should probably be re-done

CJefferson · on Feb 27, 2017

My git with different hash would be useless. It wouldn't be able to interact with github, bitbucket, or pull/push to anyone else's repositories. I may as well rename the package.

Fixing this is going to require breaking backwards compatability with every program that works with git -- it's going to be a huge undertaking, because early in git's design they didn't support multiple hash functions.

RubyPinch · on Feb 27, 2017

To clarify, I mean to replace the hash with something that would collide far more often, to simulate collisions, to see how git handles them

robin__j · on Feb 27, 2017

https://stackoverflow.com/a/34599081 In this SO answer does it by reducing the hash size from 160-bit to 4-bit.

CJefferson · on Feb 27, 2017

Ah, that is a much more sensible request! Sorry for my harsh reply.

hvidgaard · on Feb 27, 2017

You can, but they should reengineer Git to use any hash function, and not assume output length.

mattkrause · on Feb 27, 2017

I think the bigger problem is that tons of other stuff also assumes the output length.

hvidgaard · on Feb 27, 2017

That can be mitegated, but the sooner steps are made towards changing it, the easier it will be.

chmike · on Feb 27, 2017

The forged hash collision will always be a weak point. Things will get worse with cost reduction of processing power. It's a loosing battle.

The only way to detect without error two identical files is by comparing the files. This comparison can be speed up by comparing compressed version of the files.

The other functionality of hashes is to build a presumably unique file identifier. The byte sequence of the compressed file could serve as identifer.

So instead of using the file system as index with the sha1 name as file name, we would have to build a specific database organized as a set whose values (compressed files) would be the keys. A hash index could be used to speed up the search and equality test. Here a very fast hash would do the trick. Sha1 or a faster hash would be ok. The file system could then be used to organize the hash buckets as does git.

File comparision would of course first compare compressed and uncompressed file size. Or use other hashes or longer hash values to detect different files. When all these values are identical, then a file comparison must be performed to detect if we have a collision.

File compression can only get better and faster.

So basically git would only need to add hash collision detection and the capacity to support different objects with the same hash identifier.

vbezhenar · on Feb 27, 2017

With reasonably long cryptographically strong hashes it's not a weak point (or you could call almost all cryptography a weak point). Weak point is hard-coded algorithms and sizes into Git (if I understood the problem correctly). Software should be written with more generic approach, so algorithms could be changed and migrated when necessary. SHA-1 is considered weak for many years, git should've migrated from it already.

hvidgaard · on Feb 27, 2017

> With reasonably long cryptographically strong hashes it's not a weak point (or you could call almost all cryptography a weak point).

We do not yet know if one way functions truly exists, so from a theoretical standpoint, any hash function is a weakpoint if you do not properly handle malicious collisions.

> SHA-1 is considered weak for many years, git should've migrated from it already.

Linus addressed this many years ago, when de was working on the first version of Git. It was chosen, despite the fact that it was known to be weakened. I don't know if they lost sight of this, or the geniuenly still believe that malicious colliding hashes are not a problem. I do not know enough about the intimate details of Git to comment on that fact.

DougBTX · on Feb 27, 2017

Git could have been written with a pluggable hash system, but for something that needs to be changed let's say once every ten years, is the up front cost really worth it?

adrianN · on Feb 27, 2017

How hard is it to introduce a global constant for the number of bits in your hash instead of writing char hash[40] everywhere?

pilif · on Feb 27, 2017

That's not good enough though if you need to work with repositories that were made with previous releases. Just converting them is one thing, but what if you need to stay compatible indefinitely.

And what about old releases that encounter a new repo?

And what about URLs and emails that reference commit hashes? Think archives of mailing lists that suddenly become useless unless there's a way to keep both hashes around.

Yes. These are all solvable problems (maybe not the old-release needing to handle new-style repo gracefully), but the complexity is much higher than upgrading a global constant.

adrianN · on Feb 27, 2017

I think you could solve most problems by just enabling a different hash function with no backwards compatibility. Repositories have a format-version somewhere, and migrating from git to git-with-new-hash should be a fairly simple operation. You can always edit commit messages to add "corresponds to commit <sha1> in <old repo>". This is of course not as nice as full backwards compatibility, but it gets rid of the security problem for relatively cheap.

emn13 · on Feb 27, 2017

The problem isn't the number of bits (yet). The problem is the choice of hash. A truncated sha2 would still be fine for years to come (barring a sha2 break, which doesn't look imminent). 160bits may not be a huge margin, but it's still enough.

adrianN · on Feb 27, 2017

If you use a strong hash with say 256 bits, it's not a weak point. Random collisions are less likely than cosmic rays flipping bits in your programs and unless you don't believe in strong cryptography attackers can't do much better.

zamalek · on Feb 27, 2017

This argument still assumes that Git uses SHA1 for security. Linus points that out and John doesn't attempt to refute it, simply ignoring it. Linus should have used Murmur, CityHash or something - a SHA1 collision was going to happen eventually. By using a content identification hash function we could have avoided this argument entirely.

chopin · on Feb 27, 2017

Just a nitpick: Git predates Murmur and CityHash.

kzrdude · on Feb 27, 2017

That's just some mantra to shape the discussion in the way they like.

Git allows signing tags and commits, and those features are now broken because all object names use SHA1.

moonshinefe · on Feb 27, 2017

So wait, when is SHA-1 ever used for signing tags and commits in git? that's new to me..

My understanding was you're using a public key type of encryption such as PGP at that point. I feel I may be missing the point here. (apologies if so)

Perseids · on Feb 27, 2017

What you need to understand, is that Git's data structures are essentially annotated Merkle trees [1]. So whatever you sign, be it a tag or a commit, it will be nested sha1 hashes like [someData].sha1([someData].sha1([someData].[aFile]).[someData]).[someData] . And at every level you can conceivable construct a hash collision. So if e.g. you create your own commit on top of a commit of an attacker and sign your commit, you are only signing a (sha1 of a sha1 of a) sha1 of the commit of the attacker. If the attacker's commit was crafted to enable a sha1 collision somewhere, then your signed commit doesn't cover the files and commits you see, but only the sha1 hashes of those objects.

This kind of hairy distinction of what a signature was supposed to mean and what it actually covers is what you get with (semi-)broken cryptographic primitives. It's awful and, frankly, unnecessary.

[1] https://en.wikipedia.org/wiki/Merkle_tree

moonshinefe · on Feb 27, 2017

I appreciate that info, +1. My main point was just if you're dealing with a repository where actors that have those 6 figures to spare to attack you (and that's a minority):

1) you've got to rely on a lot better security than the minimal if at all security provided by git (it assumes a web of trust). If people are signing off with PGP sigs but not watching diffs, you've got big problems.

2) You're probably far more likely to be exploited by far cheaper methods at this point. If they have access to a trusted contributor's keys, it's far more cost effective to slip in other tricks than sha-1 collisions right now. I'd say this is the main point so far, but admittedly maybe not in the future.

3) It sounds like Linus and the git devs have admitted they need to migrate from sha-1, but also I haven't seen any cheap, exploitable PoC for git yet based on this due to how they actually mix in other info instead of raw sha-1 hashes of the files.

4) As far as I know, and I'm sure I'm subject to correction, but there hasn't been a WebKit svn repo-esque calamity yet like what they've experienced dropping 2 sha-1 collision PDFs into the repo in a Git context yet.

Again, I'm totally open to new info, but the sky-is-falling attitude right now is what I'm mainly arguing against.

5ilv3r · on Feb 27, 2017

If you drop two identical files into a linux repo then they will be rejected by the maintainers. You don't even need to get into a technical solution to prevent it.

daurnimator · on Feb 27, 2017

I sign all my commits. Github even make it easy to see: https://github.com/blog/2144-gpg-signature-verification And I also use it in a few places e.g. for http://hashbang.sh/, we only allow signed commits into master

However, as far as I know, when you sign a git commit you are actually sign the hash of the commit. With SHA-1 broken in the current way it essentially means someone with 110K to burn could forge a commit and reuse my PGP signature.

moonshinefe · on Feb 27, 2017

Yes, if you sign off on commits you haven't reviewed to confirm the diffs at all, and they're carefully crafted files to make the collision, you may commit a duplicate sha-1, which still doesn't even break Git.

kzrdude · on Feb 27, 2017

This is an important distinction. Without a preimage attack on sha-1, the only vulnerability is if some part (yet any part) of the git objects reachable from the signed tag or commit is one half of a prepared collision pair.

moonshinefe · on Feb 27, 2017

Well holy smokes. I don't know which repository you contribute to, but if you're getting undermined by such James Bond-esque deception by super villains, in addition to someone spending 6 figures into breaking your stuff, I'd hope you'd at least review the commits you sign with your key after glancing at it.

In addition, you'd have to have everyone else not notice it, all the insanely cheaper exploits not been tried on your current setup, and all the other stars aligning...

moonshinefe · on Feb 27, 2017

I'll also mention literally the first setup step to Git for most people: https://help.github.com/articles/set-up-git/ or similar.

That might be a hint that Git isn't something which you should allow it to handle the security. Literally the first step to the entire thing: pick any email or name...

Please provide reasonable security policies in your repos--and if someone is exploited, you've probably got far bigger problems than someone duplicating a sha-1. Not necessarily, but highly likely your system is owned.

zamalek · on Feb 27, 2017

My original statement is slightly incorrect; you can. The commit hash is probably used as part of the signature. It would have been better to sign the commit blob directly, as Git stores the length alongside the hash from that point down in the DAG (making prefix attacks impossible).

jcoffland · on Feb 27, 2017

A SHA1 collision has happened and it has broken at least one (SVN) repo already.

https://arstechnica.com/security/2017/02/watershed-sha1-coll...

jfoutz · on Feb 27, 2017

I could be crazy, but aren't the hashes the diffs of checkins?

It's certainly possible to create a valid patch file that causes a collision, but it seems really hard to make a collision that looks like a valid pull request.

I understand your concern (I think) but consider all the extra stuff that has to happen for someone to accept a pr.

Edit.

I do agree it's time to start thinking about moving to the fire exits

RichardCA · on Feb 27, 2017

No, that's not how Git works. There are multiple objects in Git that use SHA-1 for identification. A common point of confusion is when someone thinks a commit is essentially a diff. It's not, each commit is a snapshot that can be used to reconstruct the entire work tree. You get the diff when you compare the commit to its parent.

Here, this is a good read: https://github.com/git/git/commit/e83c5163316f89bfbde7d9ab23...

kpcyrd · on Feb 27, 2017

No, the commit hash is a hash of the commit object, which holds a hash of the tree object, which holds hashes of the file objects.

grandalf · on Feb 27, 2017

As long as the first part of the diff shown appeared not to be abnormal, some users might be fooled.

For example, a file could be created that appears to be a normal source file at the front but contains some other behavior further down.

zamalek · on Feb 27, 2017

SHA1 is not used for security in Git, the session layer (SSH, HTTPS) is. Edit: The broken repo was caused by SVN and not Git.

kpcyrd · on Feb 27, 2017

> SHA1 is not used for security in Git

       -S[<keyid>], --gpg-sign[=<keyid>]
           GPG-sign commits. The keyid argument is optional and defaults to the committer identity; if specified, it must be stuck to the option without a space.

Yes it is.

baby · on Feb 27, 2017

Yup. But Signatures rely on the hash pre-image resistance. Not on the collision resistance of the hash.

junke · on Feb 27, 2017

I don't know much cryptography. Wouldn't an attack require you to forge a commit object which is a good-looking patch, along with a valid signature (signed from someone you trust), which has the same identity (SHA1 hash)?

cyphar · on Feb 27, 2017

The attack is not as difficult as that. If you can create a valid git object which collides with another git object, signatures for the previous object tree (which is identified by SHA-1 hash) will be valid for the new object tree (which has the same hash).

So a collision in a blob that represents a file (or any other internal git object) will cause in your old signature still being valid for the new file that corresponds to the git collision.

nicky0 · on Feb 27, 2017

No need to forge the signature. The signature will still verify since your forged object has the same hash as the genuine one.

zamalek · on Feb 27, 2017

Fair enough, clearly I was mistaken.

Pharylon · on Feb 27, 2017

Linus in 2005:

"But the _real_ security comes from the fact that git is distributed, which means that a developer should never actually use a public tree for his development."

And then GitHub happened.

ericfrederich · on Feb 27, 2017

Look at these SHA-1 sums...

https://github.com/ericfrederich/sillygit/commits/master

cespare · on Feb 27, 2017

28 bits is easy. 160 bits is not easy.

tokenizerrr · on Feb 27, 2017

That's just brute forcing the first few characters.

luckystarr · on Feb 27, 2017

Well, he was saved by SHA-1 being still cryptographic enough to rely on the head-sha of his tree to know nobody changed anything after somebody broke into kernel.org. Not sure how his fetch/push policy is, but my guess is this would have been more of a headache if it would have been MD5.

baby · on Feb 27, 2017

yup, this was relying on SHA-1 pre-image resistance. Imagine that even MD5 still has pre-image resistance.

StreamBright · on Feb 27, 2017

I am not sure how serious this is compare to:

https://github.com/amoffat/masquerade/commit/9b0562595cc479a...

wyldfire · on Feb 27, 2017

That's not a problem with git. If anything it's an issue with github, but it's a pretty insignificant one IMO.

Yes, if you say you are billg@microsoft.com and make a commit to some repo on github, github will look up the username associated with billg@microsoft.com and show that user as the committer. Should it do that? Eh, probably not but this has come up a few times and github hasn't changed it. So by now we should just start to educate ourselves that this is how github is intended to work.

moonshinefe · on Feb 27, 2017

This reads like a giant "I told you so." circa 2005

"In the next few years, nasty people will teach him the threat model"

I'd like to see those very forceful claims substantiated. Git hasn't said moving forward it will never change from sha-1 and 2005 was a far different era than 2017 for crypto. Let's keep that in perspective.

baby · on Feb 27, 2017

1) Yes, Git should move to a better hash function at some point.

2) No, even easy "malicious" collisions in SHA-1 will still not break most of Git's usages. You're already trusting the repo you're pulling because of TLS, you're already trusting the commits you're getting because of peer-review (you read the commit) and a web-of-trust (you trust your collaborators). (And you're trusting commits even more when they're signed.)

chmike · on Feb 27, 2017

SHA1 is still OK for identifying files. The probability of random collision is still very low. The only problem is for forged collisions.

The object store could be modified to support file collision. One way to disambiguate collision is to use a randomly generated byte sequence as SHA1 seed or hashed before the file data. This random byte sequence would behave like a salt and disable any forged collisions. A single seed for the whole repository would be enough. It should remain secret to prevent forging a collision with the two hashes. It's harder but not impossible.

To test if a given file is in the object store, one first compute the SHA1 key to use as file name. If no collisions ever occured with an object a file with that SHA1 name will be present in the store. That file contains the usual data plus the second hash computed with the random seed. This second hash could be added as needed to keep backward compatibility and provide silent automatic upgrade.

When one need to test if the file is present in the store, one computes the normal SHA1 key and the secondary hash with the seed. We locate the object in the store uisng the first SHA and test for file equality with the randomly seeded hash. Using a faster hash like blacke2 to compute the random seeded hash could mitigate the price to compute two hashes. It should be parameterized this time and the hash size should be variable.

If a collision is detected, that is the secondary hash differ, the file is replaced by a directory with the common SHA1 as name. The colliding files would be stored in the directory using the secondary hash as name. Or the files could be packed in a single tar like file with the secondary hash used as file identifier.

This should be enough to protect against forged collisions which is the only real problem. The required change to git would be limited. The only serious disadvantage is the need to compute the randomly seeded hash.

ioquatix · on Feb 27, 2017

When I designed fingerprint (https://github.com/ioquatix/fingerprint) I allowed multiple checksums, which means you CAN migrate from one checksum to another pretty easily. However, I didn't use a Merkle tree in the initial design and so I hope to revisit it at some point to improve it.

It should be trivial to add an additional checksum to git. Not to replace how SHA1 is currently used, but to add essentially per-commit checksum, which is a checksum of the entire commit contents (including the checksum of the previous commit). It wouldn't be as elegant as using SHA256 in place of SHA1, but at least you could, with some effort, validate the source tree in a cryptographically secure way.

wwalexander · on Feb 27, 2017

Hacker News doesn't support Markdown comments like Reddit. There are a couple Markdown-like features like italics, but that's it.

wfunction · on Feb 27, 2017

Why does Linux think that the source file in question has to acquire an incomprehensible blob in the middle in order for a commit hash collision to occur? Can't the attacker just make all the changes he wants and then insert a random 1 KiB file somewhere to compensate for the commit hash? It could be totally tucked away somewhere you don't expect... you wouldn't see it just by looking at source code.

mattkrause · on Feb 27, 2017

The demonstration is a pair of PDFs that display differently. However, Google didn't find two PDFs that just happen to collide. Instead, they built one PDF that contains two jpegs and a switch that selects one of them for display. The neat bit is that the file has the same hash regardless of the switch's setting.

This attack won't work on plain text source because it won't look like source code.

Aldo_MX · on Feb 27, 2017

I'm not an expert, but for me an incomprehensible blob that is pretty likely to be overlooked begins with:

  This program is free software; you can redistribute it
  and/or modify it under the terms of...

alkonaut · on Feb 27, 2017

Haha true, add gibberish below the fold of License.md and no one will ever read it.

commit -m "Added dutch license translation"

crystalmeph · on Feb 27, 2017

Forgive my ignorance. I don't understand the threat model being worried about here. If someone breaks into your CVS/SVN server and rewrites project history, how is that less bad than someone breaking into your remote git repo and rewriting project history? Don't both attacks require a break-in at the server/remote? Or does CVS/SVN have a better method of detecting such a break-in?

bandrami · on Feb 27, 2017

I don't get why the discovery of a non-preimage attack is causing so much consternation.

If the mere existence of collisions is not acceptable to your VCS, then your VCS can't use a hash, period.

If you're worried about an intentional attack, it's no closer today than it was last week: the attacker doesn't control the output hash of the collision, or either input.

Taek · on Feb 27, 2017

> If the mere existence of collisions is not acceptable to your VCS, then your VCS can't use a hash, period

What? It could use a secure hash. Collisions take more energy than the universe has to discover.

bandrami · on Feb 27, 2017

"Secure" is a time-constrained term. But any hash, whatsoever, has collisions, so if collisions are unacceptable, your VCS can't use them.

(The point being that a VCS should handle collisions gracefully no matter what has is used.)

lmm · on Feb 27, 2017

No VCS is 100% secure against the possibility of catastrophic failure. If an asteroid wiped out all life on earth there is no VCS that can handle that gracefully. So as long as a hash collision is less likely than that, using a hash and not handling collisions gracefully doesn't make the VCS substantially less safe.

junke · on Feb 27, 2017

Found another detailed explanation from Mike Gerwitz:

https://forums.whonix.org/t/security-git-general-verificatio...

9gunpi · on Feb 27, 2017

Git should have been written with pluggable hashes, right. Also, SHA1 could have been invented stronger against collisions in the first place.

In the retrospect many decisions which might have simplified current moment's problems are obvious, but in reality of those decisions in the past they never are. This feeling (it's called the retrospective predictability) is not a function of current problem or previous wrong decisions, but the random reality of events in complex systems, open source software implementing fresh ideas in a new way is being random and complex enough for this.

xiphias · on Feb 27, 2017

I just went to the github blog and I see this: ,,GitHub GDC Party 2017''. While security of git is broken, they are partying.

Gitlab blog: working on UX

Bitbucket: branch permissions

All of the teams should handle this issue as an emergency, and at least have a blog entry about their points of view.

nicky0 · on Feb 27, 2017

And I see you commenting on Hacker News instead of getting important work done.

ageofwant · on Feb 27, 2017

That may be appropriate if it was an "emergency". It's not.

xiphias · on Feb 27, 2017

Luckily there's a long thread on bitcoin-dev mailing list about how disastorous the consequences can be if somebody is able to change the git-tree because of this attack scenario, so I'm not the only one who thinks that signing only the commit can make billions of dollars worth of damage. And this doesn't take into account that almost all companies use open source software developed on github, so I believe that any remote possibility of adding a malware to open source software is emergency situation.