Whatever happened to SHA-256 support in Git (lwn.net)
176 points by simonpure on Dec 31, 2022 | 106 comments



Simple answer: github was doing most of this work as SHA1 is a non-allowed hash type for FIPS compliance, which mattered since Microsoft had landed the US DoD JEDI contract.

The JEDI contract was cancelled in 2021, so the work on that workstream never continued.

source: former github developer


Clarification: SHA-1 is under review but still allowed. The next revision of FIPS 180-4 will certainly start the clock on retiring it, but that's a years-long process.


More specifically, the current deadline is end of 2030: “Modules that still use SHA-1 after 2030 will not be permitted for purchase by the federal government.” (https://www.nist.gov/news-events/news/2022/12/nist-retires-s...)


> There is also the risk (which cannot really be made to go away) that the longer hashes used with SHA-256 may break tools developed outside of the Git project

Easy fix if that is really an issue: just truncate SHA-256. The length of the hash is not the issue that needs fixing (even if it's a nice side benefit).

> that is only the first step in the development of a successful attack. Finding a collision of any type is hard; finding one that is still working code, that has the functionality the attacker is after, and that looks reasonable to both humans and compilers is quite a bit harder — if it is possible at all.

I mean, if you have any sort of binary files in your repo, that's pretty doubtful.

The way you mostly do this, is the colliding part is a short binary blob which is embedded, and then the file has code outside the colliding part that does different things depending on the value of the blob.

Yeah, getting that past human review with a source code file is going to be tricky. Otoh, if you have any sort of binary assets in your git (this might even include images, depending on the attack goals, e.g. a goatse attack), this seems a lot more plausible.

P.s. to be clear, i agree that sha-1dc variant removes most of the urgency.
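
A rough sketch of the pattern described above (all names hypothetical; in a real attack the blob would be one half of a collision pair rather than a placeholder):

    # Minimal sketch: the colliding region is an opaque blob, and code outside
    # it branches on the blob's contents. The two halves of a collision pair
    # hash identically but differ in the byte being tested.
    COLLIDING_BLOB = bytes.fromhex("00" * 64)  # placeholder, not a real collision block

    def do_benign_thing() -> None:
        print("doing the documented, reviewed behaviour")

    def do_malicious_thing() -> None:
        print("doing something else entirely")

    def run() -> None:
        if COLLIDING_BLOB[0] == 0x00:
            do_benign_thing()
        else:
            do_malicious_thing()

    run()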


>> that is only the first step in the development of a successful attack. Finding a collision of any type is hard; finding one that is still working code, that has the functionality the attacker is after, and that looks reasonable to both humans and compilers is quite a bit harder — if it is possible at all.

> I mean, if you have any sort of binary files in your repo, that's pretty doubtful.

> The way you mostly do this, is the colliding part is a short binary blob which is embedded, and then the file has code outside the colliding part that does different things depending on the value of the blob.

> Yeah, getting that past human review with a source code file is going to be tricky. Otoh, if you have any sort of binary assets in your git (this might even include images, depending on the attack goals, e.g. a goatse attack), this seems a lot more plausible.

My view is that if you find yourself rationalizing away potential cryptographic issues with "I bet this will be hard to successfully attack in practice", you're probably better off just fixing the problem if you can. Once you've moved from just relying on cryptographic security to non-cryptographic factors like human code reviews or constrained input formats, you've made it significantly more complicated to evaluate the security of your system and significantly increased the risk that an attacker comes up with an approach you haven't considered.

It's very tempting to conclude that a cryptographic attack isn't really an issue for your system and you don't have to change anything, but that conclusion is almost certainly not based on a real understanding of the risk you're accepting. Just using SHA-256 or something similar is almost always a better answer than coming up with some more complicated reason to keep using SHA-1.


Interesting note - there's already a standard representation for truncated SHA-2 - SHA-512/224 and SHA-512/256 - but unfortunately none with the output length of SHA-1. Even more interesting is that those truncated variants are more secure against length extension attacks.


>just truncate sha-256

A proper solution would be to define SHA-160, similarly to how SHA-224 is defined, i.e. you would use different initialization constants and truncate the output of the core SHA-256 algorithm. But I guess it would be a bit more difficult to implement than simply truncating the output of an existing SHA-256 implementation.
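
A minimal sketch of the simpler approach (plain truncation of an existing SHA-256 implementation to 160 bits; a "real" SHA-160 defined like SHA-224 would also use different initialization constants):

    import hashlib

    def truncated_sha256_160(data: bytes) -> bytes:
        # Naive variant: keep the first 20 bytes (160 bits) of SHA-256.
        # A SHA-224-style definition would additionally change the IV.
        return hashlib.sha256(data).digest()[:20]

    print(truncated_sha256_160(b"example content").hex())  # 40 hex chars, same size as SHA-1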


SHA-224 is just truncated SHA-256 (with different constants, but that does not matter here).


Truncating the sha256 hashes does sound like a reasonable intermediate step and should also enable interoperability (a guess from my side - if it is only about referencing objects, it probably does not matter how the keys were generated). At some point one could then transition to the full hashes and make the truncated ones an option.

I'm wondering what tooling is heavily dependent on the length of the hashes. Potentially anything where you want to keep the size of the transmitted data small (at work, we once considered git as a versioned database for an IoT use case…).


You could SHA-1 hash the SHA-256 hash instead of truncating it :)


I mean that doesn't help anything as if you collide the SHA-1 hash you automatically get a SHA-256 hash collision.


That can't be true


Perhaps I misinterpreted your comment.

If you have X and Y such that X has a SHA1 hash collision with Y, you end up such that SHA256(SHA1(X)) == SHA256(SHA1(Y)).

That's why I said what I said.
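
A minimal sketch of the two compositions being discussed (helper names hypothetical):

    import hashlib

    def sha256_of_sha1(data: bytes) -> str:
        # SHA-1 runs first, so any SHA-1 collision in `data` collides here too.
        return hashlib.sha256(hashlib.sha1(data).digest()).hexdigest()

    def sha1_of_sha256(data: bytes) -> str:
        # SHA-256 runs first; an attacker would need SHA1(SHA256(x)) == SHA1(SHA256(y)),
        # without being able to choose the SHA-256 digests that get fed to SHA-1.
        return hashlib.sha1(hashlib.sha256(data).digest()).hexdigest()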


I think the ancestor comment meant SHA1(SHA256(X)) instead. Not clear to me how that wouldn’t have collisions, too. Just that the underlying commits that generate the collisions would need to look different.


If that's the case you can just replace the SHA256(X) portion with arbitrary content to get the above SHA1 collision.


I don't follow. If the algorithm is SHA1(SHA256(X)) all an attacker can modify is X. Yes it's possible to find a SHA1() collision, but finding X where the SHA256() will generate a collision -- that is SHA1(SHA256(X)) == SHA1(SHA256(Y)) -- is still required.

The question is does the SHA1 step make this any easier?

Don't you still have to either break SHA256 (predicting the hash it will generate) or do this by brute force?


I was assuming that it was optionally SHA1(X) or SHA1(SHA256(X)), with the determination of which happens being something attacker-controllable in X.


I’m OP. I mean SHA1(SHA256(X)), but I have no idea if that makes a collision more difficult than SHA1(X), or what the other implications are. It was a way to reduce the hash length without truncation.


I'd imagine it's an easy error to make to just go and load a SHA-1's worth of characters from git, or to splatter some validation in the code going "okay, this is not a SHA-1-length hash, something must be wrong with the data".
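
For example, a hypothetical validation helper of that kind, which would flag every 64-character SHA-256 object ID as "corrupt":

    import re

    # Hypothetical tooling code that bakes in the 40-hex-character assumption.
    SHA1_HEX = re.compile(r"^[0-9a-f]{40}$")

    def looks_like_git_oid(s: str) -> bool:
        # Treats anything that isn't exactly SHA-1-sized as bad data.
        return bool(SHA1_HEX.match(s))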


I don't know how the collision detector works, but in general, you don't even need binary files do you? Just add a comment in a source file at the end of some line with near-arbitrary data. Bonus points if it's preceded by enough whitespace to fool the reader into thinking there's nothing there.


I was just going on it being a lot harder to trick a human in a text format. Most collisions involve a bunch of binary data that isn't valid UTF-8, which looks very conspicuous in a text file.


You can do a lot with various whitespace characters, see for example https://github.com/not-an-aardvark/lucky-commit.


Oh definitely, but doing that (bruteforcing using only whitespace) for all 160 bits of sha-1 is way beyond our capability.


Previously discussed with hundreds of comments: https://news.ycombinator.com/item?id=31851755


It's not an issue to use SHA-1 for git. Heck, they could have used MD5 and it still would have achieved the same thing. The use of SHA-1 in Git is not for security purposes, it's for accidental data corruption and uniqueness purposes, as you're guaranteed to never accidentally get a collision.


> The use of SHA-1 in Git is not for security purposes

It's for verifying data integrity, which is a cryptographic task (i.e. for security purposes). In Linus' Google tech talk on git he actually does say it's not a security thing, but then gives an example of verifying the data wasn't tampered with by a third party [1].

Whether or not the feature was conceived of as a security feature, it is de-facto being relied on as a security feature. There are malicious actors on the internet that try to inject malware into software repositories. The fact that they can't silently change the history makes this task harder. If a non cryptographic hash function like crc32 was used, it would be child's play to cause shenanigans with collisions.

As an anecdote, I once reverse engineered the checksum used by warcraft 3 to verify that players had the custom map being played. It was not cryptographically secure. Just xoring values into a rotating accumulator [2]. Not hard to collide. Within a few months, there were versions of maps with built-in vision hacks and collided checksums being passed around. It was enough of a problem that the next game patch added sha1 as a checksum. If git had started with a non cryptographic hash function, it would have been forced to switch to one for similar reasons.

1: https://youtu.be/4XpnKHJAok8?t=3672

2: https://github.com/Strilanc/Tinker/blob/755ecbf6e06996166490...
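
To illustrate the crc32 point above: with only 32 bits of output, the birthday bound means a few tens of thousands of random inputs are enough to collide it by brute force. A minimal sketch:

    import os
    import zlib

    # Brute-force birthday collision on crc32; roughly 2^16 random inputs expected.
    seen = {}
    while True:
        blob = os.urandom(16)
        checksum = zlib.crc32(blob)
        if checksum in seen and seen[checksum] != blob:
            print(f"collision: {seen[checksum].hex()} vs {blob.hex()} -> {checksum:08x}")
            break
        seen[checksum] = blob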


> The use of SHA-1 in Git is not for security purposes, it's for accidental data corruption and uniqueness purposes

In Git itself maybe, but many tools and developers rely on or automatically assume that commit hashes are collision resistant. For example from the GitHub Actions docs:

> Pinning an action to a full length commit SHA is currently the only way to use an action as an immutable release. Pinning to a particular SHA helps mitigate the risk of a bad actor adding a backdoor to the action's repository, as they would need to generate a SHA-1 collision for a valid Git object payload.

https://docs.github.com/en/actions/security-guides/security-...


This would be a security vulnerability at GitHub, not one of Git.


Git is a tool and a tool should be useful. This tool would be much more useful, and also more intuitive to use, if it actually provided cryptographic hashes of its commits and didn't just pretend to.

You can shift some of the blame to the user if something goes terribly wrong, but at least partly it's also the tool's fault. Git's security is a footgun that is hardly productive.

Believe me, if Git were fundamentally broken, the issue would've been fixed already. It's in that "uncanny valley" of security issues, where it's bad enough to cause damage, but not bad enough to get people to stop what they're doing and fix it.


Well, as the article says, Git already fixed the issue, SHA-256 is already supported. Now it's up to GitHub, GitLab and the innumerable other platform/tool providers to update their solutions, until then Git can't in good conscience make SHA-256 the default, because that would condemn a repository to a life in eternal isolation...


Per the article, the biggest problem isn't third party integration but that there is no interop with old repos. The feature is in an experimental state right now (and labeled as such), so you can't and shouldn't expect GitHub/GitLab/anyone to start using it in production


Indeed if you specify all users are trusted there are no security vulnerabilities. Unfortunately in the real world not all users are trustworthy.


> many tools and developers rely on or automatically assume that commit hashes are collision resistant.

Solution seems to be don't. Use the tool as the tool was intended.


That is a good plan on first consideration, but on close inspection appears to require that the tool author was omniscient and anticipated every possible use of their tool.

Traditionally a lot of the usefulness from tools comes from people doing things that were not intended. The modern web springs to mind, it was a terrible hack in the grand old IE days.

It is better for tools to have obvious failure modes.


> Use the tool as the tool was intended.

Make tools better and safer when there are good opportunities.


This assumes that people actually know what they're doing when writing code.

But this assumption has been proven wrong infinitely many times already.

Shouting RTFM didn't help, even after decades of doing so.

Actually it's getting worse.

Copy'n'paste from Stack Overflow without understanding anything was likely only a warmup. Now we're going to get AI-generated code.


> The use of SHA-1 in Git is not for security purposes, [...]

Only, it is. When you eg sign a commit in git, you only sign the hash. So someone else could pretend you signed a different commit (and commit history), if they can find a collision.


Git isn't relying on collision-resistance, it's relying on second-preimage[0] resistance, which is to say: in order to sneak a hash collision in to a git repository, you have to sneak _something else_ that's already trusted (e.g. via code review) into the repository; collisions can't (yet) be generated for arbitrary hashes.

I haven't heard of any second-preimage attacks against MD5, much less SHA-1, so mlindner was correct in asserting that MD5 would be fine (assuming 128 bits are enough). See also the analysis in [1].

More to the point, if you're able to sneak something into a repository in the first place (e.g. a benign file that generates a collision with a malicious file), then you're probably able to sneak in something more directly (e.g. [2]) that won't rely on both getting something in a trusted repository and then cloning from a different, untrusted source.

[0]: https://en.wikipedia.org/wiki/Preimage_attack

[1]: this is getting a bit old, but should still be relevant? https://electriccoin.co/blog/lessons-from-the-history-of-att...

[2]: https://en.wikipedia.org/wiki/IDN_homograph_attack


> if you're able to sneak something into a repository in the first place (e.g. a benign file that generates a collision with a malicious file), then you're probably able to sneak in something more directly

Could you imagine using an implementation of TLS that "probably" authenticated your network traffic though? I think there are two separate reasons we prefer to make strong guarantees in cryptography:

1. That's often really what I need. If I'm downloading e.g. software updates over the network, I really need those to be authentic.

2. Even when I arguably don't need strong authenticity, like just reading some news articles, I want to use the same strong tools, because I don't want to have to study and understand (much less teach) the situations where some weaker tool fails. Inevitably I'll get that wrong or just forget, and I'll end up using the weak tool in some case where I should've used the strong one.

In this case, if I imagine teaching how commit signing works with a weak hash function, it sounds like "Signing commits means that no one can sneak malicious content into your repository, unless they first steal your secret signing key, or else you ever committed (or allowed anyone else to commit) a non-text file that they created." Actually writing that second part out makes it feel really bad to me.


> "Signing commits means that no one can sneak malicious content into your repository

Signing commits does not mean that even when using a cryptographically secure hash function. All it means is that you put your signature over a particular state of the repo (and, by extension, its parent states). It has nothing to do with preventing "sneaking things in" - although it could be a (small) part of the whole set of measures taken to prevent someone from doing that.


> All it means is that you put your signature over a particular state of the repo (and, by extensions, its parent states).

That's technically true. Though in practice I think the implied social contract is that signing of a commit means you signal some kind of approval for the diff between the signed commit and its immediate predecessor(s).


I'm not 100% sure I understand your point, but it sounds like you're concerned about signing something using a weak hash function (i.e. where the hash of something is what actually gets signed)?

If that's the case, then my point is pretty simple: yes, SHA-1 is broken for signing untrusted input (due to weak collision resistance), but it is not broken (so far) for signing trusted input (due to strong preimage resistance).

My point earlier was primarily that the contents of a repository are generally trusted (via mechanisms like code review), and signing trusted content still works even with SHA-1.

Note that certificate signing vulnerabilities (which I assume is why TLS was mentioned?) usually rely on a malicious actor presenting one certificate and then presenting a different cert later; they can't arbitrarily fake existing certs from somebody else.

The analogous scenario for git repositories would be to have a malicious actor make a commit (or blob, tree, etc.) that could be swapped out for another. But if you already have malicious actors able to make commits in your repository, then the hash function doesn't matter: they can cause damage in many, many other ways.


> The analogous scenario for git repositories would be to have a malicious actor make a commit (or blob, tree, etc.) that could be swapped out for another. But if you already have malicious actors able to make commits in your repository, then the hash function doesn't matter: they can cause damage in many, many other ways.

The malicious actor can pose as a good-faith contributor and submit Pull Requests to your repository.

You review the code in the PR, and perhaps even prove it correct. Later on, the malicious actor can do the swapping trick. (Eg by running a mirroring service for your repository.)


> You review the code in the PR, and perhaps even prove it correct. Later on, the malicious actor can do the swapping trick. (Eg by running a mirroring service for your repository.)

Having a copy of code that is reviewable and then searching for a malicious collision is a preimage attack; extending two chosen prefixes (e.g. one "valid" and one "malicious") until they meet at a hash collision is how most practical (?) collision attacks work. The latter scenario produces large junk sections in the results, which should be obvious under even mild scrutiny.

If the reviewer misses the kilobytes of garbage in the middle of a file they're reviewing, then an attacker can just sneak malicious code in directly without requiring a hash collision.

If the project relies on an effectively unreviewable binary file that could hold kilobytes of junk (like some YAML files I've seen...), then that's already breaking the review process without requiring a hash collision.

Ignoring all of that, anybody grabbing code from an untrusted source is already vulnerable to whatever attacks that untrusted source wants to employ, with "exploiting hash collision" being one of the higher-effort attacks that could be mounted.

Essentially, any repository that would be vulnerable to any of the known hash collision attacks (via bad review, untrusted upstream, etc.) would be vulnerable to more mundane, easier attacks against the same weaknesses that do not depend on hash collisions.


> Having a copy of code that is reviewable and then searching for a malicious collision is a preimage attack;

No, it's not. You can sneak extra entropy into minor formatting choices or variable names etc, or exactly what you write in your commit messages. Or probably even ordering of files in your directories. (I don't think the git protocol enforces that files have to be in eg alphabetical order.)

> Ignoring all of that, anybody grabbing code from an untrusted source is already vulnerable to whatever attacks that untrusted source wants to employ, with "exploiting hash collision" being one of the higher-effort attacks that could be mounted.

I'm not sure. If your hash works fine, as long as someone trusted gives you the commit hash, anyone untrusted can give you the actual source.

And if you mean accepting PRs: accepting PRs from the untrusted internet is basically how open source works.


I don't believe that is accurate.

First - if git really didn't care about collision resistance, there wouldn't have been a need to switch to SHA1DC as the hash function. They switched because they care enough that they were willing to accept the performance penalty.

Second - imagine this scenario: a user creates two commits with the same hash, one with a valid change and the second with a malicious one. The collision could be created by playing around with some data in a binary file - so, this is a collision attack not 2nd pre-image. The user then submits the change to the upstream and gets it approved. The user maintains a mirror of the upstream repo into which they place the malicious commit. Anyone that pulls from this mirror will think they have the same code as the upstream, even if they compare hashes.

So don't use an untrusted mirror? I guess - but that is something that should be possible with a strong hash. And if git really didn't want you to do that, it would provide for better ways of tracking where objects were actually pulled from.

Anyway, collision attacks are real and can impact git. They just aren't as bad as a 2nd pre-image attack.


> First - if git really didn't care about collision resistance, there wouldn't have been a need to switch to SHA1DC as the hash function. They switched because they care enough that they were willing to accept the performance penalty.

Git didn't _need_ to switch to SHA1DC, but they did because the cost was minimal and it's still a good idea to defend against known attacks.

> Second - imagine this scenario: a user creates two commits with the same hash, one with a valid change and the second with a malicious one. The collision could be created by playing around with some data in a binary file - so, this is a collision attack not 2nd pre-image. The user then submits the change to the upstream and gets it approved.

This is a general problem with binary files: they're hard to properly review. Having unreviewable files in a repository (binaries, machine-generated configs, etc.) is already a security problem; hash collisions would just be one (very difficult) way of exploiting that problem.

> The user maintains a mirror of the upstream repo into which they place the malicious commit. Anyone that pulls from this mirror will think they have the same code as the upstream, even if they compare hashes.

Having people pull data from an attacker-controlled source is a security issue, regardless of hash collisions.

> So don't use an untrusted mirror? I guess - but that is something that should be possible with a strong hash. And if git really didn't want you to do that, it would provide for better ways of tracking where objects were actually pulled from.

Git was designed for collaboration between trusted parties; collaboration between untrusted parties (e.g. pulling changes from untrusted sources) is a much harder problem that git doesn't pretend to solve.

> Anyway, collision attacks are real and can impact git. They just aren't as bad as a 2nd pre-image attack.

Collision attacks are real, but they have yet to impact git (beyond adopting SHA1DC, I guess), despite how big of a target popular git repositories are.


> Git didn't _need_ to switch to SHA1DC, but they did because the cost was minimal and it's still a good idea to defend against known attacks.

I'm confused with how a SHA1 collision being found is an "attack" if git truly doesn't care about collision resistance.

> This is a general problem with binary files: they're hard to properly review. Having unreviewable files in a repository (binaries, machine-generated configs, etc.) is already a security problem; hash collisions would just be one (very difficult) way of exploiting that problem.

I don't think you can ignore the use case - people do check binaries into git with the expectation that git will keep track of them.

> Git was designed for collaboration between trusted parties; collaboration between untrusted parties (e.g. pulling changes from untrusted sources) is a much harder problem that git doesn't pretend to solve.

Maybe that is how git was designed. But it's not how git is used. People do pull from repos that they don't fully trust. Maybe just to examine a change before throwing it away. What people don't expect is that by pulling from such a source that an unexpected file could get into their repository due to a collision attack. That is why git switched to SHA1DC - if git truly didn't support that use case, they wouldn't have needed to.

> Collision attacks are real, but they have yet to impact git (beyond adopting SHA1DC, I guess), despite how big of a target popular git repositories are.

I agree that collision attacks are real but aren't a practical issue yet. What I was responding to was your comment:

> I haven't heard of any second-preimage attacks against MD5, much less SHA-1, so mlindner was correct in asserting that MD5 would be fine (assuming 128 bits are enough). See also the analysis in [1].

In that comment, it seems that you were saying that collision attacks weren't a problem at all. But it seems like you are saying in your more recent comment that "collision attacks are real"?


> This is a general problem with binary files: they're hard to properly review. Having unreviewable files in a repository (binaries, machine-generated configs, etc.) is already a security problem; hash collisions would just be one (very difficult) way of exploiting that problem.

That's not a problem in general. Eg having a binary bmp in your repository is fine as far as reviews go.


> Git was designed for collaboration between trusted parties; [...]

No.

Git was designed for development of the Linux kernel. Contributors to the Linux kernel are generally not trusted.


> Git isn't relying on collision-resistance, it's relying on second-preimage[0] resistance, which is to say: in order to sneak a hash collision in to a git repository, you have to sneak _something else_ that's already trusted (e.g. via code review) into the repository; collisions can't (yet) be generated for arbitrary hashes.

Yes, I know. I was arguing the more general point that 'The use of SHA-1 in Git is not for security purposes,'.

Of course, for anything crypto-related we go by the maxim 'guilty until proven innocent'. MD5 might not have a published second-preimage attack yet, but it's broken enough that you shouldn't rely on it for anything anymore: it's not an acceptable crypto-hash, and if you don't need a crypto-hash, you can use something simpler like a CRC instead.


Git allows one to sign commits. It would be quite pointless if it was signing mere MD5 hashes.


It’s exactly as pointless now.

You sign the hash, which is what’s colliding.


Finding a collision is very hard, not something you will do in minutes; it requires a tremendous amount of resources. For any practical use (like git) that doesn't require an extreme level of security, SHA-1 is still fine, and it will be for a lot of years to come.


I'm not sure why git would require less security than almost any other application?

Control over what software runs is really important. If an attacker can get you to run different source code, especially if it looks like it's still signed by the people you trust to produce or review sources, that would be a big deal.


MD5 is not vulnerable to a second-preimage attack, so signing a repo that doesn't already contain attacker-controlled data specially crafted ahead of time is perfectly safe.

A collision attack is not "the hash is useless, you can make up anything", but a specific condition that breaks only some uses, not all.

You can generate a pair of files that hash to the same value, which you can't control. You can't make a new file that hashes to an existing hash.


...as opposed to signing SHA-1 hashes? Git signs commits, not the whole tree, AFAIK.


For the record, a commit is the whole tree. Git does not store patches; they are generated on the fly by the UI.

Trees still rely on hashes to address the actual content though.


Git signs commit hashes


Why not add a check to see if the new hash is a collision? With an index I don’t think it would ever be a bottleneck.
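
A minimal sketch of that kind of check against a hypothetical object store: on insert, compare any existing object with the same ID byte for byte.

    # Hypothetical object store illustrating the suggested check: refuse to
    # silently keep two different objects under the same hash.
    object_store: dict[str, bytes] = {}

    def add_object(oid: str, data: bytes) -> None:
        existing = object_store.get(oid)
        if existing is not None and existing != data:
            raise RuntimeError(f"hash collision detected for object {oid}")
        object_store[oid] = data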


They could do that, but it probably shouldn't be on by default, and it doesn't actually achieve anything other than protecting from accidental addition of a colliding hash. It wouldn't offer any additional security benefit.


https://fossil-scm.org/home/doc/trunk/www/hashpolicy.wiki

Fossil found a quick way of solving this.


Looks like the git project has a different plan for gradual transition -- but one that's more elaborate and somewhat more fragile. Comment from one of the developers (from prior discussion of the same LWN article): https://news.ycombinator.com/item?id=31856133


Like SQLite, Fossil is an incredibly well-designed jewel of software craftsmanship. It’s unfortunate market dynamics led to git’s dominance instead.


Market dynamics may be true, but the continued use of git has more to do with workflow than implementation. Fossil makes quite a few dogmatic choices and if one buys into them, then fantastic, but it's for sure not a drop-in replacement for the way teams develop software, and thus would require a teamwide "hard fork"

This debate comes up every time git and fossil are mentioned in each other's threads


Small tangent, but what happened with the alternatives?

I hear that Mercurial is having a good time at Meta and Google, but it's hard to find reliable repo hosting.

The big three, Gitlab, GitHub and BitBucket are just git.

Bazaar is Breezy now, but I'm not sure how much life it has in it.

Darcs has peculiar distribution via Haskell package managers.

Are we "stuck" with git?


Meta has Sapling https://sapling-scm.com/ which is also compatible with git servers.

Google uses Piper, which is not public.


Most of Google. Some projects like Android and Chromium are developed outside google3 and use Git (with Gerrit).

IIRC, Git is also relatively popular as a local Piper frontend.


Mononoke/EdenFS (mentioned in Sapling's Github README) are Meta's in-house mercurial stack. The "Sapling" name is not used internally but "hg" is.


Facebook have been the only major users of Mercurial for years. Sapling is basically their rewrite of it.


FWIW, SourceHut offers Mercurial hosting.


From the site: Notice: sr.ht is currently in alpha, and the quality of the service may reflect that.

:(

I miss Kiln from FogBugz (yes, it was years ago)


SourceHut is more reliable than GitHub. I mean this in the truest sense of the word, I can rely on SourceHut not to act against my interests, both ideologically and just basic usefulness.


I thought so too at first, until the owner changed their TOS to forbid all crypto and blockchain related projects, essentially kicking me off their platform.

So no, it's not reliable. The platform is at the mercy of a small group of ideologues, who might change their stance on any topic on a whim.

GitHub on the contrary continues to host code that has been OFAC sanctioned. I'd rather stay with them.


Wait, what?

SourceHut bans customers purely on ideological grounds?

If true, this would completely change my opinion about this service. The result would be: never ever do business with them.


https://sourcehut.org/blog/2022-10-31-tos-update-cryptocurre...

Yes, it's a really bad move: you pay for the service, but the nature of your code is not welcome.

Like you, I was a bit shocked too... well, and a bit sad, since I was thinking that a host I pay for should give me more freedom and not less, and that code (knowledge?) should be free.


I think you're mixing up what "freedom" is in regards to open source.

Sourcehut's code gives a user the freedom to use it for whatever they want - including hosting their own crypto-currency projects.

sr.ht the "service", on the other hand, is not required to do anything, and denying users the ability to host those projects doesn't contradict any licence the code has been released under.


The point isn't any license.

The point is that it's not the business of a hoster to decide what people may host.

A hoster gets money for hosting things. Ideally the hoster does not even know what it's hosting. (Until there is a problem that someone else points out to the hoster, which the hoster should then just ignore unless that someone is an authority with a valid court order in hand.)

As a parallel: just imagine your ISP started to filter the websites you may visit based on some arbitrary ideological beliefs. That's more or less the same as what's happening on SourceHut, imho.


I'm sorry, but that's a terrible comparison. If internet providers were not in the habit of snooping on and filtering their customers' traffic, would we have debates about net neutrality? Would we need HTTPS? Would VPNs be a thing? Would we need Tor? Would there be a Dark Web? Granted I'm over-dramatizing the situation, but the fact is that internet providers are in fact snooping for themselves, or for law enforcement, denying customers the use of certain ports or protocols, injecting content into non-secure content, etc.

I can understand one being upset that sourcehut's policy changed "after" paying for an account, but you can just stop paying for the service and move to a different forge. Being butthurt that people have different principles than you is not cool.


> I'm sorry but that's a terrible comparison.

Do you have any arguments that would back this claim? Where's the difference?

> If internet providers would not be in the habit of snooping and filtering on their customer's traffic […]

What are you talking about? This does not happen as it would be illegal. At least in civilized countries.

(Given a court order for lawful interception there may be exceptions to that, of course).

> net neutrality

This term means something else.

> we need HTTPS

For other reasons.

One of them being rogue states that snoop on people's traffic. [Not looking in the direction of North America now.]

> VPNs

That's similar to HTTPS.

Also it circumvents state level censoring, which is needed by now in quite some countries.

> Tor

That's even more in the direction of hiding from state surveillance.

Your ISP usually knows that you're using Tor…

> Dark Web

That's a very unclear term, btw. And it has nothing to do with anything an ISP does.

> but the fact is that internet providers are in fact snooping for themselves

Like I said: Not in civilized countries, as this would be a breach of the constitutional right to privacy of correspondence.

> law enforcement

That's a tangent. Everybody besides a culprit needs to cooperate with law enforcement.

> denying customers the use of certain ports or protocols

You could do this in theory. But you wouldn't be selling internet access anymore in this case. This would be like AOL or Compuserve back then.


>I think you're mixing up what "freedom" is in regards to open source.

I'm not talking about open source or any license; I'm talking about the expectation that when I pay for a service, I can host any code I created, for whatever use.

But talking about licenses (since you don't think it's one of the freedoms), that's exactly what the definition of open source says:

https://opensource.org/osd

>6. No Discrimination Against Fields of Endeavor

The license must not restrict anyone from making use of the program in a specific field of endeavor. For example, it may not restrict the program from being used in a business, or from being used for genetic research.


Yes, but the service is not the code. The license applies to the code. I can't believe you can't perceive that distinction.

And when you pay for a service, you can do whatever that service allows you to do, which in this case is "not" crypto-currency projects.


> I can't believe you can't perceive that distinction.

You had that misunderstanding, not I.

>you can do whatever that service allows you to do

Yeah, look, I'll stop here if you think that's a good decision.


I wouldn't blame them, considering most crypto-related things are scams. To protect the platform, it's best that it just not be there. While GitHub has the money to defend youtube-dl, the truth is the RIAA killed it.

If you're doing scammy things, stick to fossil and host it on your own.


SourceHut offers CI. People end up just using the CI for litecoin/bitcoin/eth/etc mining or the storage for chia mining.


Wrong. See Drew DeVault's (founder of SourceHut) comment on exactly this topic:

> Q: How much of this is due to not wanting build/pipeline servers getting abused for mining purposes?

> A: None: the mining incidents stopped entirely when we started charging for CI and it stopped being profitable to do it.

https://news.ycombinator.com/item?id=33404713

As you can see, CI had nothing to do with their decision. The ToS changes specifically refer to source code hosting.


Agreed. Moved all my stuff from GitHub to Sourcehut. Haven't looked back. Well, okay, I look at the trending repos and star some that are interesting, but I don't host my personal projects on there.


What will come first? IPv4 retired or git SHA-1 repos retired?


The day of the Linux desktop.


Neither, as I don't see either ever going away at least within my lifetime (in my 30s).


I don't think IPv4 will ever retire.


Could someone explain what exactly the attack vectors would be here? I always assumed the hashes were there just for collision resistance. It would seem that any attacks involving an attacker introducing malicious code through a collision would require a level of system access that would enable more dangerous attacks with less effort and/or less detectability. Curious to learn.


A basic attack idea is getting someone to accept a patch, while poisoning other servers with an evil version of the patch with the same hash. Then some of the people updating will get the evil version, maybe almost everyone or maybe very specific targets.

Poisoning servers, in the most worrying case, just requires the ability to get your evil file into any commit on any branch, and possibly in any repo on the server if they have certain optimizations.


Well, various variants of Spectre attacks were also thought of as theoretical decades ago. That's the problem: you can never predict how a crafty attacker might abuse an attack vector.


Does someone know why SHA-3 isn't considered instead? Wouldn't its sponge construction—that allows outputting ("squeezing") any amount of data[0]—allow git to maintain backward-compatibility with existing tooling by continuing to use the same 20-byte output?

Is it that, for d := 160, the min(d/2, 256) = 80 bits of collision resistance are too low to justify the change?

[0] https://en.wikipedia.org/wiki/SHA-3
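
For what it's worth, the arbitrary-length squeezing is exposed through the SHAKE XOFs in the SHA-3 family; a minimal sketch of producing a 20-byte (SHA-1-sized) digest:

    import hashlib

    # SHAKE-256 is an extendable-output function: the caller picks the digest
    # length, e.g. 20 bytes to keep SHA-1's size.
    print(hashlib.shake_256(b"example object payload").hexdigest(20))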


One issue with SHA-3 is that it currently (unfortunately) lacks hardware acceleration support, while SHA-256 can be ridiculously fast on modern x86 and ARM chips. BLAKE3 is another potential alternative; it can be used as an XOF and can be very fast without hardware support.


Blake3 would be a good choice. Having a fast hashing algorithm is not a bad thing. Integrity checks involve a lot of hash calculations.

That's perhaps a reason Git has stuck with sha1 hashes. They are fast enough and good enough.


You could just truncate SHA-256 to 20 bytes for that matter.


Indeed, it's discussed here 6 months ago: https://news.ycombinator.com/item?id=31852651

edit: I've assumed that using SHA-3 would be somehow better but their security guarantees seem to be the same (yet).


Wouldn't it make sense to introduce a prefix to the hash for 256-bit hashes? For example a non-hex letter like z, so it becomes future-proof and you can differentiate with the prefix?


The fact that this wasn’t done at the beginning is mind boggling. We already dealt with this problem with Unix passwords using crypt, which was solved exactly this way decades ago. This should be standard for any system using hash algorithms.


Hashes are binary values internally, so "z" (or its ASCII byte value) already exists as a prefix in SHA-1 hashes (any hash that begins with 7a). The real problem is that existing code assumes that hashes are always exactly 20 bytes in size, and any 20-byte value is a valid SHA-1 hash. There is no way to indicate a non-SHA-1 hash in the binary Git data formats and internal representations processed by existing code. The task is to change those formats and representations, and the related code, to enable such an indication.
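
A quick illustration of why a textual prefix can't work at the binary level: roughly 1 in 256 SHA-1 digests already starts with the byte 0x7a ('z').

    import hashlib

    # Search a handful of inputs for a SHA-1 digest whose first byte is 0x7a ('z');
    # one turns up quickly, so 'z' cannot mean "not a SHA-1 hash" in binary data.
    for i in range(10_000):
        digest = hashlib.sha1(str(i).encode()).digest()
        if digest[0] == ord("z"):
            print(f"sha1('{i}') = {digest.hex()} starts with 0x7a")
            break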


The major git repo hosts need to support it, and given their backing, they have no excuse not to. So GitLab, GitHub, Bitbucket, AWS CodeCommit et al. need to be forced by corporates rejecting their use on security-audit (ISO 27k/SOC 2) grounds, and then they'll move.


It is often declared that SHA-1 is broken, but in fact still nobody can take code A and find a different code B that hashes to the same SHA-1 value, which is what git is concerned about. Even the long-ridiculed and buried MD5 is still perfectly secure in that sense.


Baking SHA-1 so deeply into Git seems like a poor design decision.


Yes, algorithm agility was already a known concept when Git was developed, so it’s somewhat surprising that SHA-1 was hardcoded.



