> > So I the identity of a patch doesn't depend on history? > with no history/an...

fiddlerwoaroof · on Sept 26, 2020

The cryptographic guarantee that a commit with the same identity represents the same codebase state is the killer feature of git: it’s great to be able to trust the history of a repository that way.

a1369209993 · on Sept 26, 2020

> The cryptographic guarantee that a commit with the same identity represents the same codebase state is the killer feature of git

Um, no? I don't think anyone in the history of git has ever reasoned "Well, I'd like to use this other VCS, but git is really useful for my use case of working with co-developers who are so untrustworthy that I need cryptographic assurance they haven't tampered with the history of the repository, but who I nonetheless trust not to inject Underhanded C Contest[0]-style bugs in the code or otherwise deliberately sabotage things that aren't the history."

The killer feature of git is operating on the repository and its history as a graph structure, which works even if commit ids are completely non-cryptographic GUIDs. (In the single-user case you could even use sequential integers, but that doesn't scale.) Cryptographic assurance is nice to have (dumb mistakes and unexpected malice happen to everyone), but it's worth trading away if the resulting features are good enough to justify the trade-off.
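To illustrate the graph point, here is a minimal sketch (a toy model, not git's implementation) of one graph operation, finding the merge base of two branches, where commit ids are random GUIDs with no cryptographic meaning. The `parents` mapping and the function names are assumptions made up for this example.

```python
import uuid


def ancestors(parents, start):
    """Return every commit reachable from `start`, including `start` itself."""
    seen = set()
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents[commit])
    return seen


def common_ancestors(parents, a, b):
    """Commits that appear in the history of both `a` and `b`."""
    return ancestors(parents, a) & ancestors(parents, b)


def merge_bases(parents, a, b):
    """Common ancestors that are not ancestors of another common ancestor."""
    common = common_ancestors(parents, a, b)
    return {
        c for c in common
        if not any(c != d and c in ancestors(parents, d) for d in common)
    }


# Toy history: root -> base -> (left, right). The ids are plain GUIDs.
root, base, left, right = (uuid.uuid4().hex for _ in range(4))
parents = {root: [], base: [root], left: [base], right: [base]}

print(merge_bases(parents, left, right) == {base})
```

Nothing here depends on how the ids were produced, only on the parent links between commits.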

0: http://underhanded-c.org/

whatevah5982 · on Sept 26, 2020

As a single developer or a team working in the same organization you can ignore this property, however in a distributed VCS it is pretty much a defining and essential feature.

Guaranteeing that the history is the same is paramount to ensure that the same operation on two repositories which are _supposed_ to be identical will yield the same results.

This is what allows you to merge back history from a forked repository over which you have no control with confidence.

You don't necessarily need that property, true. You also don't necessarily need to make it mandatory. However if you want to play in the distributed VCS game, you'd better have something equivalent that can give you the same guarantee.

a1369209993 · on Sept 26, 2020

> As a single developer or a team working in the same organization you can ignore this property,

That was my point; by fiddlerwoaroof's logic there would be no compelling reason for single developers or cooperative teams to use git (besides cargo-culting the linux kernel devs). But in fact there is such a reason - the graph structure I mentioned, or rather the sophisticated operations based on that structure.

> This is what allows you to merge back history from a forked repository over which you have no control with confidence.

When merging commits from a source repo into a destination, for each commit, either:

a, it doesn't already exist in the destination, in which case you have no way of knowing that the previous-commit data is correct, because while tampering would change the commit hash, you don't know what the old hash was because you've never seen the commit before.

b, it does already exist in the destination, in which case you have a perfectly good history for it already and (assuming the source disagrees, otherwise you'd just always ignore it) can either simply ignore the source's idea of where it came from, or (probably more usefully, but it depends on how you're organizing things) alert the user that they have two conflicting claims about the history of the commit, and ask for help the same as any other not-auto-resolvable merge conflict.

Git effectively treats b(≠) as "tell the user the source repository is horribly broken because its commit hashes don't match their content", but it fundamentally can't give you confidence about the parts of "history from a forked repository over which you have no control" that it hasn't seen, and including history in commit hashes isn't necessary for noticing that the parts it has already seen before don't match up.
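A rough sketch of case a, using a toy hash-chained commit id. This is a simplified model, not git's real object format; `commit_id` and the tree labels are made up for illustration.

```python
import hashlib


def commit_id(tree, parents, message):
    """Toy commit id covering the content snapshot, the parent ids, and the message."""
    h = hashlib.sha256()
    h.update(f"tree {tree}\n".encode())
    for parent in parents:
        h.update(f"parent {parent}\n".encode())
    h.update(b"\n" + message.encode())
    return h.hexdigest()


# Honest history in the source repository.
root = commit_id("tree-v1", [], "initial")
child = commit_id("tree-v2", [root], "add feature")

# Same history with the first commit's content tampered with.
tampered_root = commit_id("tree-v1-evil", [], "initial")
tampered_child = commit_id("tree-v2", [tampered_root], "add feature")

# Tampering with an ancestor changes every descendant id...
print(child != tampered_child)

# ...but a destination that has never seen `child` has nothing to compare
# against: the tampered commit just looks like an unfamiliar new commit.
seen_by_destination = {root}
print(tampered_child in seen_by_destination, child in seen_by_destination)
```

Both the honest and the tampered child are equally unknown to the destination, which is why the hash alone can't vouch for history the destination hasn't seen.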

fiddlerwoaroof · on Sept 26, 2020

I never said it’s the only reason to use git. However, it makes things like SOX change control easier because you can pass the commit ID around as a shorthand for a known state of a repository that is, practically, a UUID.

Similarly, as a single developer, it means I can verify repository backups by checking that all the repositories have the same branches and that the head of each branch has the same commit ID.
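A minimal sketch of that backup check, assuming `git` is on the PATH. The helper names `branch_heads` and `diff_heads` are hypothetical; `git for-each-ref` is used to list each local branch with its head commit.

```python
import subprocess


def branch_heads(repo_path):
    """Map each local branch name to its head commit id for one repository."""
    out = subprocess.run(
        ["git", "-C", repo_path, "for-each-ref",
         "--format=%(refname:short) %(objectname)", "refs/heads"],
        check=True, capture_output=True, text=True,
    ).stdout
    heads = {}
    for line in out.splitlines():
        # Branch names cannot contain spaces, so the last space separates the id.
        name, commit = line.rsplit(" ", 1)
        heads[name] = commit
    return heads


def diff_heads(primary, backup):
    """Return branches whose head differs or is missing on either side."""
    mismatched = {}
    for name in primary.keys() | backup.keys():
        a, b = primary.get(name), backup.get(name)
        if a != b:
            mismatched[name] = (a, b)
    return mismatched
```

Usage would be something like `diff_heads(branch_heads("/path/to/repo"), branch_heads("/path/to/backup"))`, with both paths being placeholders; an empty result means every branch head matches.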

whatevah5982 · on Sept 27, 2020

> a, it doesn't already exist in the destination, in which case you have no way of knowing that the previous-commit data is correct, because while tampering would change the commit hash, you don't know what the old hash was because you've never seen the commit before.

In a fork we validate the common ancestry. This is used so that you can diff from the last known commit to the current point in absolute terms instead of relative.

> b, it does already exist in the destination, in which case you have a perfectly good history for it already

Unless it has been rewritten, which in git we know can also happen by mistake. In such cases you can argue that you could diff the entire source tree, narrow it down to the point where you believe the history diverges, and spot the changes yourself, but git makes it a little more convenient.

kkarakk · on Sept 26, 2020

so in your theoretical example you'll never work with junior devs who make mistakes and screw up the history/trustworthiness of the repo? how do you recover from that?