> the identity of a commit should be defined by a hash of its contents only, but...

dataflow · on Oct 30, 2019

I'm most definitely not suggesting we should be using patches instead of commits though. I don't want anything to be logically composed of patches at all. (Physical-storage-wise, they can go wild; I don't care.)

cannam · on Oct 30, 2019

Ah, I misunderstood you. I thought you were asking for the identity of a commit to be the identity of the contents of that commit, i.e. the changes - but it seems you're talking about the contents of the working tree at the moment of the commit, with no dependency on prior history.

dataflow · on Oct 30, 2019

The thing is, the contents of a commit aren't patches. They are snapshots of the worktree. Your mental model is wrong (sorry), that's why you misunderstood. :-)

This is a common misconception that is corrected in many blog posts and tutorials; it's also explained clearly in the documentation. See the section (aptly) titled "Snapshots, Not Differences", where it says [1]:

"The major difference between Git and any other VCS (Subversion and friends included) is the way Git thinks about its data. Conceptually, most other systems store information as a list of file-based changes. These systems (CVS, Subversion, Perforce, Bazaar, and so on) think of the information they keep as a set of files and the changes made to each file over time. [...] Git doesn’t think of or store its data this way. Instead, Git thinks of its data more like a set of snapshots of a mini filesystem. Every time you commit, or save the state of your project in Git, it basically takes a picture of what all your files look like at that moment and stores a reference to that snapshot."

Now of course, as an implementation detail, it may store blobs as deltas against other existing blobs, but except for the obvious speed difference, this fact is completely irrelevant to you as a user. You neither know nor care how it is actually storing its commits. And the thing is, even if you looked underneath, you would have absolutely no guarantee that the blobs are physically stored as diffs against the parent commits. They might be stored as diffs against other random blobs in the repo for efficiency, and the user would be none the wiser.

[1] https://git-scm.com/book/en/v1/Getting-Started-Git-Basics#Sn...
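As a rough illustration of why the physical storage is invisible to the user: a Git blob's ID is the SHA-1 of a short header plus the file's raw bytes, so it comes out the same whether the object is stored whole or as a delta against something else. A minimal Python sketch, assuming the default SHA-1 object format (the function name `git_blob_id` is made up for this example):

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the object ID Git assigns to a blob with these bytes (SHA-1 repos)."""
    # Git hashes "blob <size>\0" followed by the raw content.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


# The empty blob has a well-known ID in every SHA-1 Git repository.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The result should match what `git hash-object` prints for the same bytes; nothing about deltas or packfiles enters the computation.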

cannam · on Oct 30, 2019

> The thing is, the contents of a commit aren't patches. They are snapshots of the worktree

I know that is true of Git - that's why I thought that that was what you were complaining about.

feanaro · on Oct 30, 2019

What's the difference, though? How are patches different from commuting commits? By commuting I mean commits that do not depend on their position in history.

dataflow · on Oct 30, 2019

I don't know where you got the commutativity requirement from. It certainly wasn't something I demanded of commits.

feanaro · on Oct 30, 2019

> - [Major] I feel (but could be potentially convinced otherwise?) that there is one very deep fundamental flaw in the semantic model, and that is the fact that the identity of a commit depends on its history.

Commits whose identity does not depend on their position in history are commits that are commutative (with respect to their position in history). So you very much said so, but we obviously appear to be talking about different things. I'm at a loss as to where these things differ.

dataflow · on Oct 30, 2019

What? This is like saying you and your younger brother are commutative. It makes no sense. Commits are snapshots, not diffs, i.e. they're variables, not operations; they're nouns, not verbs. They're as commutative as you and I are.
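To make the value/operation distinction concrete, here is a small Python sketch. It models a snapshot as a plain dict from path to content and a patch as a function from snapshot to snapshot. The names `Tree`, `Patch`, and `set_file` are invented for illustration and are not how Git or any patch-based VCS actually represents things.

```python
from typing import Callable, Dict

Tree = Dict[str, str]            # a snapshot: path -> file content
Patch = Callable[[Tree], Tree]   # an operation: snapshot -> snapshot


def set_file(path: str, content: str) -> Patch:
    """Build a patch that writes `content` to `path`, leaving its input untouched."""
    def apply(tree: Tree) -> Tree:
        new_tree = dict(tree)
        new_tree[path] = content
        return new_tree
    return apply


base: Tree = {"a.txt": "1"}
edit_a = set_file("a.txt", "2")
edit_b = set_file("b.txt", "x")
edit_a_again = set_file("a.txt", "3")

# Patches touching different files commute: either order gives the same snapshot.
assert edit_a(edit_b(base)) == edit_b(edit_a(base))

# Patches touching the same file do not: the order changes the result.
assert edit_a(edit_a_again(base)) != edit_a_again(edit_a(base))
```

Asking whether `edit_a` and `edit_b` commute is a meaningful question because they are functions that can be composed. Asking whether two `Tree` values commute is not, since there is nothing to compose.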

feanaro · on Oct 31, 2019

Oh, I see what you're saying now, I think. You're arguing for commits to completely lose any relationship with one another by default while remaining simple snapshots. I didn't realize this at first since I fail to see the immediate utility of this.

I agree the concept of a standalone snapshot is useful, but I don't think snapshots are the right abstraction for thinking about the evolution of a codebase from a human perspective; I consider changes the more important concept.

dataflow · on Oct 31, 2019

I mean, the idea that commits are diffs is a (common) misconception about git, likely carried over from another VCS. The snapshot model is the current abstraction; I haven't added any idea of my own here. It's right there in the documentation: https://git-scm.com/book/en/v2/Getting-Started-What-is-Git%3...

But I never said changes aren't important and should be neglected. And I'm also not saying commits shouldn't have any relationship to each other. You certainly can and should record and utilize that information as well. It just shouldn't be part of the commit itself. (Except maybe if it's a merge commit, in which case the contents of the previous file system snapshot are relevant. But even there, that shouldn't include the hash, which represents all the ancestors.) Information about commits' relationships should be external information, whose manipulation won't suddenly alter the commits or their identities.

This isn't a radical proposal or something. For starters, git's own documentation (which I already linked here) literally says "Git thinks of its data more like a series of snapshots of a miniature filesystem". Well, the snapshot of the file system doesn't include the history of how it came into being, so doesn't that mean the history shouldn't be part of your commit? That's already a contradiction with its own principles right there. And going beyond that, most things we do with commits already revolve around the file system snapshots, not the history. e.g. when you sign a commit, you sign the snapshot, not the history. Or when you say this guy is the "committer", you're just talking about the snapshot, not the history. And when a commit gets inserted into the middle of the history, that logically doesn't affect you, and in practice, you don't want it to trash the commit you're on. The identity of your commit is still the same after all -- it's the same snapshot.
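A rough Python sketch of the difference being argued about, under loud assumptions: `git_like_commit_id` is a simplified imitation of how Git hashes a commit object (real commits also carry author and committer lines), and `snapshot_only_id` plus the `parents_of` table are hypothetical, made up here to illustrate the proposal. The tree and parent IDs are placeholders.

```python
import hashlib
from typing import Dict, List


def _object_id(kind: str, body: bytes) -> str:
    """Hash a "<kind> <size>\\0" header plus the body, the way Git frames objects."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def git_like_commit_id(tree_id: str, parent_ids: List[str], message: str) -> str:
    """Simplified Git-style identity: the parent IDs are part of what gets hashed."""
    lines = [f"tree {tree_id}"] + [f"parent {p}" for p in parent_ids]
    body = "\n".join(lines) + "\n\n" + message + "\n"
    return _object_id("commit", body.encode("utf-8"))


def snapshot_only_id(tree_id: str, message: str) -> str:
    """Hypothetical identity from the proposal: snapshot and message, no parents."""
    body = f"tree {tree_id}\n\n{message}\n"
    return _object_id("snapshot", body.encode("utf-8"))


tree = "a" * 40        # placeholder tree ID
old_parent = "b" * 40  # placeholder parent before a history rewrite
new_parent = "c" * 40  # placeholder parent after a commit is inserted upstream

# Git-style: same snapshot, different parent, therefore a different identity.
before = git_like_commit_id(tree, [old_parent], "Fix typo")
after = git_like_commit_id(tree, [new_parent], "Fix typo")

# Proposal: the identity ignores ancestry, which lives in a separate table.
sid = snapshot_only_id(tree, "Fix typo")
parents_of: Dict[str, List[str]] = {sid: [old_parent]}
parents_of[sid] = [new_parent]  # history edited, sid untouched

print(before != after)  # True
```

The trade-off this sketch glosses over is that the external `parents_of` table would then need its own integrity story, since the commit hash no longer vouches for the ancestry.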