Why have so many people written long thoughtful explanations about how the author is wrong to suggest snapshots are a better mental model, and that you think all abstractions are leaky, but you find diffs a better mental model?
The entire article is literally about how commits are literally snapshots. I would say people didn't read TFA, but a lot of people are quoting lines from TFA and then going on to argue with/expand on them in a way that is directly contradicted by the next few lines.
I think it's because most of the people here have spent years working with git, and are so deeply attached to their understanding that they didn't hear most of what the article said.
(Some commenters have pointed out specific oversimplifications the author makes, like glossing over pack files. I'm referring to the people who say a git blob is a diff when the entire point of TFA is that it isn't.)
People are disagreeing with the author not necessarily because they didn't read the article, but because they don't agree about how things should be defined.
At the root, this is a disagreement about semantics and philosophy, not about git itself. I'm going to refer to Aristotle here: we think we have knowledge of a thing only when we have grasped its cause, and there are four general 'causes' [1]:
- The material cause: 'What is it made of?'
- The formal cause: 'What is the ideal of this thing?', i.e. what's its abstract nature?
- The efficient cause: 'How did this thing come to be?'
- The final cause: 'What is its purpose?' How is it actually used? What role does it play in the world?
Here we can see that commits are used (at least in the git internals) as 'snapshots' — they refer to bytes, not changes in bytes. That's pretty close to the formal and efficient causes — the abstraction inside of git is closest to a snapshot, and that comes from the history of what Linus wanted when he wrote it.
But! The underlying storage uses deltas (which are diffs) to save space. That's the material cause.
But also, when we actually use commits, git often creates diffs for us as a convenience (cherry-picking, rebasing), and hides the fact that they're snapshots under the hood (final cause).
So there's an inherent tension between the different ways to answer 'what is a thing?'. For commits, this is especially bad, since there's an even split between 'causes'.
This tension never goes away because the most useful definition really depends on the context.
> But! The underlying storage uses deltas (which are diffs) to save space. That's the material cause.
This does not make the "commits are stored as diffs" story much more true:
1. This is only true of pack files, and pack files are only created when objects get packed, for example by "git gc", which runs automatically once enough loose objects have accumulated.
2. Nothing about the pack file format requires that deltas follow the chronology of commits at all. The deltas could be stored in reverse order or even random order compared to the chain of commits.
3. The deltas in a pack file do not correspond to a change in a given commit, they are just the data to create a particular snapshot. If you find that a commit's file blob is stored in a pack file as a delta, that does not tell you anything about whether the file changed in that particular commit. You have to look at two commits and diff them to determine which files actually changed.
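To make point 3 concrete, here is a minimal sketch of a toy model, not Git's real data structures. The `Commit` class and `changed_paths` helper are hypothetical names invented for illustration. Each commit holds a full snapshot, and "what changed" only exists once you compare two snapshots:

```python
from dataclasses import dataclass


@dataclass
class Commit:
    """Toy commit: a full snapshot (path -> content) plus parent commits."""
    tree: dict
    parents: tuple = ()


def changed_paths(old: dict, new: dict) -> set:
    """Paths that were added, removed, or modified between two snapshots."""
    return {p for p in old.keys() | new.keys() if old.get(p) != new.get(p)}


c1 = Commit({"a.txt": "one\n", "b.txt": "two\n"})
c2 = Commit(
    {"a.txt": "one\n", "b.txt": "two!\n", "c.txt": "three\n"},
    parents=(c1,),
)

# The "diff of c2" is not stored anywhere; it is computed from two snapshots.
print(sorted(changed_paths(c1.tree, c2.tree)))
```

Nothing in `c2` records that `a.txt` was untouched. You only learn that by comparing against `c1`.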
If a person wants to think about version control in an abstract way, then yes the two views (commits vs diffs) are somewhat interchangeable. If a person wants to understand what actually happens when you run a Git command, the answer to that question is less open to interpretation.
> The underlying storage uses deltas (which are diffs) to save space.
Not necessarily! The base git storage stores each object individually, not as deltas ("disk space is cheap"); it's only after a "git gc" that they are stored as deltas to other (potentially unrelated) objects. The original implementation of git didn't even have the delta storage (pack files), it was added later as an optional optimization.
So answering "what is it made of?" with "deltas" comes with a huge caveat: it's often partially or completely untrue.
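For anyone who wants to check the "each object individually" part, here is a rough sketch of how a loose blob is built, assuming a repository using the default SHA-1 object format. The `loose_blob` function is a hypothetical helper, not part of Git. The object is a short header plus the whole file content, hashed and zlib-compressed, with no reference to any other version of the file:

```python
import hashlib
import zlib


def loose_blob(data: bytes):
    """Return (object id, compressed bytes) for a blob stored as a loose object.

    Assumes the SHA-1 object format: the id is the SHA-1 of
    b"blob <size>\\0" followed by the full content.
    """
    store = b"blob " + str(len(data)).encode() + b"\0" + data
    return hashlib.sha1(store).hexdigest(), zlib.compress(store)


v1_id, v1_bytes = loose_blob(b"hello\n")
v2_id, v2_bytes = loose_blob(b"hello\nworld\n")

# Two versions of a file are two independent, complete objects.
print(v1_id != v2_id)
print(zlib.decompress(v2_bytes))
```

You can compare the ids against `git hash-object` on the same content if you want to confirm the header format.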
This is exactly what I'm talking about. A person posts "this is literally how this works", and someone replies "philosophically I would prefer to think it works differently, therefore you're wrong".
> Why have so many people written long thoughtful explanations about how the author is wrong to suggest snapshots are a better mental model, and that you think all abstractions are leaky, but you find diffs a better mental model?
Probably because, to take their words at face value, they find diffs a better mental model? I think impugning "people [...] are so deeply attached to their understanding that they didn't hear most of what the article said" is a real bad faith reading, especially when you even acknowledge that central to people's arguments is "all mental models are leaky". This article may be technically correct about the way git internals are structured, but it makes cherry-pick and rebase more mentally complex for users to understand (you first have to go from commit => patch), not less.
Saying "Commits are collections of files + a parent commit, but you can diff it to generate a patch" and saying "Commits are a patch + a parent commit, and you can apply it to generate a collection of files" are isomorphic mental models. The fact that #1 is "correct" (for some value of correct that doesn't include the actual files stored on disk) is really beside the point.
My point is that people criticizing TFA's proposed mental model are missing the fact that TFA doesn't propose a mental model, it explains how things work. Both have value, but they're distinct.
I disagree. TFA is explaining the mental model Git uses to structure their codebase. If you're writing code for Git, this is obviously very useful to understand, but if you're just using it, this is only one of several mental models available to you. In this case, I think it's right to say that the distinction the author is attempting to draw is immaterial to those not working on the Git codebase.
Yes! It just seems so strange not to care about how things actually are in software. Is it a way of coping with the fact that so much software is so deeply layered and complex now?
Maybe I’m misremembering, but I feel like I didn’t see this usage of “mental model” much until fairly recently. The first I recall being surprised at was a discussion of a “mental model of Javascript” -- why would you need a mental model of something with a very detailed spec and multiple compatible implementations to study? If you want to understand how some aspect works, just look up how it actually does work.
Well, sometimes the API of a piece of software presents one model, while the implementation actually uses a different model underneath for various reasons.
In particular in Git, some commands expose the commits-as-diffs model (cherry-pick, rebase) while others present the commits-as-snapshots model (checkout). However, if you were to look at various layers of git code, the model is either commits-as-snapshots, or neither (compressed storage).
You could also theoretically change the entire implementation of git to store commits as diffs, and offer the exact same API as it does today (probably with differences in the way conflicts are resolved, and definitely with differences in performance).
It's necessary to approximate, but if someone tells you your approximation is wrong it makes no sense to say it's right because you prefer it that way.
If your mental model is that floats are real numbers and someone tells you they aren't, you don't go "I philosophically prefer to think they're reals, so you're wrong". You either update your mental model or decide you'd rather be a bit wrong than learn something (you perceive as) tedious.
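For the float analogy, a quick illustration in Python (whose `float` is an IEEE 754 double) of where the "floats are reals" model breaks. The variable names are just for illustration:

```python
import math

# Exact equality fails because 0.1, 0.2 and 0.3 are not exactly representable.
naive_equal = (0.1 + 0.2 == 0.3)

# Comparing with a tolerance is the usual workaround.
close_enough = math.isclose(0.1 + 0.2, 0.3)

# Addition is not even associative, which it would be for real numbers.
associative = ((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))

print(naive_equal, close_enough, associative)
```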
Agreed, commits are snapshots, whether we like it or not. For obvious storage efficiency reasons, the implementation then diffs/packs/etc, but this is a different issue altogether.
I have found that I can't work with git with a different mental model (diffs). Every time things get messy, the diff model is not enough, whereas snapshots + commit graph + names/pointers make things natural.
Interestingly enough, when migrating people from svn to git, explaining the actual model makes the transition much smoother, so it would seem I'm not the only one.
I’ve read the article before and it’s entirely unclear how it is supposed to be helpful. As the author acknowledges, things like cherry-pick show that one can think of commits as diffs whereas in the git implementation the object known as a commit is a snapshot of the state of a directory tree. Fine, but so what? Both times I’ve read this article my impression has been that the author is relatively new to git and processing some new information they’ve learned.
> Why have so many people written long thoughtful explanations about how the author is wrong to suggest snapshots are a better mental model, and that you think all abstractions are leaky, but you find diffs a better mental model?
Once you remember (learn?) that a commit can have N parents, it becomes apparent that it cannot be a single diff.
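A toy illustration of the N-parents point, using plain dicts as snapshots. The `changed_paths` helper and the file names are made up for the example. A merge snapshot yields a different diff depending on which parent you compare it with, so there is no single "the diff" for it:

```python
def changed_paths(old: dict, new: dict) -> set:
    """Paths that differ between two snapshots (path -> content)."""
    return {p for p in old.keys() | new.keys() if old.get(p) != new.get(p)}


# Two branches that each added one file, and the merge that contains both.
ours = {"a.txt": "1\n", "b.txt": "ours\n"}
theirs = {"a.txt": "1\n", "c.txt": "theirs\n"}
merge = {"a.txt": "1\n", "b.txt": "ours\n", "c.txt": "theirs\n"}

vs_ours = changed_paths(ours, merge)
vs_theirs = changed_paths(theirs, merge)

print(sorted(vs_ours), sorted(vs_theirs))
```

The snapshot model needs no special case here: the merge is one tree with two parent pointers.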
I suspect the chronology is something like RTFM -> TFM -> RTFA -> TFA, but the second and third might be switched. Dropping the R does introduce obscurity, but being able to convey the underlying sentiment (that while the content could/should have been consulted, it seems as though it was not) without a verb allows for a nonconfrontational syntax similar to passive voice, but even more so, and often without obvious "weasel" effect, to boot!
Makes a lot of sense, thanks! Maybe it's also useful since the R is read explicitly as "read". Hence "they should instead be RTFM" sounds grammatically wrong. Breaking off the verb allows for a more natural read. It's funny how an abbreviation can carry more information than whatever it's short for.
Interesting, I didn't know it came from Slashdot! I spent quite a bit of time reading it in the early 2000s, and sometimes miss the subculture and not always subtle jokes (beowulf clusters... ). The moderation system encouraged jokes (there was a specific 'funny' tag), unlike HN which does not 'orient' things, and happens to be very serious.