Git branches are named sequences of commits (plover.com)
170 points by EamonnMR on Feb 28, 2023 | 224 comments



I am glad no one told me that, because it would have caused me a lot of confusion.

Git branches, the things that "git branch" gives you, are pointers to a commit, or refs if you wish. This has several implications: you can move them at will, and after a merge, you lose track of what "sequence of commits" it originally was.

I came to git from mercurial, and while they are fundamentally very similar, branches are a major difference because in mercurial, branches are really a named sequence of commits. Each commit of the branch is tagged with the branch name, and it will stay that way, even after a merge. You can't change or delete branches, only append to them or close them.

Git-style branches also exist in mercurial and they are called bookmarks, so maybe you can call them that.

Now there is the higher concept of branching that includes cloning and is not specific to git, and we can indeed call a branch a sequence of commits, but "git branches" have a precise meaning, and it is not that. Anyways, git is terrible at naming things, especially when it comes to its command line, one of the biggest flaws of an otherwise great version control system.


Maybe the conclusion then is that when "higher concept of branching" means something, and other tools like mercurial align with that meaning, it is git that is wrong by using a different meaning.

Git could have called their branches bookmarks too.

If Bob insists on calling vodka "Bob water", and someone asks for some water to put out a fire and Bob gives them vodka, there is a problem even if "Bob water" has a precise meaning.


Interestingly, the comments in this thread are making the author's point.

Yes, you can be "technically correct" saying branches are just refs, but it's not a useful statement for most users.

I believe the author makes a very valid point, and we could do with a bit less "technically correct" and a bit more language that targets usage rather than technical implementation.

Git is confusing enough for many people, and we don't have to make it more confusing for them.


I wonder if this has more to do with how people visualise commits than branches per se. I think the git UI by default often encourages users to think of commits as diffs. A branch then is a bundle of diffs that, when stacked together, produces the current state of the repository. Internally, git uses some sort of optimisation to make calculating that current state quicker. In this mental model, it doesn't really make sense to talk about a pointer to a specific commit, because a commit without the rest of the information that makes up its branch is useless.

The problem is that this mental model isn't that useful in the first place, and often leads to confusion. Instead, it's usually easier to think of each commit as a complete snapshot of the entire codebase, that includes a link to the previous complete commit that it was made from (which in turn contains a link to the previous commit, and so on). In this scenario, a branch is just a pointer to a given commit - that's pretty much the easiest way to think about it - and the commit itself is a stack of history. Internally, git optimises for compression by removing redundant information between different snapshots.

In this mental model, thinking about branches as just a different type of tag is easier than thinking about it as a stack of commits, because each commit is already a stack of commits. Moreover, I think the snapshot model often ends up being clearer and easier to use overall. All you need is the basic concepts of snapshots, a linked list, and pointers, and the whole thing kind of just falls into place.
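
The snapshot model above fits in a few lines. This is only a toy sketch in Python, not how git stores anything; `Commit`, `history`, and the `branches` dict are names made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    """A complete snapshot plus a link to the snapshot it was made from."""
    snapshot: tuple                      # full contents as (path, text) pairs, not a diff
    parent: Optional["Commit"] = None    # None for the very first commit


def history(commit):
    """Walk parent links back to the root: a commit already implies its whole history."""
    out = []
    while commit is not None:
        out.append(commit)
        commit = commit.parent
    return out


root = Commit(snapshot=(("README", "hello"),))
second = Commit(snapshot=(("README", "hello"), ("main.py", "print(1)")), parent=root)

# A branch is just a name pointing at one commit.
branches = {"main": second}

print(len(history(branches["main"])))  # 2
```

Moving the branch means assigning a different commit to `branches["main"]`; the commits themselves never change.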


A ‘series of snapshots’ vs a ‘series of diffs’ are just duals of one another.

Since you can go back and forth between them at will, it seems odd to claim that one perspective is inherently superior. Like insisting that a chess board is actually a white board with 32 black squares on it.


It doesn't seem odd to claim that one of them is on average easier for beginners to use for intuition than the other, and that introducing both models simultaneously may bring more confusion than clarity.

I don't see claims of inherent superiority or correctness. It's about what's useful for education.

Nobody is arguing that you need to adopt a new mental model if what you have works for you.


There are some subtle differences in whether the canonical representation is a snapshot of the current state of the repo or a patch applied to the current state. One simple example is reordering; assuming your modifications don't change the same line, reordering patches arbitrarily won't change the final result. If you instead store snapshots of the state at each point, then reordering the snapshots won't necessarily result in the same final state, since you might have moved a different state to the final position.

You're correct that the two models are equivalent, but version control is about operations that you perform on the models, and those operations will not be the same for both models. You can reason about your git history as if it's a series of patches, but git itself doesn't know how to deal with any model other than snapshots.
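
A small sketch of the reordering point, assuming a deliberately simplified patch format of `(line_index, new_text)`. The `apply` helper and the sample data are invented for the example.

```python
def apply(lines, patch):
    """Apply a patch given as (line_index, new_text) to a list of lines."""
    index, text = patch
    result = list(lines)
    result[index] = text
    return result


base = ["a", "b", "c"]
p1 = (0, "A")   # touches line 0
p2 = (2, "C")   # touches line 2, so it does not conflict with p1

# Patches that touch different lines commute: either order gives the same file.
one_then_two = apply(apply(base, p1), p2)
two_then_one = apply(apply(base, p2), p1)
print(one_then_two == two_then_one)  # True

# Snapshots record whole states, so swapping them changes which state is "latest".
snapshots = [base, apply(base, p1), one_then_two]
reordered = [snapshots[0], snapshots[2], snapshots[1]]
print(snapshots[-1], reordered[-1])  # ['A', 'b', 'C'] ['A', 'b', 'c']
```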


Given that git commits are immutable, reordering doesn’t matter. Any history rewriting involves creating new commits - whether those are new snapshots or new diffs.
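
A toy illustration of why rewriting always means new commits, assuming (as in git) that a commit's id is a hash covering its content and its parent's id. `commit_id` is a made-up helper; real commit objects also hash the author, committer, and message.

```python
import hashlib


def commit_id(snapshot, parent_id):
    """Toy content address: the id depends on the snapshot and on the parent id."""
    return hashlib.sha1(f"{parent_id}|{snapshot}".encode()).hexdigest()


a = commit_id("files v1", None)
b = commit_id("files v2", a)
c = commit_id("files v3", b)

# "Rewriting" b cannot edit it in place: it yields a new id, and so does every descendant.
b2 = commit_id("files v2, reworded", a)
c2 = commit_id("files v3", b2)
print(b != b2, c != c2)  # True True
```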


The snapshot model is the correct model. How data is optimized via compression techniques is secondary. Thinking in terms of "diffs" is incorrect.


Targeting the usage is exactly what makes git confusing to people. You can start using git just by learning when to type "git add", "git commit", "git push" and "git pull" and you'll manage to collaborate somehow, but it will all fall apart the first time you stumble upon an unfamiliar situation. And because Git's UX isn't great, it's pretty hard to create a right mental model just by using it and inferring from the interface.

If you start by creating a mental model, the confusion goes away. Reminding people that "branch is just a ref" is just a way to push them towards less confusion.


Honestly, I think that if "git lol" was the default log command it would do the most to make things much more obvious to newcomers.

  git log --oneline --graph

And git lola for a gestalt of the repo's recent state:

  git log --oneline --graph --all


Just think about how useless and confusing GitHub's history view is as soon as a merge is involved. Countless times I pulled something from there just to browse the graph because of how unhelpful the web UI is.


For how popular it is, GitHub really sucks: it has simply miserable ways to visualize commits :/.


I'm still looking for a tool which produces a timeline similar to Fossil's, but for git.

Example Fossil graph: https://chiselapp.com/user/rkeene/repository/kitcreator/time...


I'm probably missing the details, but don't most git GUIs show a timeline like that? Such as the official desktop version and the official (?) gitlens extension in VSCode? I don't use them myself though, so I might be wrong.


gitup and other git clients for example do this: https://gitup.co/


I can't try gitup since it seems to require macOS -- ideally something web-based similar to Fossil.


If you do that ^ a lot then look at the `tig` tool which is that with an ncurses ui (and some more features)


VSCode users may enjoy the extension "Git Graph."


This, so much this. Lots of times where I've seemed like a git whiz, it's really just that I've got a marginally better understanding of how git really works. Git is much easier to use when you wrap your head around how commits and branches are internally represented.


I think there's some confusion around the meaning of "internally represented" seen in this thread. I wouldn't really call it "internal representation", as then people complain that a tool shouldn't make them learn its implementation details - and they're right, but that's not what happens here.

You don't have to learn how git-the-tool represents things internally. However, you absolutely should learn how git-the-model-of-a-repository represents things, because that's what you're operating on. Git is a tool to manipulate repositories, just like LibreOffice is a tool to manipulate documents. You don't need to learn how ODF stores things in zipped XMLs (just like you don't have to learn how git stores things in its content-addressable filesystem), but you need to understand what paragraphs, words, pages or slides are as this is the model you're working on (just like you need to understand what commits, branches and refs are and how they form a graph).

Unlike LibreOffice, git doesn't make it easy to understand its model just by using it (you could even say that it actively misguides you, although it has good reasons to do so), so you usually have to read some docs to grasp it.


I don't think git is unusual in that regard, and I think those complaints would be unjustified. Loads of tools work totally fine with only the barest understanding of how to use them, occasionally have problems that require a bit more understanding of their internal model, and even more rarely require deep knowledge of their internal model. I think most development teams would be totally fine with only a single member who has a slightly better understanding of the internal model. That knowledge only comes into play very rarely for me. If nobody were available with that knowledge, those teams could make do by simply copying the work into unmanaged text files for a few minutes and then just "manually" override the botched merge that got them into trouble.


Yes, I think it's a common case of getting the fact right, viz. branch == just a ref (true), but the understanding wrong, viz. ref == commit (false).

A commit is an immutable object. Whereas a ref is a pointer, literally a place on disk (a regular file) that holds an address (plaintext SHA) to the latest point in a logical chain of commits.

Meta remark:

This is also what makes it ok to delete a branch after it is "done", and why it is ok to merge a standard working branch (like "develop") repeatedly into a target branch (like release/master/main/trunk).

The semantic / meaning of a branch is transient. It is mutable conceptually and literally.
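
To make the "regular file holding a plaintext SHA" part concrete, here is a sketch that fakes a loose ref in a temporary directory. The commit ids are stand-ins (hashes of arbitrary strings), and real repositories may also keep refs in a `packed-refs` file, which this ignores.

```python
import hashlib
import tempfile
from pathlib import Path

repo = Path(tempfile.mkdtemp())
ref = repo / "refs" / "heads" / "develop"
ref.parent.mkdir(parents=True)

# Two stand-in commit ids; in git these would be hashes of commit objects.
old_tip = hashlib.sha1(b"commit one").hexdigest()
new_tip = hashlib.sha1(b"commit two").hexdigest()

ref.write_text(old_tip + "\n")      # the branch points at a commit
ref.write_text(new_tip + "\n")      # "moving" the branch rewrites one small file
print(ref.read_text().strip() == new_tip)  # True

ref.unlink()                        # deleting the branch removes the pointer only
print(ref.exists())  # False
```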

edit: formatting


Sort of an aside, I find it funny that the reflog is named that and not commitlog. With this mental model when you look at the reflog you usually want to get back an immutable commit because you've lost the ref. I know it displays the commits and the refs, but does anyone actually look at the reflog and checkout HEAD@{6} or do they use the commit sha?


reflog is a tool to show you the history of a given ref - if you don't give it any, it defaults to HEAD. It seems to me like "reflog" is the perfect name for it and I don't see how "commitlog" would be relevant to what it does.

Did you confuse "refs" (references) with "revs" (revisions)?


No, I was going based off the mental model I replied to:

> A commit is an immutable object. Whereas a ref is a pointer, literally a place on disk (a regular file) that holds an address (plaintext SHA) to the latest point in a logical chain of commits.

Reflog shows you the immutable commit SHA and the HEAD@{N} ref. I've only ever used it to get back to a commit I've lost, never by ref, so to me it's a commitlog.


HEAD is a ref just like any other. What you're looking at after typing `git reflog` is the history of things HEAD has pointed to - it's HEAD's log. Refs don't necessarily have to point to commits, they can point to other objects too.

HEAD@{<N>} is not a ref - it's a rev in <ref>@{<N>} form that means "N positions back in ref's history" (see `man gitrevisions` for more rev forms).

> never by ref

When you look at reflog's output, you've already dereferenced these commits by the given ref and its history.

Try `git reflog <branchname>`.
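
A rough model of the idea, in case it helps: a ref that remembers every value it has held. The `Ref` class and its `at` method are invented for this sketch; `at(n)` only approximates what `<ref>@{<N>}` means.

```python
class Ref:
    """A movable pointer that records every value it has held (a toy reflog)."""

    def __init__(self, name, target):
        self.name = name
        self.log = [target]          # oldest entry first

    def move(self, target):
        self.log.append(target)

    @property
    def target(self):
        return self.log[-1]

    def at(self, n):
        """Rough analogue of <ref>@{n}: the value this ref held n moves ago."""
        return self.log[-1 - n]


head = Ref("HEAD", "c1")
head.move("c2")
head.move("c3")
head.move("c1")   # e.g. a reset back to an earlier commit

print(head.at(0), head.at(1), head.at(2))  # c1 c3 c2
```

The log belongs to the ref, not to the commits: "c1" shows up twice because HEAD pointed at it twice.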


Yes, I've done that, because the reflog keeps more than just commits. It also keeps checkouts, merges, steps in rebasing, etc. So I've checked out a HEAD@ ref, when I made a mistake in merging or rebasing.


> Git is confusing enough for many people, and we don't have to make it more confusing for them.

Using terms with the wrong definition, and not precisely defining concepts, makes things more confusing, not less.


I'd argue git is confusing for many people because they don't understand the data model. The solution is to learn the basic data model instead of pretending that "a branch is a ref" is not true. Because it is true.


A branch is not a ref.

A head ref is a ref that names a branch. But branches can exist in git without refs. Branches are artifacts that exist in the commit DAG - they are dangling chains of commits that end without being merged in to some other commit. They exist, as pure platonic branches, even if they are un-referenced.

But then you can make a head ref and name one and now all of a sudden you have a named branch. As you make more commits that extend the branch while ‘attached’ to that head, the head ref follows the tip of the branch (that is in particular a thing a head ref does that a tag ref does not).

But you can add commits and extend a branch in a detached state if you like - no head refs following the branch tip. Yet the branch definitely exists. And then if you tag it, you name it.

So no, I don’t think “a branch is a ref” tells the whole story.


This is a strange take, in my opinion. Dangling commits like those you describe will be cleaned up by the garbage collector. To say that a “branch” exists without a branch ref pointing to it is at best purely pedantic. Without a ref there is no meaningful branch because it will disappear eventually.
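
A sketch of the reachability argument, with a hand-written commit graph. The `parents` and `refs` dicts and the `reachable` helper are made up, and real garbage collection also honors reflogs and expiry periods, which this ignores.

```python
# commit -> list of its parents; a tiny DAG that forks after "b"
parents = {"a": [], "b": ["a"], "c": ["b"], "x": ["b"], "y": ["x"]}
refs = {"main": "c"}            # nothing points at the x-y chain


def reachable(parents, tips):
    """Every commit that can be reached by following parent links from the tips."""
    seen, stack = set(), list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


kept = reachable(parents, refs.values())
unreferenced = set(parents) - kept
print(sorted(kept), sorted(unreferenced))  # ['a', 'b', 'c'] ['x', 'y']

refs["topic"] = "y"             # naming the tip makes the chain reachable again
print(sorted(set(parents) - reachable(parents, refs.values())))  # []
```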


For anyone reading this who would like to learn about the data model, I highly recommend following along the "gitcore-tutorial" manpage. Like actually type the commands and play around with the results. Once you understand what's going on under the hood, the UI commands all make intuitive sense.


The author and people who insist on teaching (just) that "a branch is just a ref" are fighting over the wrong point. The important part is to understand that each commit is itself both a complete snapshot of the repository and a sequence of commits that led to that snapshot (or more correctly, that it doesn't make sense to think of a commit without thinking about its pointer to the parent commit). That seems weird at first, but everyone who understands how git works has internalized that, whether they explicitly think about it or not. After you understand that, it becomes easy to see that both the "technically correct" point and the author's point are kind of equivalent ways of saying the same thing.

But without this understanding, being told "branches are named sequences of commits" is probably worse than being told "a branch is nothing but a ref". The second one is cryptic and will soon be forgotten, no harm done. The first one leads you into a false sense of understanding, and soon you'll see an operation that looks like deep magic: someone moves the branch ref and now suddenly the whole branch is a completely different sequence of commits.

The confusion experienced by many people is largely due to the fact that a lot of articles try to teach git in a way that gives a false sense of understanding without explaining how git really works, which is exactly what teaching "Branches are named sequences of commits" does.


While I agree the title in a vacuum has the ability to mislead, the article itself is a critical piece, not a tutorial for beginners. I don’t think much harm was done here.


It's interesting how I strongly disagreed with you before reading the linked blog post, but I fully agree with you after reading it.

I still rather think of "git branches" as the technically correct "just refs" and hold the separate-but-related human-only concept of "development branch" in my head. I don't think there's a better approximation of truth than that, nor do I think it's that much more complex to understand.

But the fact these two concepts share the same name truly is confusing. One would do better to refer to the former as "bookmark" or "branch tip".


I'd argue that in the long run, thinking of branches as the entirety of the history before a commit causes more confusion. I'd propose that the most useful way to convey the idea of branches to new git users is to start with the concept that every commit after the initial one has one predecessor, which means that you can always trace back the history of a commit by following the predecessors back to the initial commit, and then introduce the idea of a branch as a name that refers to a given commit. Combining these two ideas means that for any commit, you can definitively state whether or not it exists in the history of the commit that the name points to, and that commits that are part of that history are conceptually considered to be "in" the branch. Then you can introduce the idea that you can "update" the commit that a branch points to, and that the only way to add a new commit is to "increment" a branch to point to a new commit after the current one it points to.

This establishes enough information for you to show how using a git repo actually works; at any given time, you're looking at one specific commit, either directly or via a branch's name. If you're using a branch, then committing will perform the "increment" discussed earlier, with the branch now pointing to that new commit. Showing how to create a new branch will naturally lead to the discussion about how you can have two branches pointing to the same commit; this lets you explain that adding a new commit without specifying a branch name can be ambiguous, which you can demonstrate by checking out the current commit directly rather than by a branch name. Once you've shown that adding a commit requires either checking out one of the branches you have that point to that commit or creating a new one, you can show that the same principle holds for any other commit in the repo as well, even ones further back in the history with no branch currently pointing to them. You can use this opportunity to introduce the concept of `HEAD` as the unique name for whichever commit you're currently looking at, and that looking at a commit directly rather than via a branch is called having a "detached `HEAD`", which means that you won't be able to make any changes without creating a branch at that point first and "reattaching `HEAD`" to that new branch.

If you're trying to teach git to someone who hasn't yet learned the equivalent of an intro to data structures class in computer science, it might be worth simplifying the concept of branches in the way you describe. If you're teaching someone who already understands what a tree is, you're doing them a disservice by trying to hide the model from them because they have more than enough to understand what a branch actually is.
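
The first half of that progression can be sketched as follows, assuming the simplified one-predecessor model (so no merges). `parent`, `branches`, `commit`, and `in_branch` are names invented for the example.

```python
parent = {}                      # commit -> its predecessor (None for the initial commit)
branches = {}                    # branch name -> the commit it points to


def commit(branch, new_id):
    """Create new_id on top of the branch tip, then advance the branch to it."""
    parent[new_id] = branches.get(branch)
    branches[branch] = new_id


def in_branch(commit_id, branch):
    """True if commit_id is found by walking predecessors from the branch tip."""
    current = branches[branch]
    while current is not None:
        if current == commit_id:
            return True
        current = parent[current]
    return False


commit("main", "c1")
commit("main", "c2")
branches["feature"] = branches["main"]   # a new branch is a second name for the same commit
commit("feature", "c3")

print(in_branch("c2", "feature"), in_branch("c3", "main"))  # True False
```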


This was true back in Subversion too!

> Creating a branch is the same as creating a tag

> Tags merely exist to pinpoint a specific repository revision


It seems to me that people are confusing the CM concept of a branch with the way that git has chosen to implement it.


There is no single CM concept of a branch.


I wish we had a version control system that can be used without worrying, or even knowing, about its technicalities and implementation details.

Working with Git for version control is as if your photo management tool required you to learn about inodes and b-tree superblocks in order to save a JPEG file. I just want to keep source code history, and allow multiple people to collaborate on the same project. I don't want to know anything about "refs" or whatever else is happening behind the scenes, yet it appears Git can't be used unless you are (at least occasionally) willing to look at the plumbing layer.


Linus built an incredibly elegant and simple underlying model for git. For what it successfully does - distributed version control - it is remarkably simple and easy to grasp if you want to.

However, this model was not mapped well to the high level concepts that the typical user of a VCS operates in. This is the biggest issue of git: it's hard to make sense of it by its UI if you do not understand how it works under the hood. I struggled until I read the pro git book.

I wouldn't go as far as to compare this to knowing about filesystem data structures for saving a jpeg file. It's more like using an old school file dialog where you just see the bare file system and you need to know your way around your drive.


Git's other (compounding) problem is how the CLI is an inconsistent mess.

Why do you create a branch via the "git checkout" command? Why do you delete tags using "git tag -d" but delete stashes using "git stash drop"? If you want to blow away local uncommitted changes, you can use "git reset", "git reset --hard" or "git checkout (file)" - which (I think) all do totally different things.

Git's data model may be elegant, but it's hard to appreciate it through the tangled mess of git command line options.


I know what you mean (deleting a branch in a remote is a very unintuitive syntax to me at first glance, especially), but these are perhaps not the best examples:

> Why do you create a branch via the "git checkout" command?

That's a shortcut for "git branch (name)", then "git checkout (name)". Or the newer "switch" which is more obvious.

> Why do you delete tags using "git tag -d" but delete stashes using "git stash drop"?

Because the stash is more like a stack, and tags are not, so "drop" without a parameter is a valid and very usual command. Yes, it feels inconsistent, but allowing "git stash -d" without a parameter would probably not be better.

> If you want to blow away local uncommitted changes, you can use "git reset", "git reset --hard" or "git checkout (file)" - which (I think) all do totally different things.

These do all do different things, so that's why they all exist. "git restore (file)" was introduced a few years ago (with "switch", mentioned earlier) to make the last one more obvious, since that's indeed always been an uncomfortable syntax for a core operation.

Git's a very powerful tool, originally aimed at a very complex code base run by experts, and was written very quickly as an emergency replacement for BitWarden. This rushed development and target audience does show through even today, but it's being annealed over time. Nevertheless, it's so good at what it does that it's taken out nearly every other VCS by just existing (ok, and the network effects of GitHub, but they chose it for a reason too).


BitKeeper was the version control system.

Bitwarden is the password manager.


> Why do you create a branch via the "git checkout" command?

Now there is also `git switch --create` / `git switch -c` for this.

Perhaps in time, there will appear different front end dialects for git. Like the statistics programming language R has the base R language, data.table dialect, and the tidyverse dialect.


There is also `git restore` for reverting local or staged changes, the other common use of `git checkout`.

I am slowly remapping my keystroke muscle memory away from the footgun that is `git checkout` and using restore/switch. But boy is it hard to undo a decade of practice.


I use Git daily and literally couldn't tell you how to use it because I've aliased every single command it has. I basically have a wrapper over it and to teach anyone how to use Git I have to peel my layer away and check what I have things aliased to.

It feels like terrible ad hoc user design built over an otherwise extremely elegant and clean data model.


Would you mind sharing your aliases? Curious to see a different mental model.


One of my favorite examples of this is the cache. I mean index. I mean staging area.

Sigh.


How else would you support breaking up a change into multiple commits?


I’ve been toying with various ways to do this. Some alternatives I’ve thought of:

- a “draft commit”, where you can amend a bunch of changes into a commit before finalizing it.

- default to “git commit --all”, and allow the user to “git commit --patch” where needed

- as 'anthomtb says, `git stash push --patch` (I also find myself wishing for `git stash pop --patch` so you can shove bits in and out of a stash as needed.)


Isn't a "draft commit" the same thing as the staging area?


The main differences I’m imagining are:

a) terminology: it’s one less concept to have to wrap your head around; and

b) a draft commit would also have a draft commit message, and (though I’m admittedly not sure about how well this part would work), draft parents (probably supporting refs rather than just commits as parents) so you can have multiple of them and shuffle them around conveniently. (This also sort of subsumes the stash as well.)

I made a preliminary stab at this a while back, though it has some awkwardness and I haven’t had a chance to revisit it recently: https://github.com/wolfgang42/git-draft/


By making multiple partial commits, then squashing stuff together as needed. Or amending the top commit. Or amending non-top commits (making them "absorb" the changes).

Which is what I do with mercurial (& evolve), and I am happy I don't have a super-special extra concept to clutter up my already overflowing brain.

The way I think of it, the staging area is an incrementally buildable commit that is not called a commit because commits aren't incrementally buildable. So if you allow commits to be incrementally buildable, then you don't need the staging area. The only difference is you need to come up with a message for the commit when you first start to build it. Or not—make it empty, then amend it when it becomes something worth naming.


The main thing in my book would be to call it "the staging area" from the start and stick with it.


The point of the comment is that all 3 terms refer to the same thing. `git add` modifies staging area, `git diff --cached` shows the diff of things in the staged area, and `git stash --keep-index` stashes things that haven't been staged (I think? IDK, I never use it). Maybe pick one?


Instead of git add --patch, there could be a --patch option to git commit. You can already edit the latest commit with git commit --amend, so you'd have to do git commit -p to get the commit started, and then continue with git commit --amend -p.


`git stash -p`


This. I stash what I don’t want to test and commit, because I can’t build or test the staging area.


> Why do you create a branch via the "git checkout" command?

  git checkout -b foo

is just a shortcut for

  git branch foo
  git checkout foo

> Why do you delete tags using "git tag -d" but delete stashes using "git stash drop"?

That is inconsistent. One has an interface of `git <thing> <options-to-manage-thing>` and the other `git <thing> <subcommand-actions-for-thing>`. I imagine what happened is the former was the original and was probably thought to be sufficient, but then it wasn't for `stash` and the latter was introduced for more flexibility. The inconsistency is probably from backwards compatibility.

It might be worth noting that, at least as far as I know, git was like the first to use or at least popularize subcommands. It'd be understandable if they didn't include support for sub-sub-commands from the get-go.

> If you want to blow away local uncommitted changes, you can use "git reset", "git reset --hard" or "git checkout (file)" - which (I think) all do totally different things.

git-reset is mainly about resetting the branch, index, and/or working tree to a given ref. git-checkout is mainly about checking out a ref, setting HEAD and syncing the working tree to it. They're different things with an overlap. I would say that's not really inconsistent. It would only appear so when one only learns specific patterns of commands for subsets of their function, like "blow away local uncommitted changes", which in this case fits in their overlap.
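
A sketch of just the pointer side of that difference. It deliberately leaves out the index and working tree (and so the soft/mixed/hard distinction), and the `state` dict and both functions are hypothetical, not git's API.

```python
state = {
    "branches": {"main": "c3", "topic": "c2"},
    "HEAD": "main",              # HEAD is attached to a branch by name
}


def checkout(state, branch):
    """Switch HEAD to another branch; no branch pointer moves."""
    state["HEAD"] = branch


def reset(state, commit_id):
    """Move the branch HEAD is attached to; HEAD stays on the same branch."""
    state["branches"][state["HEAD"]] = commit_id


reset(state, "c1")               # main now points at c1
checkout(state, "topic")         # HEAD now follows topic; main is untouched

print(state)  # {'branches': {'main': 'c1', 'topic': 'c2'}, 'HEAD': 'topic'}
```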


That can all be true, but it doesn't make git any less of a nightmare to learn. Even very experienced git users make mistakes and need to google things all the time. It's become almost a trope at multiple places I've worked that every once in a while someone makes a mess of their git repository, and needs to call for help from one of the 2 people in the entire office who understand git enough to unbreak it.

Another annoying inconsistency: git tag prints a list of tags. Git branch prints a list of branches. Git commit prints ... modified files? And git stash modifies the stash. You need git stash list to see the stash. What!?

I get it; it's a complex tool. It's managing 4 different storage areas for your code (the repository, the staging area, the index and the stash). It also manages tags, branches, remotes and configuration. And it has multiple networking interfaces.

But I can't escape the conclusion that it's just not a very good user interface. A good interface wouldn't be so hard to use. Redis is more complex, but I don't make so many mistakes using the redis cli. Awk is more powerful - but it's much more intuitive. And cargo probably has more subcommands than git does, but I don't get lost in them. Git? Git is a mess.


> It might be worth noting that, at least as far as I know, git was like the first to use or at least popularize subcommands.

In this domain it was popularized by CVS which merged the separate programs used for RCS.


Yes, I was mistaken. Even subcommands like `checkout` didn't originate with git.

https://www.gnu.org/software/trans-coord/manual/cvs/html_nod...


They tried to fix that mess, like with the recent git switch/restore. But it looks like it was too late.


Git wasn't the only DVCS with an elegant model. Mercurial, Darcs and Fossil came out around the same time. All are equally elegant in their own ways, and all have a much friendlier UI than Git.

So it is possible to have both an elegant implementation, and a friendly UI that doesn't force the user to understand the internals to work with the tool.

Git's elegant model is not why it won out. Despite its shortcomings, I suspect the cult of personality around Linus had a big role in that, as well as major services like GitHub.


So what's the underlying model that makes the staging area make sense? Why does stash followed by unstash leave my checkout in a different state from what it was before?


The staging area is where you construct your next commit— giving you a middle ground between your changes in your local working copy and the last actual commit so that, if you don’t want to commit everything that you’ve changed in a single commit, you can do that.

(If you always want to commit everything you’ve changed, you can do that too— always commit with ‘git commit -a’ and only use ‘git add’ when dealing with new files that you want to add to version control.)
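
As a toy model of that middle ground, assuming files are just dict entries: `add` and `commit` here are stand-ins for the idea, not git's behavior in any detail.

```python
last_commit = {"app.py": "v1", "README": "v1"}
working_tree = {"app.py": "v2", "README": "v2"}   # both files edited locally
index = dict(last_commit)                          # staging area starts equal to the last commit


def add(path):
    """Stage one file: copy its working-tree content into the index."""
    index[path] = working_tree[path]


def commit():
    """The new commit is a snapshot of the index, not of the working tree."""
    return dict(index)


add("app.py")
new_commit = commit()
print(new_commit)  # {'app.py': 'v2', 'README': 'v1'}

# The README edit is still only in the working tree, ready for a later commit.
print(working_tree["README"] != new_commit["README"])  # True
```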


Hmm. With Mercurial I just use "commit --interactive" if I only want to commit part of my changes, and I always found that more intuitive and less confusing than having to mentally keep track of Git's staging area as well.


The git analogue to that would be `git commit --interactive`, or using `git status` to check the staging area while using `git add`. Keeping mental check of it is the worst solution imho.

You can also have your git porcelain handle it. Magit for example has a great interactive overview of unstaged and staged changes. When I need to do something more picky than just committing every change, I'll usually grab magit to stash individual chunks: I don't necessarily want to commit all changes in a file, sometimes I want individual lines.

You can do that with staging using the commands above, magit, or some other porcelain (I've heard good things about git kraken). If you really want to forget staging even exists, you could just commit straight up and amend the commit afterwards to get a comparable experience I guess. I've found staging to be helpful in keeping track of what I've achieved for my next "version" of the software to be added to the history, which is why I'm still using it.


or `git add -p` to interactively stage changes


git itself has a pretty nice GUI for staging (and other things), `git gui`.


I know. None of that answers my question.


Staging is useful for gradually queuing up multi-file commits rather than listing them all in one command. It becomes even more useful with partial file commits.


What’s useful in it? A commit is a thing that should preferably make sense on its own, which can be guaranteed by testing or at least building/running the code. By cherry-picking changes from workdir into a commit don’t you basically make a blind guess? Or is it stash/test/pop every time? What if you overpicked? Reset and repeat?


I don't know about you, but I often get sidetracked with different changes when I'm working on something, so that the work directory is in a messy state to commit everything. The staging area allows me to cherry-pick only the changes that will be in the next commit, while keeping the rest for later. This way you can save the state momentarily, finish polishing the changes, and then easily commit them. I find it very useful to keep focus on what I'm currently working on, without the overhead of WIP commits.

> By cherry-picking changes from workdir into a commit don’t you basically make a blind guess?

No, you use the interactive mode (`git add -p`) to select exactly what you want.

If you overpicked, you can reset a single file, and try again. That can be a bit annoying if there are a lot of changes, so this is another reason to keep commits small and atomic.


I think the answer to the question is "yes, the person staging a partial commit may be making a guess". I think this is because the tool apes an earlier practice of crafting patches to share with other developers. There are definitely cowboys writing patches to show others, and not necessarily testing every implied snapshot in a chain of such patches. Some CI practices also encourage cowboy commits, i.e. if a team pushes commits to get them tested rather than testing prior to commit.

You can imagine an inverted perspective where the stash should be the only non-staging area, and the working copy _is_ the staging area for the next commit. Stash away partial changes you want to defer, then test the current working copy, then commit the working copy.

You'd also want status/diff commands that let you more easily compare: working vs HEAD (what can be committed); stash vs HEAD (all uncommitted changes); and stash vs working (deferred changes).


Your point is actually great, but it’s important that you always test from the commit itself, the one that gets merged to the release branch if those tests pass. If you are only testing your working directory I feel like that’s even harder to do.


I know, but that's not what I was asking.


The staging area is a virtual snapshot, in roughly the way that the working copy and a commit are actual snapshots. It's defined in terms of the current HEAD with some changes.

Not sure what you mean by "unstash", since "git unstash" is not a command (on my machine anyway, so not unless it was added very recently). I'm pretty sure stashes are still modeled as commits/snapshots.

The git stash command is a little wonky, yes, but I don't think that's a data model thing. It's easy to mistake the disaster zone of Git's CLI for problems with the data model. It becomes more obvious where the problem is when you start thinking in terms of the data model, and trying to figure out what incantation will perform the relatively simple operation in your mind.


> Not sure what you mean by "unstash", since "git unstash" is not a command (on my machine anyway, so not unless it was added very recently).

I meant pop or apply.

> The git stash command is a little wonky, yes, but I don't think that's a data model thing. It's easy to mistake the disaster zone of Git's CLI for problems with the data model. It becomes more obvious where the problem is when you start thinking in terms of the data model, and trying to figure out what incantation will perform the relatively simple operation in your mind.

I disagree. I think the staging area and its behaviour are inherently unreasonable; certainly all the "it's just a DAG of commits" people tend to be confidently wrong about what the staging area will do under a given sequence of operations.


I use the heck out of fossil and have little idea how it works underneath. I consider fossil the gold standard for a good versioning system ui. I don't think git has any excuse for being as weird as it is.

fossil init repo; make a new fossil repository

fossil open repo; open a fossil repository somewhere

fossil add file; note: very different than git add. in fact I was very confused by git add, in fossil the repository knows what files are managed by it and there is no staging area. so you only use "fossil add" when adding a new file. If you move a file use "fossil mv" to let fossil know what you did. Along with "rm" when you remove a file. The staging area still feels like an unnecessary added bit of friction.

fossil commit

There are others I use often ("merge", "sync", "revert") but they tend to do what the command appears to do. Speaking of revert, the git equivalent is really strange, even among the rest of the strange git ui. Shit! I messed something up and want to revert back to the last committed change, easy, just run the command "git reset --hard HEAD"


The bad thing is that fossil allows the upstream to make destructive changes to client's copy of code. That's a big no no.


So does git. In fact, without a better explanation of what exactly you mean, I would say upstream making changes to the client's code is the core useful property of a version control system.


Until you need it. :)


How do you choose which changes go into a commit and which don't without an equivalent of a staging area?


I don't use git a lot, so I am not all that familiar with its workflow. But I think fossil assumes a change will go into the commit, so you use the stash command to hide items you don't want in, while git assumes a change does not go in a commit, so you use add on files that you want in.

A bigger question is what to do when you only want specific chunks from the diff in your commit. I tend to faff around with stash, interactive sdiff and hand editing the patch when I need to dissect a chunk, a situation I feel could be better.


Emacs/Magit can do that sort of things easily. You can mark a region in a diff and add that to staging if you want.

Magit is probably the best chrome for Git I personally think.


git init .

git add file

git commit -am "my message"

Pretty simple really. You don't need to think about staging if you don't want to.


Is that a typo or intentional? If you did git add, you don't want commit -a, right?


"git add" is used in this context to add an untracked file (like the fossil example).

git init .

vi file (initial content)

git add file

git commit -am "first commit"

vi file (make changes)

git commit -am "second commit"

Basically if you use commit -am, you never have to worry about staging - which is most of the time imo. In the rare case where you want to avoid committing something that has changed use git add to stage individual files.


I believe these were supposed to map to what its parent comment has listed.


We do, at least to my mind. It's called Mercurial. It's great, extremely close to git, but is much easier to use IMO. It's really a damn shame that git won - primarily, I think, due to the cachet of its author.


Either way, is Git actually that hard to use? You can learn about 10 commands and develop any software just with those. I have no illusions about its gnarly aspects but also just don't find it particularly difficult. Contrast a programming language that might have footguns in just printing strings.


Git is a pain to provide support for if you’re an internal tools team, lots of users with incorrect mental models


Close. It won because of Github. Git was gaining over SVN slowly but it was Github that really propelled it into widespread use.


The way I remember it git won for two reasons: 1. hg was (maybe still is?) much slower 2. “No technology can ever be too arcane or complicated for the black t-shirt crowd” (fake Linus’s words not mine)



Version control feels like it does require some complexity though. I think we all like to imagine that all it does is stores changes per file, wrapped in commits.

But when you add requirements like merging other people’s work with yours, movable “tags” to mark named versions, and the sort of cut/splice/move around operations you will always need because you accidentally did something you shouldn’t have… I think you end up rebuilding most of Git’s plumbing.


Video codecs are also complex (actually, they are almost unimaginably complex). But I've never seen a smart TV telling the user something about "lapped transforms" or "chroma subsampling".

I'm a user of Git. Why do I have to learn about implementation complexities in order to fix problems that arise from normal version control operations?

I can't think of any other software that forces me to do this. Even compilers (at least those for mainstream programming languages) don't require me to understand their AST representation or other details of how they work internally.


When it comes to tools you use every day, everyone should be critical. Does this tool decrease my workload? or do I need to expend extra work just to operate it?

I'm a user of mercurial. I've learned enough git to know it's an inferior tool. git's CLI complexity (and mental model) is patent overkill for 99.9% of all users.

Mercurial gets out of my way.

I'm able to clone and push git repos with it just fine (thanks hg-git!)

This is because both (git,hg) are tools that manipulate the same simple data structure: the DAG.

hg CLI's verbs match my mental model from decades of use of other VCS's. I'm able to perform my daily tasks with simplicity (including n-way merge/cherry-pick tasks that git "expert" colleagues often struggle with).

My 2¢.

I've used local (SCCS, RCS), client-server (CVS, subversion, perforce) and distributed (bazaar, git, hg)


WildBit had a really cool project a while ago called Conveyor. [0] It’s a shame it never took off.

It was built on git, but hid the complexity of branching + other git things behind “tasks”. You’d start a task, and silently push that branch out to everyone. It’d silently merge things in the background and handle a lot of the chores you have to do with git rebasing and such to keep branches mergeable.

It failed miserably with us because it perpetually created impossible to solve git issues. Someone accidentally removes a gitignore file and commits a config file with a password? You’re SOL. It will keep coming back because it will exist on at least one other person’s device which will get force pushed back to the repo.

The weird plumbing exists because version control is hard, and prone to humans throwing wrenches into its nominally perfect system.

The real reason you don’t remember every magic git incantation is because you normally only need a specific one, once a year. But it has to be there!

[0] https://web.archive.org/web/20190226201600/https://conveyor....


TVs are consuming devices. If you are only wishing to consume git, you need to know "git clone" and "git checkout", and that's it, no need for internals. Or maybe not even that, you need to know where the "download" link on the Github interface is.

If you are encoding video, you often need to know about chroma subsampling, and colorspaces, and fractional framerate and all the other absurd technical details. It is actually much worse than git.

You can avoid those technical details if you use high-level software, only stay on happy path, and avoid any complex operations. You'll take longer and produce worse quality output than if you had fully mastered the software -- but often this is OK.

This is true both for video encoding and for git.


I have a gsync display and a cool graphics card. But to achieve the smoothest experience I had to search through reddit, read what are literally research papers and do the following: turn on gsync in a driver (obvious), turn off in-game vsync, turn on in-driver vsync (it does something!), set RTSS frame limiter to display’s MRR-3, and set up a schedule that frees “standby list” that also cause stutters. The reason is, many games don’t know about gsync and in-game “vsync on” messes with their input lag. But if a framerate hits MRR with gsync on, the driver falls back to vsync to prevent tearing, and so stutter. So frame limiter is required (idk how it works). I may misremember some details, but I’ve tested various settings, tried to deviate from the suggested path and realized that these articles were right about everything they say.

How does this story relate to git? Nvidia could do this research itself and hide the complexity behind a simple switch. If a user turns on gsync, then make in-game vsync a noop or advise them to turn it off, do the shit that in-driver “vsync on” does, frame limit itself to MRR-3 and empty standby lists periodically while the game is running. Pretty sure git could do a similar thing for its users.

And yes, most “consumer grade” players experience their adaptive sync technology to maybe about 30%. It’s still an improvement compared to vsync, ofc.


Git is a fundamentally different kind of tool than a video codec, and I don't think comparing them is particularly helpful.

I agree that Git could be a bit more internally consistent, and have a few more convenience shortcuts for very basic usage. But I can also see a strong argument that, as part of the Linux ecosystem, that's a perfectly good opportunity for someone who wants to build a wrapper around Git (and I believe many have).

It seems to me that unless you really only want to use Git in the most basic way possible (add all your changes every commit, never roll anything back, single branch, no stashes), understanding something of Git's internal model is less like understanding how the compiler's internals work, and more like understanding the fact that a C program needs to be compiled into object code and potentially linked into an executable before it can be run.


Televisions absolutely do this for power users. Or give them terrible image quality for regular users.


My view on it is that unfortunately there is no way to do that. Some things are going to be hard and that is it, you cannot make it easier. We see that in never ending stream of frameworks or "new/better/easier" ways which in the end become bloated in the same ways as most other stuff because problems don't go away if you don't know about them.

In the end data model of git is its killer feature because it gives me tools to deal with hard problems and do it efficiently.


I always feel that complaints like the one you’re replying to aren’t thought through. The simple follow up is: how would you do it better? There’s no better way. You only need to look at the complicated part of git for very complicated things. If you think about it, commits and branches are incredibly simple relative to what’s actually happening (when it comes to working with them, that is). Listening to old folks talk about pre-git version control gives me appreciation for just how good it is.


There are a few fixes I would make, but only a handful would be significant to everyday usage, none would involve fundamentally changing the underlying data model. The biggest would probably be to make all commits required to be made onto branches and either embed the data of what branch created a commit into the commit itself, or have a separate data store of branch-to-commit info, which would allow for "branches as a series of commits" to be a first class citizen besides "branches as references to a commit". Maybe add git notes-esque support for editable metadata on the branch, where you could put information that currently would be stored in some pull request management system.


The data regarding what branch a commit was made on is available. It’s all available if you dig deep enough.

What you’re talking about sounds like the work of a git visualization tool made for fixing problems. I do love a repo viz where I can see the tree of branches for helping me understand where something went wrong. The surface level stuff people use day to day is as simple as it needs to be.


Really? Where?

To my understanding, in a scenario like

    A--B--C--D <- master
     \
      E--F--G <- branch1
             \
              H--I <- branch2

You can't identify the set of commits I and H (specifically, commits made under a given branch) without knowing (external to the information stored in commits normally) that branch2 branched off of branch1.


In short: I don’t know where, but I do know that it’s in there.

My logic is that I’ve used tools that viz what you’re talking about. They actually have much prettier versions of the (admittedly very nice, thank you) drawing you made. Bitbucket has one. These viz would be impossible to make without the info.

I read the git manual a couple of years ago. (All except the plumbing chapters.) There are commands that unveil deeper and deeper into whatever you’re querying, and it was in there somewhere. Give it a look. It’s a very nice manual. I read the whole thing (minus the plumbing) in less than 4 hours.

If I had to guess, I would say git reflog is involved.


My example was actually flawed. You can generate that information by iterating through all the heads, and then working backwards to the first commit that is reachable by another head. The better question, that Git actually can't answer, is whether E, F, and G were made under branch1 or branch2. Reflog could answer the question, but reflog is local only, and at least in theory temporary.
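A minimal sketch of that iteration in Python, using the graph from my earlier drawing. The `parents` and `heads` dicts and the helper names are assumptions for illustration, not anything git exposes in this form; the set difference is roughly what `git log branch2 --not master branch1` computes.

```python
# Parent links for the example graph (commit -> list of parents).
parents = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"],
    "E": ["A"], "F": ["E"], "G": ["F"],
    "H": ["G"], "I": ["H"],
}
heads = {"master": "D", "branch1": "G", "branch2": "I"}

def reachable(tip):
    """All commits reachable from tip by following parent links."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen

def only_on(name):
    """Commits reachable from this head but from no other head."""
    others = set()
    for other, tip in heads.items():
        if other != name:
            others |= reachable(tip)
    return reachable(heads[name]) - others

unique_to_branch2 = only_on("branch2")   # H and I
unique_to_branch1 = only_on("branch1")   # empty: E, F, G are also under branch2
```

The empty result for `branch1` is the point: nothing in the commit graph says whether E, F and G were "made on" branch1 or branch2.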


Why do you feel you need to know these details?

I used git actively without issues for years before I took the time to learn how it works under the hood. While understanding the details was fun and helpful, I can't say it really changed much about how I use the tool.


There's something funny to me about the idea that the user-level Git commands are called the "porcelain," as if they were a toilet. https://stackoverflow.com/a/6976506

unrelated: I wish people would stop insisting on using full-width columns to display their blog content because it is nearly impossible to read on a 27" monitor.

Wrap that blog content with a

    <div style='max-width: 65ch'>


> I wish people would stop insisting on using full-width columns to display their blog content because it is nearly impossible to read on a 27" monitor.

There's a noticeable portion of tech people who seem to believe that unused screen space is somehow wasted. No margins + no line breaks is the gold standard to them.

I don't get it, it's unreadable to me.


Dan Luu is famous for this.

I can barely read the page it's so harsh.

eg. https://danluu.com/futurist-predictions/


You don't have to run your browser maximized you know


Please tell me about how you will replace me with a very small shell script next.


Telling someone to change their workflow because a website refuses to deal with larger screen sizes is silly.


The whole point of HTML is to let the client display it the way it wants to. Don't blame the server when you tell the browser to be as wide as possible.


The idea that everyone wraps semantic content in their own UI is ludicrous and does not bear out in the real world. I didn't tell the browser to do anything, it's the website owner's job to ensure their content is legible for small devices up to large devices.

Source: guy who has maintained/maintains many websites large and small.


I think that we have seen that in the real world, not every website is going to support your particular workflow.

So you can be practical, and apply the very simple workaround, or you can continue to tell the void how wrong those people are.

Personally I think that your expectations put unreasonable burdens on websites which may not be run by large businesses with big budgets. Niche forums and blogs run by regular people as a labor of love shouldn't be expected to have to devote a lot of resources to be accepted on the web.


Literally as easy as:

https://github.com/dbohdan/classless-css

And before you say I should do that myself, again, if you want your work to be comfortable to read for the world, the bare minimum involves legibility.


And will that be future-proof? How do you know?


A completely separate argument (and yes btw, as long as html is supported, these solutions will work).


Citation needed.


If there's a UI, or even a CLI, that properly matches the conceptual model of development with git, I'm not aware of it. I'm sure there's a much clearer user-centric model of work struggling to get out from underneath the implementation details. Over the years of working with git I've gradually learned bits of the plumbing underneath the porcelain, but reluctantly.


That's why we use fossil [0]. It just gets out of the way. It has its own server, wiki, ticket system and now, even chat!

The main complaint I see here is that it doesn't have rebase, and my point is the goal of the code is for the product to work, not to have a beautiful commit graph.

I've used CVS, SVN, Git. But nothing comes close to the usability that fossil provides.

[0]: https://www.fossil-scm.org/


A beautiful commit graph lets you actually understand commit history. I dread opening history with merges. It's always an incomprehensible mess for me.


All abstractions leak.

You cannot work your jpeg photo collection by opening, editing, saving, editing, saving, editing, saving and then blame your tools when your jpegs are blocky and ugly.

First you have to know some basics about lossy compressing and destructive editing. Then, you can understand what steps you really want to take.

It's the same with version control systems. With git, first you have to understand what a commit and a branch is. Then, you can work.


"refs" are not behind-the-scenes things, they're the things you want to operate on. It wouldn't be like having to learn JPEG superblocks, it's more like learning what a pixel is.

You need to understand Git's data model, not its plumbing. No VCS will be useful for anything that isn't trivial if you don't understand its data model.


I disagree. We shouldn't have to know how a branch or commit is stored. Whether git internally makes a new ref, copies everything to a new directory, or does some hashing magic doesn't matter to the user.

I agree that we end up needing to know about refs because of git's user interface, but I consider that a flaw or limitation.


It's not about "how a branch is stored". We're not talking about objects, packs, blobs or any other stuff I never actually had to learn about while using git. We're talking about refs - the primary way for users to reference commits and other things in git. Just about every command you use takes refs as arguments. It's essential to even just imagine the state of the repository in your head. It's what the first tutorial you read about git should explain to you.


People mention previous gen version control systems in sibling comments, but git quite clearly “won” against them. A more useful outlook would be looking for what the next gen VCS might be, which would be Pijul et al. (where diffs/patches are applicable in any order, so merges will be trivial, no more merge strategies), and if they do get production ready they might really provide a better UX/more intuitive mental model for the programmer.


It's version control for software developers, who simultaneously need all the power it provides, and ought to have the educational background to understand directed acyclic graphs. Complaining that git is too complicated because it requires basic knowledge of DAGs is like complaining that your word processor is too complicated because it assumes a proficiency in written language.


You've clearly never tried to save raw image files from a mid level camera. :) heaven help you if you try a newer camera than your os.

I will make no major defense of git, but I am intrigued at the level of difficulty reported about it, compared to the obviously higher level of use that it gets.


What kind of cameras and difficulties are you talking about? It’s almost always just a DCIM folder on the sd card.

Videos, OTOH, are stored rather arcanely, at least on my Sony, with separate directory structure for each format.


Newer cameras have had raw formats that were not supported right away. Cr3 being the one that bit me recently.

I think most software has caught up. But it did catch me off guard that my machine needed updating to support this. I also think flatpak had some trouble with it. Can't remember details.


Ah, you're talking about actually using them, not just saving :)


Little of both. Took me a few tries to get them recognized as pictures. Rather frustrating experience, all told.


A "branch" is two things. In git, it's the ref. To humans, it's the ref or the chain of commits under it.

Both of these have their place, I feel.

When thinking about commits to a branch, humans (I speculate) tend to imagine the branch as a sequence of commits.

But when operating on the branch ref itself, like when you delete it, you should be very clear that you're not deleting commits.
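A toy illustration in Python, assuming a made-up `commits` dict (commit -> parents) and a `refs` dict (branch name -> commit). Deleting the ref only removes a name; the commit objects are untouched, though in real git commits that nothing references any more may eventually be garbage-collected.

```python
commits = {"A": [], "B": ["A"], "C": ["B"]}   # commit -> parents
refs = {"main": "B", "feature": "C"}          # branch name -> tip commit

def reachable_from_refs(commits, refs):
    """Every commit that can be reached from at least one ref."""
    seen, stack = set(), list(refs.values())
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(commits[commit])
    return seen

before = reachable_from_refs(commits, refs)   # A, B, C
del refs["feature"]                           # like deleting the branch
after = reachable_from_refs(commits, refs)    # A, B

still_stored = "C" in commits                 # the commit object is still there
unreferenced = set(commits) - after           # C no longer has a name pointing at it
```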


Branch is ref. Chain of commits is distance between two commits/refs/their first common commit (branch point). If people working on the repo branch off main branch 99% of time, its meaning is quite straightforward. If they branch off a branch, it's also obvious what it is. As branching point is always defined so is equivalence between those two.

What's the point in arguing about it? It's a bit like arguing if `Line = { a: Point, b: Point }` is a line or two points.
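To spell out the equivalence, a rough Python sketch under simplifying assumptions: a hypothetical `parents` map, two refs, and a single unambiguous branch point (histories with criss-cross merges can have more than one best common ancestor, which this ignores).

```python
parents = {"A": [], "B": ["A"], "C": ["B"], "E": ["B"], "F": ["E"]}
refs = {"main": "C", "feature": "F"}

def ancestors(tip):
    """The tip plus everything reachable through parent links."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen

def branch_point(a, b):
    """Common ancestor with the most history behind it (assumes it is unique)."""
    common = ancestors(refs[a]) & ancestors(refs[b])
    return max(common, key=lambda commit: len(ancestors(commit)))

def chain(branch, base):
    """Commits on `branch` that are not on `base`, i.e. the 'sequence' view."""
    return ancestors(refs[branch]) - ancestors(refs[base])

point = branch_point("feature", "main")        # B
feature_chain = sorted(chain("feature", "main"))  # E, F
```

Given the two refs, the chain falls out mechanically, which is why I say the ref and the sequence are the same thing seen from two sides.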


There is no need for a "branch point" you can create multiple "branches" that are just references to different points on the same linear sequence of commits.


Yes, it's basically labeled commit. But when you talk about what it means, you do it in relation to some other labeled commit which will always have common branching point. Edge cases may mean it's empty content (disconnected branches) or commit itself is branching point (you're not behind, just purely ahead) or your branching point is your immediate parent (commit delta itself).

It's a DAG with optionally labelled vertices (content is cyclic of course, commit history is acyclic).

It's all potato potato.


You can have branches that are and have always been entirely disconnected from one another, for that matter. With completely different content, but still in the same repository.


Quite rare in practice though. Only case I have had that is gh-pages branch in a GitHub project repository, which contained only generated artifacts (generated by code in master). Which I believe there are better ways to do now. At least with Gitlab one instead uses a "pages" CI job, triggered from master, and the artifact it produces is the contents that will be hosted.


pristine-tar branches used for Debian packaging are another common example.


Yes, their branching point is initial empty content.


> A "branch" is two things. In git, it's the ref. To humans, it's the ref or the chain of commits under it.

Isn't that true of a git tag too?


A head ref is a kind of ref that, when you ‘attach’ to one as the current HEAD, has useful semantics for managing branches.

A tag is another kind of ref that.. doesn’t have those semantics.

When you fetch a remote and it disagrees about a head ref, you need to do some sort of merge.

When you fetch a remote and it disagrees about a tag ref? The remote wins.

Because one of the things git is trying to do is help you manage source code. Which means that it has to help you manage branching sequences of changes in a commit DAG.

Git doesn’t just have an arbitrary set of ref semantics chosen at random - it has head refs which behave in a very particular way to help with branching, and it has tag refs which behave in a way that is useful for versioning.

So that’s why I think ‘a branch is just a ref’ is a reductive take.

Git has a thing called refs that it uses for various purposes. Git provides tools for naming and working with named branches based on creating head refs. As far as many parts of git (that deal with arbitrary refs, whether they be heads or tags or remotes or whatever) go, sure: they can work with (the heads of) branches because branches are named using head refs.

But as far as humans wanting to do things with branches are concerned, ‘branches are just refs’ isn’t helpful: they don’t want to do a thing to a ref, they want to do a thing to a branch of the commit DAG, so they need to know ‘how do I get git to do this thing with the commits that are part of this branch?’, and saying ‘a branch is just a ref’ doesn’t answer that question.


This post is confusing a few things. Branches and tags are nothing more than prescribed semantics for certain refs by default. You could treat a tag as a branch and vice versa if you really wanted. They're just refs. The tool has semantics around managing refs that map to branches and tags, but there's no hard rules here.

You're welcome to create your own configuration of refs and define how remotes handle refs. You don't need to use branches or tags, you can treat every ref as a tag if you like.

Branches and tags really are just refs. Any ref can be a branch or a tag. The refs don't behave in any way. They're just refs. The semantics are not frozen.

Take a look at git notes, or gerrit review refs, or GitHub pull request refs, or... any number of tools which build on the ref system.

git is surprisingly flexible, but ships with sane defaults. They're not gospel.


Thanks for taking the time to explain this. The idea that a lot of these names are just "prescribed semantics" really makes understanding git easier. I think it also drives home the point of how powerful, "simple" and thought out git is.


‘A branch is just a ref with some sane defaults’ is saying a strictly different thing than ‘a branch is just a ref’.

Yes, you can do anything you can do to a branch tag to any other tag (or directly to any commit hash).

Which means you can do things like ‘merge’ any of those things. Not just refs. So things that are not refs can act like branches.

And you can’t just treat any ref as a branch. Is a remote a branch? No - so things that are refs can also not act like branches.

X is Y can’t be true if there are instances of X that are not Y and instances of Y that are not X.


> It's true that Git implements branches as refs, plus also a nebulous implicit part that varies from command to command. But that's an unfortunate implementation detail, not something we should be committed to.

I actually find git way easier to understand if I don't think about branches in the way the author suggests. My mental model for git really is commits and refs, and it helps me use it fluently. When I wave my hands around and say "a branch is nothing but a ref" what I actually mean is "...and so you should understand how it actually works, so what I'm about to do doesn't look like magic".


I think you and the author are right, but addressing different audiences.

I think most people come to git with their own mental model of what a branch is, what a merge is, etc.

Learning git is often mostly undoing their preconceptions (ie by saying 'a branch is just a ref').

But ultimately, we humans tend to think of a branch as a branch, not a ref. For instance, the previous sentence probably made perfect sense to you.


I agree, and would add that this is exactly what the OP is saying when it talks about this being a communication issue. Saying "a branch is just a ref" is the right attitude for talking to git and making it do what you want it to do, but it's the wrong attitude when you are writing or talking about code, in which case branches are the reification of the processes by which your code evolves.


Do they? I certainly had no intuitive notion of how version control would work before I learned git for the first time.


You did have an intuitive notion of what the word "branch" means, though.


Yep, that's how I was taught in the first place. The people who taught me git described a branch as a marker on a commit, and I never had any confusion about it.

People who are confused about this have probably had experience with some other version control system, so they already have a specific meaning attached to the name "branch"?


It reads a bit like the author has a misconception of what commits are. A commit is not a diff, it's a snapshot. All of the talk about branches containing changes etc. makes sense in the context of a branch being a reference to the latest snapshot. And yes, it might not make sense if you think of a branch as a reference to the latest difference, but that is a misconception.
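To make the snapshot/diff distinction concrete, here is a minimal sketch in Python. It is a toy model, not Git's real object format: `Commit` and `changes_from_parent` are hypothetical names, and a "snapshot" is just a dict from path to file content. The point is only that the diff is derived by comparing two snapshots rather than being what the commit stores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Commit:
    # Toy model (assumption): a commit holds a full snapshot of every
    # tracked file, plus a link to its parent commit.
    snapshot: dict
    parent: Optional["Commit"] = None


def changes_from_parent(commit: Commit) -> dict:
    """Derive a diff by comparing this snapshot with the parent's snapshot."""
    old = commit.parent.snapshot if commit.parent else {}
    new = commit.snapshot
    changes = {}
    for path in sorted(set(old) | set(new)):
        if old.get(path) != new.get(path):
            # (content before, content after); None means "file absent"
            changes[path] = (old.get(path), new.get(path))
    return changes


first = Commit({"README": "hello"})
second = Commit({"README": "hello", "main.py": "print(1)"}, parent=first)
print(changes_from_parent(second))
```

Nothing in `second` records "main.py was added"; that fact only appears when the two snapshots are compared.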


The author has written more about git (see the link at the bottom) than there is in the git manual. Do they… do they know there’s a manual?


When you say "a snapshot", are you meaning "a snapshot of the complete file/folder tree at the point in time of the commit"?

While that's true from a mental perspective, the on-disk format of most standard commits is indeed a diff.

eg what changed from the previous commit, as output by the diff command


I assume you’re referring to packfiles[1], but those are (a) very much an implementation detail (they exist only to save disk space and appear basically nowhere in the git UI), and (b) not diffs—git will pack together whatever blobs its heuristics[2] think “look similar” with no regard for whether they’re actually related to each other, as long as they seem likely to gzip well together.

[1] https://git-scm.com/book/en/v2/Git-Internals-Packfiles

[2] https://github.com/git/git/blob/master/Documentation/technic...


While it’s true that the packed format does only store some information - it is never a diff file and always pointers to trees and blobs.


Came here to say this. Commits on a branch should be as easy as "save file". You shouldn't put commit messages on them other than something like "snapshot 3". Nobody cares about the commits on the branch.


That sounds weird to me, as commit messages in a (side?) branch are just as useful as commit messages on the main development branch, i.e. very, since they describe what each change did.

Maybe the only time / scenario I'd agree with you though, is when I'm just creating a commit to capture a temporary development state (eg WIP) on the way to some development objective. That's not very often though.


For certain changes, a clean branch history can be very valuable during review. For example, if you had some feature which was rolled back due to bugs, a new branch with the feature plus the fixes should have the original (buggy) feature as the first commit(s) with no other modifications so that people can clearly see what was done to fix the original bugs and what was in the original code.

However, you can fix things up when ready for review by just rebasing things into a coherent sequence of commits. I really wish there was a good autocommit/push feature in Git that would help back things up but continuously rebase and compact old autocommits.


It's true. Making a very clean git commit history is key to making the most use of git. That's partly in choosing sizes of commits (does one thing, not two things), partly in choosing good commit messages, and partly in branching strategy.

Personally I try to avoid merges at all costs. Instead, I always try to keep feature branches cleanly rebased off master. This sucks with github, so I have a tendency to destroy and recreate feature branches to avoid getting merge commits mixed up in the remote. GitHub is dumb like that. I don't really know how to fix that except to suggest that some branches on remote should be auto-rebased if it can be done cleanly. But it's still a pain.


OP argues that branches aren’t “just refs”, they’re a sequence of commits. I’d argue that a ref _is_ a sequence of commits. (Or, to be pedantic, it uniquely identifies a sequence of commits.)


A commit contains a sequence of commits, in fact (unless it is the root commit). So a ref is just a single commit, which then contains parent commits. The branch model really falls apart if you ever do a more complex merge or rebase.


I feel like I have a bigger problem with "a sequence of commits" as a phrase than I do with whatever he wishes to call a branch. A commit itself is a sequence (it includes its own history!), so a sequence of a sequence is a confusing thing to talk about without any sort of qualification, since they're heavily constrained by being linked together. Moreover, what he's calling a sequence is really a DAG; it's not linear. Hearing "a sequence of commits" is so jarring that it conveys more confusion about git than anything else... perhaps that's why he gets corrected so often?

But in terms of what a branch is, if you don't want to call it a "ref", then I'd just say it's a commit symlink?


Hmm, I think you'd be going a bit far by calling each commit a sequence. A commit does not contain its parent commits: it only contains pointers to its parent commits. By that logic, a linked list isn't a sequence of nodes, since each node is itself the head of a sublist.
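A rough sketch of what I mean, in Python. This is a toy model under my own assumptions (`Commit`, `store` and `first_parent_chain` are made-up names, and ids are short strings rather than hashes): each commit holds only the ids of its parents, and any "sequence" is something you derive by choosing how to follow them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    message: str
    parents: tuple = ()  # ids of parent commits, not the commits themselves


def first_parent_chain(store: dict, tip: str) -> list:
    """Follow only the first parent from tip back to the root."""
    chain = []
    current = tip
    while current is not None:
        chain.append(current)
        parents = store[current].parents
        current = parents[0] if parents else None
    return chain


# Toy history: "m" merges "b" (first parent) and "x" (second parent).
store = {
    "a": Commit("root"),
    "b": Commit("work on main", ("a",)),
    "x": Commit("work on side branch", ("a",)),
    "m": Commit("merge side branch", ("b", "x")),
}
print(first_parent_chain(store, "m"))
```

The chain from `m` skips `x` entirely, which is the catch: once merges are involved you only get a linear sequence by picking a rule such as "first parent only" (the idea behind `git log --first-parent`).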


The pointer isn't really relevant to what I'm sharing. I'm talking about what it is logically, not how it might be represented physically. You could certainly make each commit contain the other one physically if you wanted to represent it that way in some programming language. Of course there's not much of a reason to, and it'd be annoying since they'd be different sizes, but it's not like you have infinite recursion or something preventing you from doing that.

What I meant is something else though. A linked list is a linked sequence of nodes (or a chain of nodes, if you wish to call it that). A branch is a linked DAG of nodes, if you wish to call it that. Calling a branch a sequence of commits feels like calling your extended family (or I guess your genealogy) "a sequence of people" - it misses some crucial aspects of the structure and just sounds very confused.


This article begs for git manual awareness. I’m fairly certain the author doesn’t know the manual exists. It’s a short manual. The chapters through branching should be required 1.5 hour reading. It’s stupidly simple when it’s broken down correctly. (The plumbing chapters are admittedly over my head.)


I tend to think of branches in git the same way I think about strings in C or lists in (Common) Lisp.

In each of these cases, there's a simpler underlying data structure that's directly overlaid with a set of conventions. It's the conventions on the use of the simpler structure that gives the illusion of some higher order data type.

(The distinction I'm making between these and abstraction in general is that here the abstraction is almost intentionally somewhat leaky - presenting both strengths and weaknesses.)


This is why there is a hope to build a new VCS that isn't that state-dependent and based on isolated patches - Pijul[1].

[1] https://pijul.org


It's a shame Mercurial became a betamax since it's much more user friendly.


How does Mercurial differ in this way? I always figured mercurial was just a 'better UI', but using similar internals.


Much like Betamax, the technically superior option often fails in the market.


> None of these can be understood if you think that a branch is nothing but a ref.

To the contrary - all of these can have "branch" changed to "ref" and they still make perfect sense. "ref" is just as much a sequence of commits as "branch" is. Every commit is.


I think of a git repo as a gigantic bramble bush, with some of the brambles joining back into each other. There are some labels attached to the bramble bush in various places. Some of the labels follow the tips as they grow, some stay fixed.

It gets interesting when you compare two bramble bushes with the same provenance, see how they differ, and then how they can be reconciled.


I think you can be technically accurate ('branch is a named ref') and still have the same mental model of branches described.

A commit, specifically its ref, is already encoding the 'sequence of commits' that led to it - viz. it knows its parent(s).

A branch is a named pointer to one of those then, and 'nothing more', but that is already a sequence of commits.


It seems to me that "a branch is just a ref" is important knowledge to anyone developing git software or any other software that interacts directly with the files in a git repo.

For pretty much every other developer who's just using git for source control, "a branch is a named sequence of commits" is a much more useful way to understand them. I don't want to think about the implementation details of how git internally represents a branch any more than I want to think about whatever's going on inside a word .doc when I add a table into my document. The important thing is my branch (or table) works the way I and everybody else expects it to work.

(It's good to have at least one person on your team who understands both those concepts deeply, as the obligatory xkcd explained...)


What's the distinction between '[interacting] with the files in a git repo' and 'just using git for source control'?

To be clear I didn't say a branch isn't a named sequence of commits! I said it's a named ref to what is already a sequence of commits.

I think people should understand their tools - not the internals/implementation detail (necessarily), just how to use them effectively.

Today I had some post-incident reverting to do, and it was complicated by hairy merges, poor commit messages ('what' not why/the context), poor commit structure & history, etc. And I'm sick of CI pipelines called 'Merge branch master of <remote url>' on the master branch (from doing merge-pulls of origin/master having committed to master locally - `git config pull.rebase true` if you're going to do that) - it tells you absolutely nothing about the change that's actually building (because it's parent 1, not the commit itself or the merged parent 2) and causes a snaking history on master (flipping parents 1 & 2 every time someone does it).

It's like a builder/'handyman' using a combi-drill day in day out in drill mode, using it for masonry & screws too, but not caring to realise the hammer & driver modes exist.


> and 'nothing more'

Also it "follows along" when you commit to the branch. Named pointers that stay put are tags.


Good point, that is a bit of a quirk to the model. (Bloody useful, wouldn't have it another way obviously. But it is a special treatment more than 'a name for a commit'.)

I suppose I can recover by saying 'committing creates a new commit, with HEAD as parent, and updates the checked out pointer to point at it'. Having no branch, a 'detached HEAD', is perhaps more the special case - it's a sort of nameless transparent pointer that updates, but you're only aware of the commit ref itself. Although again a tag is also a special case in that it doesn't update, as you say. Really they're all just different ways of referring to a commit; it doesn't really make sense to call any one of them the true way and the others special cases.

I stand by the commit being the 'sequence of commits' though. So tldr, branches, tags, detached heads are just different mechanisms for referring to such a sequence: named and updates, named and static, unnamed/transparent. But in each case, they point to a commit, and inherently the sequence behind it.
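Here is that three-way distinction as a small Python sketch. It is a toy model, not how Git stores anything: the `Repo` class, its fields and the `c0`, `c1` ids are all hypothetical. It only encodes the rule "a commit takes the current HEAD commit as parent, then moves whatever is checked out".

```python
class Repo:
    """Hypothetical toy model of refs; not Git's real storage."""

    def __init__(self):
        self.parents = {}    # commit id -> parent commit id (None for root)
        self.branches = {}   # branch name -> commit id (named, updates)
        self.tags = {}       # tag name -> commit id (named, static)
        self.head = ("branch", "main")  # or ("detached", commit_id)
        self._counter = 0

    def resolve_head(self):
        kind, value = self.head
        if kind == "branch":
            return self.branches.get(value)  # None before the first commit
        return value

    def commit(self):
        new_id = f"c{self._counter}"
        self._counter += 1
        self.parents[new_id] = self.resolve_head()
        kind, value = self.head
        if kind == "branch":
            self.branches[value] = new_id     # checked-out branch follows along
        else:
            self.head = ("detached", new_id)  # only HEAD itself moves
        return new_id


repo = Repo()
first = repo.commit()
repo.tags["v1"] = first          # the tag stays on the first commit
second = repo.commit()           # "main" moves, "v1" does not
repo.head = ("detached", first)  # check out the old commit directly
third = repo.commit()            # no branch or tag moves at all
print(repo.branches, repo.tags, repo.head)
```

After the last commit nothing named points at `third`, which is why commits made on a detached HEAD are easy to lose track of.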


When teaching Git I often use a whiteboard with a permanent marker and Post-its for branches. It really brings home the point of the commits themselves being rigid and branches flexible.


That’s a great visual. I am on deck to give a mini git tutorial, and I am stealing this.


I'm pretty sure every person who says "branches are nothing but refs" understands why they are called branches.

They just want you (and others) to learn. This is an important piece of puzzle that many people miss.

Would you sacrifice all those (other people's) learning opportunities just so that you don't get annoyed?


Subversion still exists. So does Perforce. There may be others.

This mental model fits those models in the code. It also fits Git-Flow. Those are all authoritarian.

So is everything Linus produces.

But, the secret genius of Linus is that he creates things with branchier futures than most creations and then he lets them wander around, becoming even more aggressively future-branching. No, actually . . . the secret is what allows him to do that. It is at least two derivatives of "creates branchy things . . . go!"

Thus git. Thus Linux.


I think this highlights a deeper issue with how we as programmers tend to think about abstractions. It is easy to pretend that you can conjure any interface/any abstraction from thin air as long as you define what you want well enough. But the reality is that a good abstraction needs to be built using elegant constituent parts. It needs to be built in a way which looks at the problem from an angle which simplifies it.

Building named sequences of source changes via commits which chain together, with the only thing that defines the identity of the chain being a ref to the end, is such an elegant abstraction, which is not bad to expose; it allows for easy reasoning about source changes. Saying git branches are a separate sequence of commits and keeping it at that would not be a good abstraction, even if you hid the implementation really well.


Describing a git branch as a named sequence of commits implies that it's a specific sequence of commits. But without a merge base, a branch is an ambiguous sequence of commits.

Ergo, a git branch is a named sequence of commits [when the implied merge base is obvious].


I'm struggling to follow why a merge base makes a difference? Or even exactly what one is?

A couple of questions which might help clear things up:

Do you consider `main` a branch?

If you create a new branch called `feature1`, do you consider the commits from before the branch from `main` to be "part of" that branch or not?

What if you delete `main`? Are the commits from before the branch part of the `feature1` branch then? What if there are multiple surviving feature branches that share varying amounts of common history?


main is a branch. By convention, it contains all of the commits back to an ancestor that has no parents.

Unless given more information, I would consider the commits between `git merge-base main feature1` (exclusive) and feature1 (inclusive) as part of the feature1 branch.

Now, if I `git checkout -b feature1-A feature1`, what commits are part of branch feature1-A? It depends. With respect to which merge base?
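To illustrate the ambiguity, a minimal Python sketch over a toy history. The commit ids `A` to `F` and the helper names are made up for the example. "The commits on a branch" is computed as everything reachable from the branch tip minus everything reachable from some base, which for simple histories like this one is the same set as "after the merge base, up to the tip" and roughly what `git log base..tip` lists.

```python
def reachable(parents: dict, tip: str) -> set:
    """All commits reachable from tip by following parent links, tip included."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def commits_on_branch(parents: dict, base_tip: str, branch_tip: str) -> set:
    """Commits reachable from branch_tip but not from base_tip."""
    return reachable(parents, branch_tip) - reachable(parents, base_tip)


# Toy history (assumption):  A - B - C          main     -> C
#                                 \
#                                  D - E        feature1 -> E
#                                       \
#                                        F      feature1-A -> F
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["D"], "F": ["E"]}
main, feature1, feature1_a = "C", "E", "F"

print(sorted(commits_on_branch(parents, main, feature1)))        # ['D', 'E']
print(sorted(commits_on_branch(parents, feature1, feature1_a)))  # ['F']
print(sorted(commits_on_branch(parents, main, feature1_a)))      # ['D', 'E', 'F']
```

Same ref, two different answers for feature1-A, depending only on which base you measure against.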


I don't understand what the author is trying to say. I always understood a branch as just a pointer into a linked list of commits. You can freely move this pointer around. HEAD is similarly another pointer to the current commit. Add to that the fact that each commit contains the whole of every file that was changed in that commit.

Is the author saying the same thing, or is he saying that each commit also has some hard-coded reference to the branch in it too?


I think the author put it well. Branches are an abstraction (albeit a leaky one) over refs, and abstractions are useful. Tags are also just refs, but we (and the git client) assign them a different semantic meaning and functionality than branches. Using those names for different ref types creates shared meaning between people around the world. They allow further abstractions like git flow, which in turn allow (in a way) CI/CD or analysis software to build on those conventions.


I'd argue that there's a difference between a git branch (an implementation detail) and a development branch (a process feature) and that's the difference the author is running up against.

Are you teaching someone about development? Or teaching about git? Because a git branch IS a ref. And a git branch is useful because it points at a commit in a series of commits. Branches and commits are slightly different concepts depending on your VCS and it can be worth understanding the details.


A git branch is a text file containing a single sha1 value and a newline. Pretty sure the author can't tell me the named sequence of commits given that information.
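For anyone who hasn't looked, a minimal Python sketch of that layout. It builds a fake `.git` directory in a temp folder rather than touching a real repo, and it assumes the simple "loose ref" case; real repositories may keep refs in a `packed-refs` file or another backend instead. `make_toy_git_dir` and `resolve_head` are hypothetical helper names.

```python
import tempfile
from pathlib import Path


def make_toy_git_dir(root: Path, branch: str, commit_id: str) -> Path:
    """Build a hypothetical, minimal .git layout with one loose branch ref."""
    git_dir = root / ".git"
    (git_dir / "refs" / "heads").mkdir(parents=True)
    # The branch: one small text file holding a commit id and a newline.
    (git_dir / "refs" / "heads" / branch).write_text(commit_id + "\n")
    # HEAD names the checked-out branch instead of holding a commit id.
    (git_dir / "HEAD").write_text(f"ref: refs/heads/{branch}\n")
    return git_dir


def resolve_head(git_dir: Path) -> str:
    """Return the commit id HEAD currently resolves to (loose refs only)."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        return (git_dir / head[len("ref: "):]).read_text().strip()
    return head  # detached HEAD: the file holds the commit id directly


with tempfile.TemporaryDirectory() as tmp:
    toy = make_toy_git_dir(Path(tmp), "main", "ab" * 20)
    print(resolve_head(toy))
```

That file alone gives you exactly one commit id; any sequence has to come from reading the commit objects it leads to.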


A complex data structure is just a pointer. A file is just a pointer. My whole identity and personhood is just a pointer.


Yes and no. Especially once one considers how a remote branch is not a branch in the local sense.

If that strikes you as odd, consider reading this, which I wrote some time ago when a number of pennies dropped: https://peter-whittaker.com/


More precisely, a remote branch is just like any other branch, it's just not the same branch as the local one. The origin/master vs local/master can be thought of identically to local/master vs local/my_fun_branch, plus some helper functions for remotely controlling the remote repo.


I want to argue... tags are that. Branches are usually ever changing and sometimes disappearing sequences of commits. ymmv (:

(And now don't tell me that also tags could change, and branches not.. then we are almost back to it is all refs - btw why so mad about that? It is another valid viewpoint imo).


Aren't git branches just linked lists in reverse? Since it's a reverse linked, you add to HEAD pointer instead of tail. And because it's reverse, you can branch out with any number of pointers (aka branches).


A ref is a pointer to a node in a directed acyclic graph, and a branch is a pointer to a ref. Not all subgraphs of directed acyclic graphs are sequences.

So, not all branches are sequences.


I am not saying a branch is nothing but a ref. I am saying there are only refs in git, and this is the powerful part of it, because being so arcane and simple makes it so flexible.


My mental model is that a branch is a pointer to a commit, along with the other semantics of the git tooling that deal with branches. Specifically merge bases.


git branches are just a monoid in the category of endofunctors. Did I get that right?


Don’t understand why this blog is featured so often on this site.


I thought git branch is an alias to specific commit...


someone who wants to experience hell can try to work in the same job with someone who has the same style as this blogger.


A commit is also a sequence of commits.


Some people are really into Git.


..that grows from the leaf


[insert obligatory mercurial comment here]


It’s not helpful to say branches are only refs as a way of arguing branches don’t exist. But it is helpful in understanding how they work, what it means to pass a branch argument to git for some operation.

When I understood the reflog and how nothing I do is really gone (just gotta find it!), that was when I realized how much I like git.


> When I understood the reflog and how nothing I do is really gone (just gotta find it!), that was when I realized how much I like git.

The reflog won't help you recover things in the staging area that were accidentally `reset --hard` though … (you can still get the ones you added to the index with `git add` but not committed[1], but changes that weren't added are lost for good)

I love git, but the UX is still terrible …

[1]: https://stackoverflow.com/questions/7374069/undo-git-reset-h...


The reflog also won't help you recover things you manually delete before you `git add` them. Or that were outside of the git repo. Or that you wrote on a piece of paper that you threw away.

To someone who has a right mental model of git, all of these should be just as obvious. If you didn't stage your things, they were never in your repo.


> Or that were outside of the git repo. Or that you wrote on a piece of paper that you threw away.

Those aren't useful analogies as git cannot remove them in the first place. It's normal for a user to expect a tool to have an “undo” mechanism for its commands (with a prompt “this action is irreversible, do you want to proceed” for the rare actions where the action has to be destructive, like when running the git garbage collector manually).

I know exactly why git behaves like it does, but that doesn't make its footgun less of a nuisance. And it's all about the UX, there's zero technical reason that would prevent git from saving your work as a temporary commit before deleting it, in a way that would make it recoverable, just a lack of user empathy.


--hard is supposed to be your “this action is potentially lossy, do you want to proceed” acceptance. Same with --force. The problem is people teaching git, not git itself in this instance. Don't teach people to use git reset --hard, have them use named stashes. Easy recovery, same goal achieved.


> have them use named stashes. Easy recovery, same goal achieved.

It's absolutely not the same thing. If you just want to discard the modifications you have, then stashes are fine. But if you want to move a branch to another location, then you have to use reset --hard, and then when you have an unsuspected `git reset --hard` in your shell history, the shell auto-completion can screw you pretty quick.

That's the difference between a prompt and a cli option, the first one doesn't appear in your shell history, so you're never going to have it pre-filled by mistake.

> The problem is people teaching git,

When you have a recurring problem with people teaching “something”, then the said “something” has a bad UX.


Why do you love it?

It looks like you've described a tool that 1) has a terrible UI, and 2) trains users to use commands that will eventually cause them to lose unsaved work.


1. git has a terrible UX (not UI), including destructive actions without warning and much more really cumbersome things for newbies to learn about

2. once you know it enough, git is a powerful tool that I really appreciate.

1. and 2. aren't contradictory. And I would love git even more if the UX wasn't the dumpster fire it is, but I happen to know enough of its bad UX to be able to do what I want with it.

Also, unlike the average git expert on HN, I still recognize that the UX is shit, and that you shouldn't need to spend as much time as I did in order to be able to use it at all. I'm really annoyed when people argue that a bad UX is in fact good because of some elitist reasoning.


I am sad that this article exists, let alone all this discussion. It feels like that sort of take that is put out there just to be contrarian and stir the pot.

I don't think I've ever seen someone say "a branch is a ref" with the intent to diminish the concept of a branch or the utility of it, or the special language around its concept.

On the other hand, appreciating what git commits are, what tags are, what branches are, and understanding the "just refs" part of it, was vital to my understanding of git. This article feels like it argues against something that (1) doesn't exist, and (2) in the form in which it does exist, serves a vital role of exposing 'just enough' of git to help some folks understand the model enough to be effective with it.



