Hacker News

The comments in this thread are, interestingly, making the author's point.

Yes, you can be "technically correct" saying branches are just refs, but it's not a useful statement for most users.

I believe the author makes a very valid point: we could do with a bit less "technically correct" and a bit more language that targets usage rather than technical implementation.

Git is confusing enough for many people, and we don't have to make it more confusing for them.




I wonder if this has more to do with how people visualise commits than branches per se. I think the git UI by default often encourages users to think of commits as diffs. A branch then is a bundle of diffs that, when stacked together, produces the current state of the repository. Internally, git uses some sort of optimisation to make calculating that current state quicker. In this mental model, it doesn't really make sense to talk about a pointer to a specific commit, because a commit without the rest of the information that makes up its branch is useless.

The problem is that this mental model isn't that useful in the first place, and often leads to confusion. Instead, it's usually easier to think of each commit as a complete snapshot of the entire codebase, that includes a link to the previous complete commit that it was made from (which in turn contains a link to the previous commit, and so on). In this scenario, a branch is just a pointer to a given commit - that's pretty much the easiest way to think about it - and the commit itself is a stack of history. Internally, git optimises for compression by removing redundant information between different snapshots.

In this mental model, thinking about branches as just a different type of tag is easier than thinking about it as a stack of commits, because each commit is already a stack of commits. Moreover, I think the snapshot model often ends up being clearer and easier to use overall. All you need is the basic concepts of snapshots, a linked list, and pointers, and the whole thing kind of just falls into place.
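
For anyone who wants to see the snapshot-plus-link idea directly, here is a minimal sketch. It assumes a throwaway repository; the file name, the branch name `main` and the demo identity are made up for illustration.

```shell
set -eu
# Throwaway repository; identity and branch name are placeholders for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo one > file.txt
git add file.txt
git commit -q -m "first"
echo two > file.txt
git commit -q -am "second"

# A commit object names one tree (the full snapshot) and its parent commit.
git cat-file -p HEAD

tree=$(git rev-parse 'HEAD^{tree}')
parent=$(git rev-parse 'HEAD^')
first=$(git rev-parse 'main~1')

# Each snapshot holds the whole file, not a patch against the previous one.
git show HEAD:file.txt
git show 'HEAD^:file.txt'
```

The `tree` line is the snapshot and the `parent` line is the link in the linked list; the branch name only tells git where to start walking.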


A ‘series of snapshots’ vs a ‘series of diffs’ are just duals of one another.

Since you can go back and forth between them at will, it seems odd to claim that one perspective is inherently superior. Like insisting that a chess board is actually a white board with 32 black squares on it.


It doesn't seem odd to claim that one of them is on average easier for beginners to use for intuition than the other, and that introducing both models simultaneously may bring more confusion than clarity.

I don't see claims of inherent superiority or correctness. It's about what's useful for education.

Nobody is arguing that you need to adopt a new mental model if what you have works for you.


There are some subtle differences in whether the canonical representation is a snapshot of the current state of the repo or a patch applied to the current state. One simple example is reordering; assuming your modifications don't change the same line, reordering patches arbitrarily won't change the final result. If you instead store snapshots of the state at each point, then reordering the snapshots won't necessarily result in the same final state, since you might have moved a different state to the final position.

You're correct that the two models are equivalent, but version control is about operations that you perform on the models, and those operations will not be the same for both models. You can reason about your git history as if it's a series of patches, but git itself doesn't know how to deal with any model other than snapshots.


Given that git commits are immutable, reordering doesn’t matter. Any history rewriting involves creating new commits - whether those are new snapshots or new diffs.
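
A quick way to check the immutability point, as a sketch in a throwaway repository (file name, messages and identity are placeholders): amending does not edit the existing commit, it writes a new one and moves the branch.

```shell
set -eu
# Throwaway repository; identity and branch name are placeholders for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo hello > file.txt
git add file.txt
git commit -q -m "original message"
old=$(git rev-parse HEAD)

# "Rewriting" the commit creates a different object with a different hash.
git commit -q --amend -m "reworded message"
new=$(git rev-parse HEAD)

echo "before amend: $old"
echo "after amend:  $new"

# The original commit object is still in the object database, unchanged.
git cat-file -t "$old"
```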


The snapshot model is the correct model. How data is optimized via compression techniques is secondary. Thinking in terms of "diffs" is incorrect.


Targeting the usage is exactly what makes git confusing to people. You can start using git just by learning when to type "git add", "git commit", "git push" and "git pull" and you'll manage to collaborate somehow, but it will all fall apart the first time you stumble upon an unfamiliar situation. And because Git's UX isn't great, it's pretty hard to build the right mental model just by using it and inferring from the interface.

If you start by creating a mental model, the confusion goes away. Reminding people that "branch is just a ref" is just a way to push them towards less confusion.


Honestly, I think that if "git lol" was the default log command it would do the most to make things much more obvious to newcomers.

  git log --oneline --graph

And git lola for a gestalt of the repo's recent state:

  git log --oneline --graph --all
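
Neither `lol` nor `lola` ships with git; they are aliases you define yourself. A minimal sketch, assuming a throwaway repository and repository-local config (swap in `git config --global` to have them everywhere):

```shell
set -eu
# Throwaway repository; identity and branch name are placeholders for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"

# Define the two aliases exactly as spelled out in the comment above.
git config alias.lol "log --oneline --graph"
git config alias.lola "log --oneline --graph --all"

git lol
git lola
```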


Just think about how useless and confusing GitHub's history view is as soon as a merge is involved. Countless times I've pulled something from there just to browse the graph because of how unhelpful the web UI is.


For how popular it is, GitHub really sucks at this: the internet simply has miserable ways to visualize commits :/


I'm still looking for a tool which produces a timeline similar to Fossil's, but for git.

Example Fossil graph: https://chiselapp.com/user/rkeene/repository/kitcreator/time...


I'm probably missing the details, but don't most git GUIs show a timeline like that? Such as the official desktop version and the official (?) GitLens extension in VSCode? I don't use them myself, though, so I might be wrong.


gitup and other git clients for example do this: https://gitup.co/


I can't try gitup since it seems to require macOS -- ideally something web-based similar to Fossil.


If you do that ^ a lot, then look at the `tig` tool, which is that with an ncurses UI (and some more features).


VSCode users may enjoy the extension "Git Graph."


This, so much this. Lots of times where I've seemed like a git whiz, it's really just that I've got a marginally better understanding of how git really works. Git is much easier to use when you wrap your head around how commits and branches are internally represented.


I think there's some confusion around the meaning of "internally represented" seen in this thread. I wouldn't really call it "internal representation", as then people complain that a tool shouldn't make them learn its implementation details - and they're right, but that's not what happens here.

You don't have to learn how git-the-tool represents things internally. However, you absolutely should learn how git-the-model-of-a-repository represents things, because that's what you're operating on. Git is a tool to manipulate repositories, just like LibreOffice is a tool to manipulate documents. You don't need to learn how ODF stores things in zipped XMLs (just like you don't have to learn how git stores things in its content-addressable filesystem), but you need to understand what paragraphs, words, pages or slides are as this is the model you're working on (just like you need to understand what commits, branches and refs are and how they form a graph).

Unlike LibreOffice, git doesn't make it easy to understand its model just by using it (you could even say that it actively misguides you, although it has good reasons to do so), so you usually have to read some docs to grasp it.


I don't think git is unusual in that regard, and I think those complaints would be unjustified. Loads of tools work totally fine with only the barest understanding of how to use them, occasionally have problems that require a bit more understanding of their internal model, and even more rarely require deep knowledge of their internal model. I think most development teams would be totally fine with only a single member who has a slightly better understanding of the internal model. That knowledge only comes into play very rarely for me. If nobody were available with that knowledge, those teams could make do by simply copying the work into unmanaged text files for a few minutes and then just "manually" override the botched merge that got them into trouble.


Yes, I think it's a common case of getting the fact right, viz. branch == just a ref (true), but the understanding wrong, viz. ref == commit (false).

A commit is an immutable object. Whereas a ref is a pointer, literally a place on disk (a regular file) that holds an address (plaintext SHA) to the latest point in a logical chain of commits.
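
This can be checked by hand. A sketch, assuming a throwaway repository with git's default "files" ref storage and a ref that has not been packed (with packed refs or another ref backend the file may not exist, hence the guard):

```shell
set -eu
# Throwaway repository; identity and branch name are placeholders for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "first"
tip=$(git rev-parse main)

# With loose refs, the branch is a small text file holding the commit hash.
if [ -f .git/refs/heads/main ]; then
  cat .git/refs/heads/main
fi

git commit -q --allow-empty -m "second"
new_tip=$(git rev-parse main)

# The first commit did not change; only the pointer moved forward.
echo "main pointed to $tip, now points to $new_tip"
```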

Meta remark:

This is also what makes it ok to delete a branch after it is "done", and why it is ok to merge a standard working branch (like "develop") repeatedly into a target branch (like release/master/main/trunk).

The semantic / meaning of a branch is transient. It is mutable conceptually and literally.
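
A small sketch of why deleting a merged branch loses nothing, assuming a throwaway repository (the branch names `main` and `develop` and the demo identity are placeholders):

```shell
set -eu
# Throwaway repository; identity and branch names are placeholders for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "base"
git checkout -q -b develop
git commit -q --allow-empty -m "work on develop"
work=$(git rev-parse HEAD)

git checkout -q main
git merge -q --no-ff -m "merge develop" develop

# Deleting the branch removes only the name, not the commits.
git branch -d develop

if git merge-base --is-ancestor "$work" main; then
  echo "work commit is still reachable from main"
fi
```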

edit: formatting


Sort of an aside, I find it funny that the reflog is named that and not commitlog. With this mental model when you look at the reflog you usually want to get back an immutable commit because you've lost the ref. I know it displays the commits and the refs, but does anyone actually look at the reflog and checkout HEAD@{6} or do they use the commit sha?


reflog is a tool to show you the history of a given ref - if you don't give it any, it defaults to HEAD. It seems to me like "reflog" is the perfect name for it and I don't see how "commitlog" would be relevant to what it does.

Did you confuse "refs" (references) with "revs" (revisions)?


No, I was going based off the mental model I replied to:

> A commit is an immutable object. Whereas a ref is a pointer, literally a place on disk (a regular file) that holds an address (plaintext SHA) to the latest point in a logical chain of commits.

Reflog shows you the immutable commit SHA and the HEAD@{N} ref. I've only ever used it to get back to a commit I've lost, never by ref, so to me it's a commitlog.


HEAD is a ref just like any other. What you're looking at after typing `git reflog` is the history of things HEAD has pointed to - it's HEAD's log. Refs don't necessarily have to point to commits, they can point to other objects too.

HEAD@{<N>} is not a ref - it's a rev in <ref>@{<N>} form that means "N positions back in ref's history" (see `man gitrevisions` for more rev forms).

> never by ref

When you look at reflog's output, you've already dereferenced these commits by the given ref and its history.

Try `git reflog <branchname>`.
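
To make the ref-versus-rev distinction concrete, here is a sketch in a throwaway repository (branch name and identity are placeholders). It "loses" a commit with a hard reset and then finds it again through the `<ref>@{<n>}` form:

```shell
set -eu
# Throwaway repository; identity and branch name are placeholders for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"
lost=$(git rev-parse HEAD)

# Move the branch back one commit; "second" is no longer reachable from main.
git reset -q --hard 'HEAD~1'

git reflog             # log of what HEAD has pointed to
git reflog show main   # log of what the branch ref has pointed to

# HEAD@{1} is a rev meaning "where HEAD pointed one update ago".
recovered=$(git rev-parse 'HEAD@{1}')
echo "recovered $recovered"
```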


Yes, I've done that, because the reflog keeps more than just commits. It also keeps checkouts, merges, steps in rebasing, etc. So I've checked out a HEAD@ ref, when I made a mistake in merging or rebasing.


> Git is confusing enough for many people, and we don't have to make it more confusing for them.

Using terms with the wrong definition, and not precisely defining concepts, makes things more confusing, not less.


I'd argue git is confusing for many people because they don't understand the data model. The solution is to learn the basic data model instead of pretending that "a branch is a ref" is not true. Because it is true.


A branch is not a ref.

A head ref is a ref that names a branch. But branches can exist in git without refs. Branches are artifacts that exist in the commit DAG - they are dangling chains of commits that end without being merged in to some other commit. They exist, as pure platonic branches, even if they are un-referenced.

But then you can make a head ref and name one and now all of a sudden you have a named branch. As you make more commits that extend the branch while ‘attached’ to that head, the head ref follows the tip of the branch (that is in particular a thing a head ref does that a tag ref does not).

But you can add commits and extend a branch in a detached state if you like - no head refs following the branch tip. Yet the branch definitely exists. And then if you tag it, you name it.

So no, I don’t think “a branch is a ref” tells the whole story.
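
The detached case described above can be reproduced with a short sketch, assuming a throwaway repository (the tag name `rescued` and the demo identity are made up). It does not show garbage collection of unreferenced commits.

```shell
set -eu
# Throwaway repository; identity, branch and tag names are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "base"

# Detach HEAD and extend the history without any branch following along.
git checkout -q --detach
git commit -q --allow-empty -m "unnamed work"
orphan=$(git rev-parse HEAD)

# No ref contains the new commit yet, so this lists nothing.
before=$(git for-each-ref --contains "$orphan")

# Naming it afterwards: a lightweight tag pointing at the tip.
git tag rescued "$orphan"
git for-each-ref --contains "$orphan"
```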


This is a strange take, in my opinion. Dangling commits like those you describe will be cleaned up by the garbage collector. To say that a “branch” exists without a branch ref pointing to it is at best purely pedantic. Without a ref there is no meaningful branch because it will disappear eventually.


For anyone reading this who would like to learn about the data model, I highly recommend following along the "gitcore-tutorial" manpage. Like actually type the commands and play around with the results. Once you understand what's going on under the hood, the UI commands all make intuitive sense.


The author and the people who insist on teaching (just) that "a branch is just a ref" are fighting over the wrong point. The important part is to understand that each commit is itself both a complete snapshot of the repository and a sequence of commits that led to that snapshot (or more correctly, that it doesn't make sense to think of a commit without thinking about its pointer to the parent commit). That seems weird at first, but everyone who understands how git works has internalized that, whether they explicitly think about it or not. After you understand that, it becomes easy to see that both the "technically correct" point and the author's point are kind of equivalent ways of saying the same thing.

But without this understanding, being told "branches are named sequences of commits" is probably worse than being told "a branch is nothing but a ref". The second one is cryptic and will soon be forgotten, no harm done. The first one leads you into a false sense of understanding, and soon you'll see an operation that looks like deep magic: someone moves the branch ref and now suddenly the whole branch is a completely different sequence of commits.
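
The "deep magic" moment is easy to reproduce. A sketch, assuming a throwaway repository (the branch name `topic` and the demo identity are placeholders): moving one ref changes which commits the branch "contains" without touching any commit.

```shell
set -eu
# Throwaway repository; identity and branch names are placeholders for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "A"
git commit -q --allow-empty -m "B"
git commit -q --allow-empty -m "C"

# A second branch pointing at the same commit as main.
git branch topic
before=$(git rev-list --count topic)

# Move the ref two commits back; no commit is created, changed or deleted.
git branch -f topic 'main~2'
after=$(git rev-list --count topic)

echo "topic reached $before commits, now reaches $after"
```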

The confusion experienced by many people is largely due to the fact that a lot of articles try to teach git in a way that gives a false sense of understanding without explaining how git really works, which is exactly what teaching "Branches are named sequences of commits" does.


While I agree the title in a vacuum has the ability to mislead, the article itself is a critical piece, not a tutorial for beginners. I don’t think much harm was done here.


It's interesting how I strongly disagreed with you before reading the linked blog post, but I fully agree with you after reading it.

I'd still rather think of "git branches" as the technically correct "just refs" and hold the separate-but-related human-only concept of "development branch" in my head. I don't think there's a better approximation of truth than that, nor do I think it's that much more complex to understand.

But the fact these two concepts share the same name truly is confusing. One would do better to refer to the former as "bookmark" or "branch tip".


I'd argue that in the long run, thinking of branches as the entirety of the history before a commit causes more confusion. I'd propose that the most useful way to convey the idea of branches to new git users is to start with the concept that every commit after the initial one has one predecessor, which means that you can always trace back the history of a commit by following the predecessors back to the initial commit, and then introduce the idea of a branch as a name that refers to a given commit. Combining these two ideas means that for any commit, you can definitively state whether or not it exists in the history of the commit that the name points to, and that commits that are part of that history are conceptually considered to be "in" the branch. Then you can introduce the idea that you can "update" the commit that a branch points to, and that the only way to add a new commit is to "increment" a branch to point to a new commit after the current one it points to.
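
The "is this commit in the branch?" question has a direct command-line form. A sketch, assuming a throwaway repository; the `in_branch` helper, the branch names and the identity are made up for illustration:

```shell
set -eu
# Throwaway repository; identity and branch names are placeholders for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "A"
a=$(git rev-parse HEAD)
git checkout -q -b feature
git commit -q --allow-empty -m "C"
c=$(git rev-parse HEAD)
git checkout -q main
git commit -q --allow-empty -m "B"

# Hypothetical helper: is commit $1 in the history of the commit named by $2?
in_branch() {
  if git merge-base --is-ancestor "$1" "$2"; then echo yes; else echo no; fi
}

a_in_main=$(in_branch "$a" main)
c_in_main=$(in_branch "$c" main)
c_in_feature=$(in_branch "$c" feature)
echo "A in main: $a_in_main, C in main: $c_in_main, C in feature: $c_in_feature"
```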

This establishes enough information for you to show how using a git repo actually works; at any given time, you're looking at one specific commit, either directly or via a branch's name. If you're using a branch, then committing will perform the "increment" discussed earlier, with the branch now pointing to that new commit. Showing how to create a new branch will naturally lead to the discussion about how you can have two branches pointing to the same commit; this lets you explain that adding a new commit without specifying a branch name can be ambiguous, which you can demonstrate by checking out the current commit directly rather than by a branch name. Once you've shown that adding a commit requires either checking out one of the branches you have that point to that commit or creating a new one, you can show that the same principle holds for any other commit in the repo as well, even ones further back in the history with no branch currently pointing to them. You can use this opportunity to introduce the concept of `HEAD` as the unique name for whichever commit you're currently looking at, and that looking at a commit directly rather than via a branch is called having a "detached `HEAD`", which means that you won't be able to make any changes without creating a branch at that point first and "reattaching `HEAD`" to that new branch.
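
The walkthrough above can be followed along in a scratch repository. A sketch under the usual assumptions (throwaway directory, placeholder identity, branch names `main` and `feature` chosen for the demo):

```shell
set -eu
# Throwaway repository; identity and branch names are placeholders for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "first"
first=$(git rev-parse HEAD)

# Two names, one commit.
git branch feature
git rev-parse main feature

# Committing advances only the branch that HEAD is attached to.
git checkout -q feature
current=$(git symbolic-ref --short HEAD)
git commit -q --allow-empty -m "second"
main_tip=$(git rev-parse main)
feature_tip=$(git rev-parse feature)

# Looking at a commit directly, with no branch attached to HEAD.
git checkout -q --detach
if git symbolic-ref -q HEAD >/dev/null; then state=attached; else state=detached; fi
echo "HEAD was on $current, is now $state"
```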

If you're trying to teach git to someone who hasn't yet learned the equivalent of an intro to data structures class in computer science, it might be worth simplifying the concept of branches in the way you describe. If you're teaching someone who already understands what a tree is, you're doing them a disservice by trying to hide the model from them because they have more than enough to understand what a branch actually is.


This was true back in Subversion too!

> Creating a branch is the same as creating a tag

> Tags merely exist to pinpoint a specific repository revision


It seems to me that people are confusing the CM concept of a branch with the way that git has chosen to implement it.


There is no single CM concept of a branch.



