Thanks for the post! I should mention that I'm not a git expert by any stretch, just a git user who's recently been herding some folks at work from perforce to git (so have brushed up on git explanations/internals).
It seems like the conflict is that there actually are some separate things going on "under the hood", and you're not satisfied with the way that git has combined them? To be explicit, some distinct steps we're discussing are:
1) Update the working tree to match the tree-ish
2) Create a new branch
3) Update HEAD to point at the branch (or other tree-ish)
As is currently implemented, "git checkout somebranch" does 1 and 3, "git checkout somebranch -- <files>" does just 1, "git branch somebranch" does just 2, and "git checkout -b newbranch" does 2 and 3. AFAIK, there's not a "git branch" argument that causes it to update HEAD, but the standard version of "git checkout" does exactly that. From the perspective of a new git user, maybe "git branch" is the obvious command to look at for making and using a new branch, but I think "git checkout" is the obvious command for using a branch.
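For anyone following along, here is a rough sketch of that mapping in a throwaway repository. The file name and identity settings are placeholders, and it only records where HEAD points after each command rather than claiming anything about git's internals.

```shell
set -eu
repo=$(mktemp -d)                       # throwaway repo; the path is arbitrary
cd "$repo"
git init -q
git config user.name "Example"          # placeholder identity for the demo
git config user.email "example@example.com"
echo one > file.txt
git add file.txt
git commit -q -m "first"
start=$(git symbolic-ref --short HEAD)  # whatever the default branch is called

# Step 2 only: the branch is created, HEAD stays where it was.
git branch somebranch
after_branch=$(git symbolic-ref --short HEAD)

# Steps 1 and 3: update the working tree and point HEAD at the branch.
git checkout -q somebranch
after_checkout=$(git symbolic-ref --short HEAD)

# Steps 2 and 3: create a branch and point HEAD at it.
git checkout -q -b newbranch
after_b=$(git symbolic-ref --short HEAD)

# Step 1 only: pull one file's content from another branch; HEAD does not move.
echo two > file.txt
git commit -q -am "second"
git checkout -q somebranch -- file.txt
after_paths=$(git symbolic-ref --short HEAD)
content=$(cat file.txt)

echo "$start -> $after_branch -> $after_checkout -> $after_b -> $after_paths ($content)"
```

Note that the path form also stages the file it copies, which is one more thing bundled into "checkout".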
So, perhaps "checkout" could have a better name; maybe "use" or "work" instead?
I must admit that I don't understand what you mean by "state of the repo at that commit" - is that related to the idea of each branch having a persistent working tree and staging area? When I run into a situation where the second would be relevant, I tend to do "git commit -am WIP" then on return to that branch, "git reset HEAD~1". It very rarely happens that I'm in the middle of composing a commit (staging things) but need to switch to a different branch in the same project, so it doesn't really matter that the staging area and working directory all got munged together.
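In case the WIP workflow isn't clear, this is roughly what it looks like end to end. It's a sketch in a scratch repository; the file and branch names are made up.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"          # placeholder identity for the demo
git config user.email "example@example.com"
echo base > notes.txt
git add notes.txt
git commit -q -m "base"
work=$(git symbolic-ref --short HEAD)   # the branch with the unfinished work
git branch other

echo "half-finished edit" >> notes.txt  # uncommitted change on $work
git commit -q -am WIP                   # park it in a throwaway commit
git checkout -q other                   # working tree is clean, so switching is safe
parked=$(cat notes.txt)                 # the edit is not visible here

git checkout -q "$work"
git reset -q HEAD~1                     # drop the WIP commit, keep the edit as an unstaged change
restored=$(tail -n 1 notes.txt)
last_subject=$(git log -1 --format=%s)
echo "$parked / $restored / $last_subject"
```

Two caveats: "-a" only picks up files git already tracks, so new files need a "git add" first, and the default (mixed) reset leaves everything unstaged, which is the "munged together" part.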
I've been quite busy; hopefully you'll see this. Like above, I agree that you're technically correct.
First, I would like to re-order those steps (and I am curious whether you intended the order to be meaningful). Then, I'll try to explain how they, while technically correct (again) to the best of my knowledge, don't match my mental model (particularly "update working tree", which is not part of "checkout" in smichel17-land). Yes — this is related to the "persistent working tree" (or maybe "working tree as a concept that doesn't exist in my mental model" would be better phrasing — but I'm getting ahead of myself).
---
Note that without flags, you'd have to "git branch" first, then "git checkout". Also, it's easier for me to think about them if each command only performs consecutive steps, in order. Fortunately, we can achieve that:
1) Create a new branch
2) Update HEAD to point at the branch (or other tree-ish)
3) Update the working tree to match the tree-ish
"git branch somebranch" does just 1, "git checkout -b newbranch" does 1 and 2, "git checkout somebranch" does 2 and 3, "git checkout somebranch -- <files>" does just 3. I think this helps clarify my issues with both -b and --. They both change which step "checkout" starts on (and "-b" changes the ending, too!).
By analogy: if these commands are like functions, adding a step afterward is like a flag that modifies the output, while adding a step before is a flag that modifies the expected input. Sure, you could organize your code around the outputs it produces, and sometimes we do that (e.g., for serialization/parsing you have things like JSON.stringify, String.fromInt, different constructors for classes, etc.). But typically it makes more sense to group based on what you want to do with a given object/class/data-type; it's nice to be able to answer, "I have an X, what are all the things I can do with it?"
Maybe that's stretching the analogy a little. But it ties in with my original comment mentioning the "primary action", which I guess I could rephrase as "first action in the chain".
---
What's the primary action of "checkout"? Well, "updating" the HEAD. So here's where we get back to abstractions / mental models — I would say, moving the HEAD. But first, let's take a short detour.
What's a commit? Technically, it's a diff and some metadata (author, parent commit(s), ...) with a deterministic name. But in terms of actual use, it's a snapshot of the repository at a certain point in time. (Or, if you'll permit a little snark, a version, as in "version control".) Zoom out to the whole repository, and you can visualize it as a tree of commits, ordered along the axis of time. Visualization:
git log --graph --format="%C(yellow)%h%Cgreen%d"
That includes branches, so what about them? Technically, they're tags. But in my mental model, they're boxes which contain mutable state that I might want to commit. They're the buffer in my text editor, where I make changes before saving to disk. As a crude visualization of making a commit, I choose a subset of those changes and put them in the bottom of the box (stage them), then chop the box in half. The bottom becomes the new commit, and the branch remains on top, still holding any unstaged changes.
Finally, back to the HEAD (our detour is over). Technically, it's just another tag. But to me, it's a camera, through which I look at a given box (or snapshot). It's the location where I am. It's $PWD.
To bring it all together:
- Branches sit on top of a commit and stay there.
- When I check out a different branch, they remain on top of the same commit, still holding the unsaved changes in their box. This is what I meant by "persistent working tree".
- Checking out a branch, conceptually, is just moving my view (HEAD) to a different branch. It doesn't involve changing files. It's the equivalent of "cd".
So this is why it barely makes sense for me to talk about "the" working tree. I have a bunch of boxes/folders/"working trees", called branches. That I have to copy them to "the working tree" in order to edit them is an implementation detail. Step 3 above (your 1, "update the working tree") doesn't exist in my mental model. You just move the HEAD. One atomic operation.
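For what it's worth, "git worktree" seems to come fairly close to this model: each branch can get its own directory, so moving between branches really is "cd". A minimal sketch, assuming a git recent enough to have "git worktree" (2.5 or later, as far as I know); the directory and branch names are made up.

```shell
set -eu
base=$(mktemp -d)
git init -q "$base/main-box"
cd "$base/main-box"
git config user.name "Example"          # placeholder identity for the demo
git config user.email "example@example.com"
echo one > file.txt
git add file.txt
git commit -q -m "first"

# Give a new branch its own directory ("box") next to the original one.
git worktree add -b feature "$base/feature-box" >/dev/null 2>&1

# Make an unsaved change in the feature box.
echo "unsaved edit" >> "$base/feature-box/file.txt"

# "Switching branches" is now just cd; each box keeps its own uncommitted state.
cd "$base/main-box"
main_view=$(cat file.txt)
cd "$base/feature-box"
feature_view=$(tail -n 1 file.txt)
feature_head=$(git symbolic-ref --short HEAD)
echo "$main_view / $feature_view / $feature_head"
```

It isn't quite the same thing, since each box has its own HEAD rather than there being one camera that moves, but the unsaved changes do stay with their branch.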
"-b" and "--" both break my mental model because they add side effects to an operation that's otherwise just "look around." "-b" isn't quite as bad, because I could imagine a flag on "cd" that makes it run "mkdir -p" first ("Go here, even if it doesn't exist yet."), but it still makes more sense as a flag on "branch" ("Create a new box, and look at it").
> So, perhaps "checkout" could have a better name; maybe "use" or "work" instead?
I think the new "switch" fits pretty well. And, "git branch [-s|--switch]" aren't taken yet :)
Aside, "git reset" is about moving which commit a branch sits on top of, and its various hardness flags are about what to do with the contents of both the box and the commits it was sitting on top of. There used to not be a command for "copy some stuff into my box from a different box or commit", so I had to use checkout, but now there's "git restore".
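To make the split concrete, here is a small sketch of the two newer commands side by side. It assumes a git that has them (2.23 or later, as far as I know), and the names are placeholders.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"          # placeholder identity for the demo
git config user.email "example@example.com"
echo one > file.txt
git add file.txt
git commit -q -m "first"
base=$(git symbolic-ref --short HEAD)

# "switch" only moves HEAD (optionally creating the branch first with -c).
git switch -q -c feature
echo two > file.txt
git commit -q -am "second"
git switch -q "$base"

# "restore" only copies content into the working tree; HEAD stays put.
git restore --source=feature -- file.txt
head_after=$(git symbolic-ref --short HEAD)
content=$(cat file.txt)
echo "$head_after / $content"
```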
Thanks! No, there wasn't any reason behind that ordering, as far as I remember.
> What's the primary action of "checkout"? Well, "updating" the HEAD.
This might be where we diverge - I'd say that the primary action of "git checkout" is to update the working tree, secondary action is to update HEAD.
> What's a commit? Technically, it's a diff and some metadata (author, parent commit(s), ...) with a deterministic name. But in terms of actual use, it's a snapshot of the repository at a certain point in time.
This isn't actually the case; git commits contain a "tree", which is not a diff. The tree is a table of file paths, attributes, and hashes that each identify a "blob" - the (compressed) contents of a file. The commit represents a state of the committed files (and links); to me "snapshot of the repository" would include things like the state of the branches in the repo, HEAD, and commits that are present but outside the history of the commit in question.
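You can check this yourself with "git cat-file". A quick sketch in a scratch repository (file name and identity are placeholders):

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"          # placeholder identity for the demo
git config user.email "example@example.com"
echo hello > a.txt
git add a.txt
git commit -q -m "first"

# The commit object: a tree hash plus metadata, no diff anywhere.
git cat-file -p HEAD
first_field=$(git cat-file -p HEAD | sed -n '1p' | cut -d' ' -f1)

# The tree object: one line per entry with mode, type, hash, and path.
git cat-file -p 'HEAD^{tree}'
tree_type=$(git cat-file -t 'HEAD^{tree}')
tree_listing=$(git cat-file -p 'HEAD^{tree}')

# The blob the tree entry points at holds the file contents.
blob_hash=$(git rev-parse HEAD:a.txt)
blob_type=$(git cat-file -t "$blob_hash")
blob_content=$(git cat-file -p "$blob_hash")
```

Diffs shown by "git show" or "git log -p" are computed on demand by comparing a commit's tree with its parent's.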
---
Thanks for the explanation, I think I understand what you mean and like the idea. Maybe this is what you alluded to a few posts up, but it seems like a new subcommand could roll up the working directory and staging area into a temporary commit or two, do the normal "git checkout otherbranch", then unroll any temporary commit(s) that were present in otherbranch.
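Very roughly, I imagine something like the shell function below. To be clear, this is a hypothetical sketch and not an existing git feature: the function name and the "PARK:" commit subjects are invented for illustration, and it assumes the branch already has at least one real commit underneath.

```shell
set -eu

# Hypothetical helper; the name and the "PARK:" subjects are made up for this sketch.
park_and_switch() {
  target=$1
  # Roll up: one temporary commit for the staging area, one for everything else.
  if ! git diff --cached --quiet; then
    git commit -q -m "PARK: index"
  fi
  if [ -n "$(git status --porcelain)" ]; then
    git add -A
    git commit -q -m "PARK: worktree"
  fi
  git checkout -q "$target"
  # Unroll any temporary commits found on top of the target branch.
  if [ "$(git log -1 --format=%s)" = "PARK: worktree" ]; then
    git reset -q HEAD~1          # mixed: changes return to the working tree
  fi
  if [ "$(git log -1 --format=%s)" = "PARK: index" ]; then
    git reset -q --soft HEAD~1   # soft: changes return to the staging area
  fi
}

# Demo in a scratch repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"          # placeholder identity for the demo
git config user.email "example@example.com"
echo base > a.txt
echo base > b.txt
git add .
git commit -q -m "base"
origin_branch=$(git symbolic-ref --short HEAD)
git branch other

echo staged > a.txt
git add a.txt                            # in the middle of composing a commit
echo unstaged > b.txt                    # plus an unstaged edit

park_and_switch other
clean=$(git status --porcelain)          # nothing carried over to the other branch
park_and_switch "$origin_branch"
staged_files=$(git diff --cached --name-only)
unstaged_files=$(git diff --name-only)
top_subject=$(git log -1 --format=%s)
```

Matching on the commit subject is fragile; a real version would need a sturdier way to mark the temporary commits.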
At a previous employer, I used Mercurial, including a feature (perhaps provided by an extension, I can't find it at the moment) that allowed a commit to be marked as local-only, so it wouldn't be pushed. Something like this could probably be done with git hooks to prevent the temporary working/staging commits being shared unintentionally.