The git parable

RiderOfGiraffes · on Feb 4, 2011

This has been posted before, but I thought it was especially good, and the opportunity to comment on the older version (from over 600 days ago) is gone.

If you want to read the previous discussion it's here:

http://news.ycombinator.com/item?id=615308

rkalla · on Feb 4, 2011

Thanks for reposting - I hadn't caught it the first time through and thought it was a good read.

Started off easy enough (loved the simplification), but as soon as we got to SHA-1 snapshot names and "Rewriting History" with the redoing of snapshots and then to the staging directory my eyes glazed over.

It dawns on me that:

1. I have been using every SCM since CVS basically like CVS (Subversion and Git specifically).

2. I have a GitHub account I use to host all my open source and commercial code.

3. I got it because it was the "cool thing to do" and I felt left out of the loop ignoring Git for so long.

4. I have no idea how to really use Git in the ways that make Git a powerful thing.

5. The few times I have seen real Git errors like "Snapshot cannot be rebased, please schnoggleford your glerpentrap!", they have freaked me out and it has never been immediately obvious what was wrong; if it weren't for Googling error phrases I would probably be using post-it notes with hand-written source code as my SCM system.

6. I REALLY want a graphic showing the different concepts explained in the article. I am a visual learner, and reading (at 6am) through the snapshot re-parenting section without visual indicators of what is actually happening just isn't clicking.

It seems to me that a graphic complement to this article, showing the main character's situation at every step of the story, would make this pretty much the ultimate "get your bearings" resource for new Git users.

At least I think so.

Who knows, I'm still excited when my pull and push commands complete without errors :)

ollysb · on Feb 4, 2011

I'm also a visual learner. I found that reading http://eagain.net/articles/git-for-computer-scientists/ really helped me gain a deep understanding of how git repositories work. It goes into quite a bit of detail, but once I'd grokked it the various git commands made a lot of sense and I could really use them in my workflow.

nyellin · on Feb 4, 2011

I used to be in your situation. It isn't worth using git if you don't have time to learn the really powerful things you can do with it. git can increase your productivity by leaps and bounds - but you have to learn more than the basics.

I recommend reading the tutorials on gitready.com. For example, http://www.gitready.com/intermediate/2009/01/31/intro-to-reb...

steveklabnik · on Feb 4, 2011

I only ever explain git to people with the aid of a whiteboard for exactly this reason. Drawing the DAG makes it much more clear, and it's what I see in my head when I type commands now...

shepmaster · on Feb 7, 2011

Same here - having the DAG up and being able to change where things point easily is awesome. If I were pressed to explain git without being in person, I rather liked the Git Parable.

carols10cents · on Feb 7, 2011

Steve and Jake, I think you should do a video together-- I've seen Jake's version and it's pretty good. Maybe next month at PghRB?

imajes · on Feb 4, 2011

I'd love it if you filmed it next time. I get git, and I can handle git, but I'd like to really deeply understand it, and a visual reference may help :)

steveklabnik · on Feb 4, 2011

Can you send me an email? I use it like a to-do list, and that way it'll get into my workflow.

beoba · on Feb 4, 2011

You might be interested in: http://taskwarrior.org/

steveklabnik · on Feb 4, 2011

Thank you! Looks interesting. I'll check it out.

Luyt · on Feb 4, 2011

If you like visuals to go along with each step performed in Git, you might appreciate Pro Git. This book is online: http://progit.org/book/

I'm reading the Dutch translation at the moment.

Edit: for an example page with visuals, see http://progit.org/book/ch3-1.html and http://progit.org/book/ch3-4.html

chousuke · on Feb 4, 2011

I can't draw, but IMO git's basic operations are pretty easy to visualise if you think of the repository and commits as a graph (a tree). You start with the first commit/snapshot as the root, and every node after that is a descendant of the root. To bring diffs into this model, I suppose you could think of the edges as the "diff" between two snapshots; i.e. how to get from node A to child A'.

The staging area is pretty simple. You can think of it as a "WIP" snapshot: it's what gets recorded in the tree when you make your next commit. It may seem like extra work if you're just used to committing every change you have in the working directory, but at least I tend to not want to do that, and git has some pretty cool tools for crafting neat commits out of working tree contents. :)

A node can of course have multiple children, so branches are natural in the git world. "Named" branches are merely pointers to a snapshot somewhere in the branching structure of the tree.

Merges happen when two branches join together: a new merge snapshot M has edges from both parents A and B (or more, if needed) representing the conflict resolution, and M represents the finished merge.
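To make the graph picture above concrete, here is a minimal Python sketch of snapshots, branch pointers and a merge. It is a toy model, not git's implementation: the `Commit` and `Repo` names and the way ids are hashed are assumptions made up for this illustration (real git hashes a different byte layout).

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """A toy snapshot: file contents plus links to its parent commits."""
    files: tuple      # sorted (path, content) pairs
    parents: tuple    # ids of parent commits; empty for a root commit
    message: str

    @property
    def id(self):
        # Toy id: a hash over everything the commit contains (assumed format).
        raw = repr((self.files, self.parents, self.message)).encode()
        return hashlib.sha1(raw).hexdigest()


class Repo:
    def __init__(self):
        self.commits = {}   # id -> Commit: the growing graph
        self.branches = {}  # branch name -> commit id: just a pointer

    def commit(self, branch, files, message, extra_parents=()):
        tip = self.branches.get(branch)
        parents = (() if tip is None else (tip,)) + tuple(extra_parents)
        c = Commit(tuple(sorted(files.items())), parents, message)
        self.commits[c.id] = c
        self.branches[branch] = c.id   # move the branch pointer to the new tip
        return c.id

    def ancestors(self, commit_id):
        """All commits reachable from commit_id, including itself."""
        seen, stack = set(), [commit_id]
        while stack:
            cid = stack.pop()
            if cid not in seen:
                seen.add(cid)
                stack.extend(self.commits[cid].parents)
        return seen


repo = Repo()
root = repo.commit("master", {"foo": "v1"}, "first snapshot")
repo.branches["topic"] = root   # creating a branch is just adding a pointer
a = repo.commit("master", {"foo": "v2"}, "work on master")
b = repo.commit("topic", {"foo": "v1", "bar": "x"}, "work on topic")
# A merge is one new snapshot whose parents are both tips.
m = repo.commit("master", {"foo": "v2", "bar": "x"}, "merge topic",
                extra_parents=[b])
```

The merged file contents are written by hand here; the sketch does no conflict resolution. The point is only that `m` records two parents, so everything on both branches stays reachable from it.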

Rebasing or reparenting is somewhat different. You can think of it as cutting a branch from the tree and gluing it elsewhere. ie. if you rebase branch B onto branch A, then the divergence point of A and B is looked up, and all the edges (diffs) from that point on until the tip of B are applied on top of A, resulting in a series of completely new snapshots. This is why rebasing is "dangerous" or destructive: branch B is no longer the branch it once was, but something completely different.

Unlike in a merge, information about the original branches, or any changes needed to fit the changes on top of the new base, is discarded.

Of course, git won't actually destroy anything during rebases either. As long as you keep a reference to the original branch B, git will keep the snapshots and you can go back to them. There's just no straightforward way to tell that the original B has at one point been rebased on top of A.
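The rebase description above can be sketched in Python as follows. This is a toy model under loud assumptions: history is linear on each branch, both branches share a root, there is no conflict handling, and the helper names (`make_commit`, `diff`, `apply_patch`, `rebase`) and the id format are invented for illustration rather than taken from git.

```python
import hashlib

commits = {}  # id -> {"files": ..., "parent": ..., "message": ...}


def make_commit(files, parent, message):
    # Toy id (assumed format): depends on the snapshot AND on the parent id,
    # so the same change replayed on a different parent gets a different id.
    raw = repr((sorted(files.items()), parent, message)).encode()
    cid = hashlib.sha1(raw).hexdigest()
    commits[cid] = {"files": dict(files), "parent": parent, "message": message}
    return cid


def diff(old, new):
    """The 'edge' between two snapshots: changed/added paths, removed paths."""
    changed = {p: c for p, c in new.items() if old.get(p) != c}
    removed = [p for p in old if p not in new]
    return changed, removed


def apply_patch(files, patch):
    changed, removed = patch
    result = {p: c for p, c in files.items() if p not in removed}
    result.update(changed)
    return result


def history(tip):
    """Commit ids from tip back to the root, newest first."""
    out = []
    while tip is not None:
        out.append(tip)
        tip = commits[tip]["parent"]
    return out


def rebase(branch_tip, onto_tip):
    # Collect the branch's commits back to the divergence point.
    onto_history = set(history(onto_tip))
    to_replay = []
    for cid in history(branch_tip):
        if cid in onto_history:
            break
        to_replay.append(cid)
    # Replay each diff, oldest first, on top of the new base.
    new_tip = onto_tip
    for cid in reversed(to_replay):
        c = commits[cid]
        patch = diff(commits[c["parent"]]["files"], c["files"])
        new_files = apply_patch(commits[new_tip]["files"], patch)
        new_tip = make_commit(new_files, new_tip, c["message"])
    return new_tip


root = make_commit({"foo": "1"}, None, "root")
a = make_commit({"foo": "2"}, root, "A: change foo")
b1 = make_commit({"foo": "1", "bar": "x"}, root, "B1: add bar")
b2 = make_commit({"foo": "1", "bar": "y"}, b1, "B2: edit bar")
new_b = rebase(b2, a)
```

After the call, `new_b` is a brand-new commit sitting on top of `a`, while the original `b1` and `b2` are still in `commits`. Nothing in the new commits records that they came from the old ones, which matches the point made above.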

EDIT:

I think I should note that every "ref" in git is a reference to this giant tree data structure. Let's take for example origin/master. When you fetch changes from the origin repository, git fetches snapshots from the remote data structure, includes them in your own local one, and updates the origin/master reference to point to the correct tip in the data structure so that it corresponds to the remote's master branch.

The same happens for other branches, too of course, depending on what you tell git to fetch. (git can also handle snapshots completely unrelated to other commits in the repository. ie. you can have multiple root commits, but that's not often done. :P)

A "pull" then goes a step further and merges your local 'master' with origin/master and updates your local reference. (origin/master remains untouched until you fetch new changes or successfully push your merged master to the remote)

In essence, git manipulates references to a growing, mostly immutable, distributed data structure. I think it's simple and elegant, but YMMV. :)
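As a rough illustration of the fetch/pull description above, here is a small Python sketch in which a repository is just a dict of commits plus a dict of refs. Everything in it is an assumption for the sake of the example: the commit ids are hand-written strings, the merge commit is given a made-up id, and the fast-forward branch models what git does by default when your master has no commits of its own.

```python
def fetch(local, remote):
    """Copy missing commits and update only the remote-tracking ref."""
    for cid, commit in remote["commits"].items():
        local["commits"].setdefault(cid, commit)
    local["refs"]["origin/master"] = remote["refs"]["master"]


def is_ancestor(repo, old, new):
    """True if commit `old` is reachable from commit `new`."""
    stack, seen = [new], set()
    while stack:
        cid = stack.pop()
        if cid == old:
            return True
        if cid not in seen:
            seen.add(cid)
            stack.extend(repo["commits"][cid]["parents"])
    return False


def pull(local, remote):
    """Fetch, then bring the local master up to date with origin/master."""
    fetch(local, remote)
    mine = local["refs"]["master"]
    theirs = local["refs"]["origin/master"]
    if is_ancestor(local, mine, theirs):
        # No local work of our own: just move the pointer (fast-forward).
        local["refs"]["master"] = theirs
    elif not is_ancestor(local, theirs, mine):
        # Histories diverged: record a merge commit with both parents.
        merge_id = "merge(%s,%s)" % (mine, theirs)   # hypothetical id
        local["commits"][merge_id] = {"parents": [mine, theirs]}
        local["refs"]["master"] = merge_id


remote = {
    "commits": {"c1": {"parents": []}, "c2": {"parents": ["c1"]}},
    "refs": {"master": "c2"},
}
local = {
    "commits": {"c1": {"parents": []}, "l1": {"parents": ["c1"]}},
    "refs": {"master": "l1", "origin/master": "c1"},
}

fetch(local, remote)
after_fetch = dict(local["refs"])   # master untouched, origin/master moved
pull(local, remote)
```

Note that `fetch` only ever adds commits and moves `origin/master`; the local `master` ref changes only in `pull`.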

lysium · on Feb 4, 2011

Doesn't git operate on a DAG rather than a tree? I mean, you have merges.

steveklabnik · on Feb 4, 2011

Yes, it's a DAG. It's easy to slip and say 'tree', though. While not strictly correct, it's pretty close.

lemming · on Feb 4, 2011

I agree that the only way to truly understand Git (which I am still a long way off from) is to understand how it works. Otherwise it really is totally incomprehensible. Two resources that helped me a lot were the "Git magic" link at the end of the article, and "Git from the bottom up" (http://www.newartisans.com/2008/04/git-from-the-bottom-up.ht...). Well worth reading, you'll realise just how simple Git really is and it's quite beautiful how much utility comes from such a simple model.

albemuth · on Feb 4, 2011

A co-worker was totally freaked out because he had accidentally deleted his master branch. I walked him through how to recreate it from origin and tried to explain the why and how, but some people just don't care, especially once they know there's a 'Git guy' they can go to.

JoeAltmaier · on Feb 4, 2011

The differences between source control paradigms come down to a few things: what is the resolution of versioning (project/subproject/file); how are checkpoint labels created/tracked; how many levels of sub-project are supported; what tools are available to resolve collisions/migrate change sets. There's more but that's a start.

It's not really "source control" until you can have multiple team members making changes to entire source sets independently and then reconcile them in a controlled manner. Missing that, you have a "version control" system - little more useful than zipping up your source periodically and saving it in a folder.

I'm not sure the article explains enough to put git into perspective - it's mostly a "what is version control" parable?

fr0sty · on Feb 4, 2011

> I'm not sure the article explains enough to put git into perspective - it's mostly a "what is version control" parable?

The parable walks you through git's data model and is therefore git-centric. The metaphors used are a rather thin veneer on top of the way git actually works instead of a thick and leaky abstraction.

You could apply the high-level concepts across other systems but they would not map as cleanly or completely.

lysium · on Feb 4, 2011

Good point; to be fair, the parable starts off saying "how you might design one such version control system (VCS)".

JoeAltmaier · on Feb 4, 2011

Right; but I believe git to be a source-control system!

Forgot to mention: there are also build-control-systems, release-control-systems. All often confused/muddled in discussions.

s-phi-nl · on Feb 4, 2011

An excellent similar site is Understanding Git Conceptually: http://www.eecs.harvard.edu/~cduan/technical/git/

nickknw · on Feb 4, 2011

Thanks, I was trying to remember where this article was.

bricestacey · on Feb 4, 2011

I think the OP is stretching it to say that this info will let you "master the various Git commands." I knew pretty much everything in the parable, yet I can hardly get past add, commit, push, and pull. There is a distinct difference between operating and understanding the internals of Git.

js2 · on Feb 4, 2011

Please ask away if there's anything specific that's confusing and I'll do my best to explain.

bricestacey · on Feb 4, 2011

My biggest problem is I work alone, am undisciplined, and do not work with mature software. So my commits are all over the place. If I've already started to stage one commit how can I stage a different commit without losing my current progress?

I actually like git but I haven't found a tutorial that demonstrates a workflow for non-ideal conditions.

js2 · on Feb 5, 2011

Something to remember:

  working tree --[add]--> index --[commit]--> HEAD
                          index <--[reset]--- HEAD
           ^----[diff]----^   ^-[diff --cached]-^

• git add moves things from the working tree into the index

• git commit creates a new commit from the index and makes it your new tip commit, typically updating whatever branch is indicated by HEAD. (HEAD literally is just .git/HEAD and is normally the name of a branch, take a look)

• git diff shows you the difference between the index and the working tree, i.e. unstaged work

• git diff --cached shows you the difference between the index and HEAD, i.e., staged but uncommitted work

• git add /path/to/file will stage an entire file

• git add -p /path/to/file lets you stage individual hunks from a file

• git reset unstages everything

• git reset /path/to/file unstages that file

• git reset -p /path/to/file lets you unstage individual hunks from a file

Now you know how to stage/unstage individual bits of work, either a whole file at a time, or bit by bit. You also know how to commit what you've staged, and how to view what you're about to commit.
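The three areas and the commands that move work between them can be mimicked with a few dicts. This is a hedged Python sketch of the model described above, not of git itself: `ToyRepo` and its methods are hypothetical names, files are plain strings, and only whole-file staging is modelled (no `-p` hunks).

```python
class ToyRepo:
    def __init__(self):
        self.working = {}   # path -> content on disk (working tree)
        self.index = {}     # path -> staged content
        self.commits = []   # committed snapshots; the last one plays HEAD

    @property
    def head(self):
        return self.commits[-1] if self.commits else {}

    def add(self, path):
        # working tree --[add]--> index
        self.index[path] = self.working[path]

    def commit(self):
        # index --[commit]--> HEAD
        self.commits.append(dict(self.index))

    def reset(self, path=None):
        # index <--[reset]-- HEAD (everything, or a single path)
        if path is None:
            self.index = dict(self.head)
        elif path in self.head:
            self.index[path] = self.head[path]
        else:
            self.index.pop(path, None)

    def diff(self):
        # Unstaged work: index vs working tree (untracked files are ignored).
        return sorted(p for p in self.index
                      if self.index[p] != self.working.get(p))

    def diff_cached(self):
        # Staged but uncommitted work: HEAD vs index.
        paths = set(self.head) | set(self.index)
        return sorted(p for p in paths
                      if self.head.get(p) != self.index.get(p))


repo = ToyRepo()
repo.working = {"foo": "1", "bar": "1"}
repo.add("foo")
repo.add("bar")
repo.commit()

# Edit both files, but stage only foo.
repo.working["foo"] = "2"
repo.working["bar"] = "2"
repo.add("foo")
staged, unstaged = repo.diff_cached(), repo.diff()

repo.commit()        # commits only the staged change to foo
repo.add("bar")
repo.reset("bar")    # unstage bar again; the edit stays in the working tree
```

Here `staged` lists only foo and `unstaged` lists only bar, which is the separation that lets two sets of edits become two commits.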

So if you've started to stage some work and you're ready to stage a different commit, just commit the stuff you've already staged. e.g., let's say (for simplicity) you've edited foo and bar and want to make those changes separate commits:

   $ git add foo
   $ git diff --cached    # confirm it's what you expect to commit
   $ git commit -m "add fn() to foo"
   $ git add bar
   $ git diff --cached    # confirm it's what you expect to commit
   $ git commit -m "fix bug in x()"

In between making those two commits, you may wish to test the first commit. You could do that like this:

   $ git add foo
   $ git diff --cached    # confirm it's what you expect to commit
   $ git commit -m "add fn() to foo"
   $ git stash save "temporarily set aside work so I can test foo"

At this point, your unstaged/uncommitted work has been set aside and your working tree matches (with respect to the files git is tracking) the commit you just made. Here's where you can perform testing, potentially amending foo (git add, git commit --amend). Okay, you're ready to continue:

   $ git stash pop
   $ git add bar
   $ git diff --cached    # confirm it's what you expect to commit
   $ git commit -m "fix bug in x()"

I hope this helps. The key thing (IMO) is really to understand what git is doing under-the-hood. The git UI and (especially the) documentation are all over the place. Don't be afraid to start with a test repo and peek inside .git to see what it's doing as you execute commands.

Good luck!

bricestacey · on Feb 5, 2011

Thanks, this is really helpful.

git stash is exactly what I was looking for in order to save my index without having to make a random commit. And then the -p option is icing on the cake (ps, it also works for stash).

I ran into a tiny problem with git stash. Assume I have a file called README. If I do the following:

  $ git mv README README.markdown
  $ git stash save
  $ git stash pop

The index no longer knows that I did a 'git mv'. It turns out the --index option will try to reinstate the changes made to the index, but it might cause a merge conflict.

Also, the patch interactive mode documentation says "g - select a hunk to go to" yet it never gives that option. I thought it might not allow it if there are a small number of chunks, but I even tried it on an old repo I have with 7 modified files and 'g' was never an option.... Have you ever used g?

js2 · on Feb 5, 2011

I didn't mention --index since I was trying to keep it simple.

'g' works for me, but it's only an option if there's more than one hunk in a single file. You can use 's' to split a hunk up if it's non-contiguous.

bostonvaulter2 · on Feb 5, 2011

You may find this post interesting. Git means never having to say "you should have".

http://tomayko.com/writings/the-thing-about-git

alan · on Feb 4, 2011

I've seen this before and it explains how git works nicely. What I would like as a companion is the git commands to do the things described in the parable.

HerberthAmaral · on Feb 4, 2011

Pretty interesting. I also tell my students to imagine a commit like a checkpoint in a videogame.

I try to teach real VCS, starting with simple patches, tarballs and merges and going through git, as if it were a "patch/merge automation process". I find they really like it, and I try to improve my techniques for teaching real VCS to students.

Glad to hear from people who are going through the same situation as me :-)

torme · on Feb 4, 2011

Somewhat related, but Joel Spolsky has a really excellent tutorial on Mercurial:

http://hginit.com/

It covers a lot of the conceptual differences between the distributed version systems and subversion, and gives a similar sort of story telling walk through of use cases.

xbryanx · on Feb 4, 2011

The number one takeaway from this story was the link to the great git magic resource at the very bottom.

lysium · on Feb 4, 2011

A good idea to explain git from its historic perspective!

I wonder, though, if some graphics would not have helped. People who hesitate to use VCS or who barely understand SVN will have a hard time understanding git without any graphic explanation of the DAG on which git operates, I presume.

pnathan · on Feb 4, 2011

One curious thing I notice is that there's an emphasis on "understand the underlying mechanisms" in git thought, vs. with hg being a "just use it" approach.

I really like the idea of a tool not having its underlying details poking out.

jpitz · on Feb 5, 2011

I really do too, and that was one of the reasons I focused on Mercurial (after I abandoned my first love, Darcs, after a rather lengthy merge. Really lengthy.)

Mercurial is a beautiful tool, with very few of the sharp corners poking out. But, at least for the source trees and development branching style I deal with, I'm convinced Git is more adaptable to any workflow you can imagine.

In the end, it may very well be the case that you HAVE to understand Git that well, because that means truly understanding the tree perturbations you're trying to manage. I've been slowly coming to believe that has to be true.

endlessvoid94 · on Feb 5, 2011

If this could be formatted as a tutorial, you'd have something really superb.