This is a really well-organized dive into something that, based on the title, I didn't expect to find much value in. I was wrong.
Learning git was painful. It felt like an obstacle that periodically stood between me and getting my work done. Gradually, I fell in love with the tool, but until now I never really stopped to think about how my git workflow has evolved. Turns out, it still has room for improvement.
Coming from the simple Subversion way of doing things, instead of using "add" you could use "commit --amend". This gives you the same workflow as described in the article but without the mental overhead of the index.
commit --amend is only safe on local commits and should never be used on a commit that has already been pushed, so you can't utilise it as a main part of your workflow.
There really isn't any mental overhead to the index. It's simply the contents of the next future commit. That's all.
To understand why you need the index and what it's for you need to understand what a commit is.
Probably the biggest thing that will give you a good intuition for how git works and what it's doing is to understand that git uses content-based addressing. All those funny long ids are SHA-1 hashes of the contents of an object in the object database (the stuff in .git/objects). There are only 4 types of objects: blobs (files), trees (directories), commits and tags. In the case of a blob, the 'contents' are the raw bytes of the file on disk, prefixed with a short header ("blob", the size in bytes and a NUL byte), and git SHA-1 hashes that. A tree is an object that contains a list of entries, each of which is either another tree or a blob. It represents a directory on disk that contains files and other directories. Each entry records a mode, a name and an object id, so the object id of a tree is the SHA-1 hash of that list, and it changes whenever anything beneath it changes. Finally you get to the commit. The contents of a commit are just the object id of the tree that represents the root folder of the repo, the object id(s) of the parent commit(s), the details of the author/committer and then the commit message. Git SHA-1 hashes all of that, and that gives you the commit hash that you see.
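To make the content-addressing idea concrete, here is a minimal Python sketch of how a blob id and a commit id can be computed by hand. It assumes the default SHA-1 object format; the helper names (git_object_id, blob_id, commit_id) and the author string are made up for illustration and are not part of git.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    # Every object is hashed as "<type> <size>\0" followed by its body.
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def blob_id(data: bytes) -> str:
    # A blob's body is just the raw bytes of the file.
    return git_object_id("blob", data)


def commit_id(tree: str, parents: list, author: str, committer: str, message: str) -> str:
    # A commit's body is plain text: tree, parent(s), author, committer,
    # a blank line, then the message.
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}", "", message]
    return git_object_id("commit", "\n".join(lines).encode())


# Id of the empty tree, used here only as a convenient root tree.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

# Hypothetical identity and timestamp, purely for illustration.
WHO = "A U Thor <author@example.com> 1700000000 +0000"

if __name__ == "__main__":
    print(blob_id(b""))               # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    print(blob_id(b"hello world\n"))  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
    print(commit_id(EMPTY_TREE, [], WHO, WHO, "Initial commit\n"))
```

The blob ids should match what 'git hash-object' reports for the same bytes. Because the commit body includes the tree id and the parent ids, changing any file or any ancestor changes the commit hash too.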
Running git ls-files will show you the contents of the index. As you can see, it's simply a list of path/file names (git ls-files --stage also shows the mode and blob id recorded for each path). This is what git uses to construct the 'tree' that winds up in the commit.
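As a rough illustration of how a flat list of paths can turn into nested tree objects, here is a Python sketch. It is a simplification under stated assumptions: the index is modelled as a plain dict of path to blob id, every file is treated as a regular non-executable file (mode 100644), and the function name write_tree is borrowed from the git command of the same name without claiming to mirror its implementation.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    # Objects are hashed as "<type> <size>\0" followed by the body.
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def write_tree(index: dict) -> str:
    """Return the tree id for a flat mapping of path -> blob id (hex)."""
    files = {}    # file name -> blob id, for files directly in this directory
    subdirs = {}  # directory name -> flat sub-index relative to that directory
    for path, oid in index.items():
        head, sep, rest = path.partition("/")
        if sep:
            subdirs.setdefault(head, {})[rest] = oid
        else:
            files[head] = oid

    entries = [("100644", name, oid) for name, oid in files.items()]
    # Each subdirectory becomes its own tree object, referenced by id.
    entries += [("40000", name, write_tree(sub)) for name, sub in subdirs.items()]
    # Git sorts entries by name, comparing directories as if they ended in "/".
    entries.sort(key=lambda e: (e[1] + "/" if e[0] == "40000" else e[1]).encode())

    # Each entry is "<mode> <name>\0" followed by the 20-byte binary object id.
    body = b"".join(
        f"{mode} {name}".encode() + b"\0" + bytes.fromhex(oid)
        for mode, name, oid in entries
    )
    return git_object_id("tree", body)


if __name__ == "__main__":
    empty_blob = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
    print(write_tree({"README": empty_blob, "src/main.c": empty_blob}))
```

Changing the blob id of src/main.c changes the id of the src tree, which in turn changes the root tree id. That is why a single hash can stand for the whole snapshot.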
Do 'git cat-file -p a42f32' where a42f32 is just any commit hash you have lying around. It outputs exactly what the commit is. Notice that there's a 'tree' line in there, pointing at the tree object for the root of the repo.
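For anyone who wants to poke at that output programmatically, here is a small Python sketch that splits the text of a commit into its fields. The sample text is hypothetical (made-up author, timestamp and message, with the empty tree as the root), and the parser is deliberately simplified: it assumes one "key value" header per line and does not handle multi-line headers such as signatures.

```python
# Hypothetical commit text in the layout printed by 'git cat-file -p <commit>'.
SAMPLE = (
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    "author A U Thor <author@example.com> 1700000000 +0000\n"
    "committer A U Thor <author@example.com> 1700000000 +0000\n"
    "\n"
    "Initial commit\n"
)


def parse_commit(text: str) -> dict:
    # Headers and message are separated by the first blank line.
    headers, _, message = text.partition("\n\n")
    fields = {"parent": []}  # a commit can have zero, one or several parents
    for line in headers.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            fields["parent"].append(value)
        else:
            fields[key] = value
    fields["message"] = message
    return fields


if __name__ == "__main__":
    commit = parse_commit(SAMPLE)
    print(commit["tree"])     # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
    print(commit["parent"])   # []
    print(commit["message"])  # Initial commit
```

A root commit has no parent lines at all, a normal commit has one, and a merge commit has two or more.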
I feel like once you understand that, git becomes quite a lot simpler to reason about.
O'Reilly have a free pocket guide on git that gives a phenomenally clear intuition for how git works and lets you really understand what it's doing under the hood. I highly recommend reading even just the first few chapters. It totally demystifies git and it's a pretty quick read.
On the other hand, it is easy to get confused. For example, someone might add something to the index and then switch to a different branch. Now he wonders why the index looks weird. Of course the experienced dev understands that the index contains a snapshot instead of a diff.
I still believe that the index is accidental complexity and not necessary at all.
Very nice explanations and diagrams! I'll be sticking with the first diagram, since I've been using that basic one for several years and haven't found a compelling reason to change it.