Picturing Git: Conceptions and Misconceptions (biteinteractive.com)
145 points by nimeshneema on Sept 2, 2021 | 99 comments



I read most of this long article, and I found it useful, but:

It's unsurprising that people's mental model of git is incorrect. Git is not something people study at a conceptual level, it's something they learn recipes for in order to work on some project. Recipes like "how do I save all this work I just did" and "oh shit, everything is hosed, please give me a magic spell I can paste into my terminal to fix it".

I don't really blame people, since git itself does nothing to teach you how it works. Git is the definition of something you have to deal with in order to do something more important to you. Some people want to dig deep and understand how the system works: it's nice to sit near that person and ask them for help sometimes.

Saying "you should really understand more about git" is like saying "you should really study the tax code, it's important and it affects you whether you like it or not." True, but deeply irrelevant!


I think it's the other way around. The fact that git does not provide a clean analogous way to intuitively interact with it just demonstrates that the git interface is horribly broken.

This is not essential complexity, it's just bad design that stuck.

Take a look at https://gitless.com/

If you just look at a summary of the commands, you will have an accurate mental model of what's going on:

    gl init - create an empty repo or create one from an existing remote repo
    gl status - show status of the repo
    gl track - start tracking changes to files
    gl untrack - stop tracking changes to files
    gl diff - show changes to files
    gl commit - record changes in the local repo
    gl checkout - checkout committed versions of files
    gl history - show commit history
    gl branch - list, create, edit or delete branches
    gl switch - switch branches
    gl tag - list, create, or delete tags
    gl merge - merge the divergent changes of one branch onto another
    gl fuse - fuse the divergent changes of one branch onto another
    gl resolve - mark files with conflicts as resolved
    gl publish - publish commits upstream
    gl remote - list, create, edit or delete remotes

To me this clearly demonstrates that the problem isn't that people aren't learning git, it's that git is bad to learn. Stash + Index + Working Tree isn't the right abstraction to present to people. Just say there is a working tree, and tracked and untracked files and snapshots. Done. Branches aren't particular commits but particular working trees on top of particular commits.

Working on a feature and want to look at the main branch, but not ready to commit the changes yet? Well just switch to the main branch, then switch back and pick up where you started. No need to know about an additional data structure called the stash.

Unfortunately this did not pick up enough steam. And because a lot of tools expose concepts from git's broken interface you have to learn the git interface anyway...


Having used `gitless` a while ago as my main interface I strongly disagree. Having a distinction between my working tree and things I'm actually considering to commit is a luxury you only really start to miss when it's gone. IMO gitless makes it way too easy to commit too much. Also its "feature" of keeping uncommitted changes local to the branch is just weird. If I want to make a branch specific change, I create a commit. This has the big advantage that it actually forces the user to add a message saying what the change is about, so if something else comes up I know what was going on when coming back to it later. It's not like this has to be a formal commit message, after all the commit can be dropped again later. Otherwise you end up being surprised by old experiments when switching to branches you haven't used in a while. If I just switch branches then the most likely reason for that is that I want to move the changes.


Interesting. I haven't met many who have. Your two usecases basically never arose for me. If I switch back to a branch and there's random stuff there, then I can just revert easily. So it's an extra operation at a different time to get there. The other usecase for switching branches temporarily where it's one less command, is more important to me though. The crucial thing though is that both behaviours can be accessed easily but we are dealing with one less data store/stateful thing, because we don't need the stash.

As for the first point, fine grained control for what goes into a commit, that's definitely a power user feature, but an important one of course. Again there are ways to achieve this without introducing new state (the index), for example by allowing to amend the last commit.

I wouldn't claim that gitless is a 100% complete git replacement for expert users. It just shows that git has way too much state exposed to users, and has confusing commands to make that state interact. Obviously we all learned git and use it successfully, so it's obviously not broken or anything, it's just worse than it could be (and the constant chorus of "it's so simple, just a DAG!" is a bit grating if you have to teach beginners regularly).

The gitless authors did do some research with users that backs up the claim that this is conceptually easier to use:

https://spderosso.github.io/oopsla16.pdf


> As for the first point, fine grained control for what goes into a commit, that's definitely a power user feature, but an important one of course.

It's not a power-user feature, and it shouldn't be considered one. It should be taught as a standard part of any workflow: before committing, look at the changes you're about to add, and use hunk-staging features (e.g. trivial using Magit) to stage and commit unrelated changes separately.

For example, did you clean up some comments and docstrings while you were adding a new feature? Commit those improvements separately, so that if you need to revert the feature commit later, the improvements won't also be reverted. It also makes reviewing much easier, as each commit or patch, having its own purpose, can easily be reviewed separately, and attention can be focused on parts that need changing.

> Again there are ways to achieve this without introducing new state (the index), for example by allowing to amend the last commit.

Amending a commit does not serve the same purpose as staging files and hunks separately into the index.

It's my impression that few git users understand the value of the index, because few of them use porcelains that expose its power in simple ways. If I had only "git add -p" to use, I might not, either. But Magit is, well, like its name implies, like magic.


gitless has the --partial flag that allows you to commit parts of files interactively.

And your workflow of gradually building up an index of (parts of) files can be achieved by partial/amendable commits. You simply iteratively/interactively add files and partial files to your latest commit until you're done. Instead of building up the index and then committing it, you just build up the commit directly.

This also means you can interact with the "in progress commit" in the same way as with all other commits.

There is no need for having an index to realize what you want.

Another minor point: Your workflow _is_ a power user workflow in my world. Out of twenty people that have reason to use git, one has use for this workflow.

It seems we roughly agree that there is a lot of scope for improving git though. I looked at magit and it looks nice. It exposes all the moving parts in a user interface. I would prefer to just have fewer moving parts, but if they are there it's sensible to make them obvious (and it puts to rest the idea that all you need to understand is that the git data structure is a DAG...)


    gl merge - merge the divergent changes of one branch onto another
    gl fuse - fuse the divergent changes of one branch onto another
Good while it lasted though


Yup. That was exactly the point at which the commenter's promise of "just look at a summary of the commands, you will have an accurate mental model of what's going on" broke down for me.


In the same vein as my sibling (I won't repeat what he said, though I agree with him): I regularly commit just specific files. I actually teach every GUI I use that comes with git integration NOT to auto add and such nuisances. I use the command line and in probably 90% of cases a git commit -a is what I do. Another 5% is git add on the entire directory tree I am in, and the other 5% are specifically picking what to commit. I'm all for UIs doing auto add and the commit -a equivalent by default. But do not take that ability away from me!

The list you provide sounded great until it came to gl switch. Why is there one specific operation for a branch that is NOT done via gl branch?

I don't understand what fuse is supposed to do from this at all. No idea whatsoever. Merge I get and anyone who has worked with any other versioning tool does conceptually.

Rebase most people seem to have a problem with but the abstract concept really isn't that hard. Just like cherry pick isn't really hard but somehow people have trouble with it. Though conceptually it really isn't hard either.

What really helped me the most with git was the realization that it's just a tree of commits with a bunch of labels. Labels have different types so to speak, like branch or tag, remote branches being special in a way etc. And obviously various commands can interact with these labels. Like a fetch updates the remote labels and moves them around on my local copy.
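
A minimal sketch of that "commits plus labels" picture, in Python. This is a toy model, not git's implementation: the commit ids, the `fetch` helper and the dictionaries are made up for illustration; only the `refs/heads/`, `refs/tags/` and `refs/remotes/origin/` prefixes mirror how git names its labels.

```python
# Toy model only: commits are immutable nodes, refs are movable labels.
# The ids and the fetch() helper are hypothetical, not git's API.

commits = {
    "a1": {"parents": [], "msg": "initial"},
    "b2": {"parents": ["a1"], "msg": "add feature"},
    "c3": {"parents": ["b2"], "msg": "upstream fix"},
}

refs = {
    "refs/heads/main": "b2",           # local branch label
    "refs/tags/v1.0": "a1",            # tag label
    "refs/remotes/origin/main": "b2",  # remote-tracking label
}


def fetch(refs, remote_heads):
    """Return a copy of refs with the remote-tracking labels moved.

    Local branch and tag labels are left where they were.
    """
    updated = dict(refs)
    for branch, commit_id in remote_heads.items():
        updated[f"refs/remotes/origin/{branch}"] = commit_id
    return updated


# The remote's main has moved on to c3 since we last looked.
refs_after = fetch(refs, {"main": "c3"})
print(refs_after["refs/remotes/origin/main"])  # c3
print(refs_after["refs/heads/main"])           # b2
```

In this model a fetch only moves the remote-tracking label; the local `main` label stays put until something like a merge or rebase moves it.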


Did you actually look at the page the list is from? This isn't some sketch, it's a fully implemented way of working with git repos that supports everything you ask for. Committing just specific files is done in gitless via

    gl commit a.foo b.bar
committing all but some files is done with

    gl commit -e a.foo b.bar
gl commit -p allows you to interactively commit parts of files.

gl doesn't take any abilities away (it's just git under the hood after all), it just exposes the abilities in sane ways.

If you actually look at the homepage of gitless you will also immediately see what fuse does:

https://gitless.com/#gl-fuse

I believe that by reading that one, not very long page, most people (including non-programmers) can use gl correctly most of the time. This is not the case for git.

BTW, gl branch is for creating/deleting branches, gl switch is for switching your working tree from one branch to another. These are very different things, why should they be under the same command?

For git, the last paragraph is a necessary but in no way sufficient step towards using it proficiently. Gitless is actually much closer to realizing that vision.

Seriously, people need to go back and teach beginners git to realize how bad it is. We have internalized so much of the bad design decisions in git that we don't notice them anymore.


No I did specifically not go to the gitless site because apparently I was supposed to understand it just from the list given in the post. Which isn't true.

I understand what gl branch "is supposed to do" but I don't see why gl switch is its own command given the other reasoning presented for why gl "is better".

I would say it is different. Probably very workable. Completely intuitive and the only reasonable way to do version control? Definitely not.

To me it's very very natural that git checkout will check out any commit I give it. How I specify that commit is up to me. It could be the commit hash. It could be a text label. That text label might on a logical level be a branch. Or a tag. Why do I need to switch branches with a special switch command when checkout handles this perfectly well?


Everything is perfectly logical after you get used to it enough.

And no, my post did not say that you will understand the details of every command in the list from just looking at the list. It only said that the list demonstrates that a much more coherent, less stateful, simply better UI is possible. That you can not fully explain the difference of fuse and merge in one sentence summaries is not a counterexample to that.

I said you will have an accurate mental model of what's going on. From the summaries you can tell all state you interact with: Working Tree, Commits, Track/Untrack status. That's it. That mental model is perfectly sufficient to accurately predict what most anything will do. And crucially the explanations and mechanisms to achieve all the workflows people asked for in this comment thread can be achieved with these ingredients just fine.

It's what the "Git is easy! You just need to understand that it's a DAG!" crowd pretends git already is.


I hate the inconsistency that stash brings, but gitless is useless since it destroys the primary use case for stashing: I start making changes and then realise I'm working on the wrong branch. Git's solution to the problem is awful, but it's better than nothing.


> just demonstrates that the git interface is horribly broken

This is HN criticism #94238 on the terrible git CLI.

Okay, sure.

Would you kindly post your superior git CLI? Or at least the outline of it?

---

Snark aside, Git's popularity is not an accident. Bitbucket supported Mercurial too.


> Would you kindly post your superior git CLI? Or at least the outline of it?

You are literally replying to a comment that describes a possible better CLI for git...


I suspect it was added in an edit in response to this comment. Dad is downvoted!


Actually no. I didn't edit the original post... shrugs


git has quite an inconsistent cli, this is well covered in "master git". And yes, I said it without proposing a better one.


"master git" This is a great demonstration of git's inconsistencies.

I maintain git stacks up well against other similarly mature/complex software (Nginx, AWS, Java), but it's a wonderful read nonetheless.

(And holy hell what a hard thing to search for...can't find the link.)


https://stevelosh.com/blog/2013/04/git-koans/

The trick is to remember it's called "git koans"


Thanks.


personal opinion, if you're a software engineer that can't be bothered to learn git I'm not sure that I respect you as a professional


> I don't really blame people, since git itself does nothing to teach you how it works. Git is the definition of something you have to deal with in order to do something more important to you. Some people want to dig deep and understand how the system works: it's nice to sit near that person and ask them for help sometimes.

The official git handbook, freely available on the official git-scm site is not terribly long, and explains the internals on a conceptual level quite well.

I think the problem is that most people learning git land on some wordpress site of someone trying to flog a condensed and uninsightful shortcut to getting started with git for ad clicks, which only involves a series of commands without explaining the effects of those commands. This, combined with people's expectation that an SCM should take no thought whatsoever, causes most people that use git on a day to day basis to not really understand it at all.

Git needs to be introduced as a powerful data structure, kind of like how SQL is not a DB. Imagine someone explaining SQL without ever referring to the DB tables, rows and fields... only talking about git commits is like only talking about the result of a single query. You must understand the data structure to easily use the interface, otherwise the interface will be very confusing or you will be limited to "recipes"... after that you are just learning new variations on how to manipulate and navigate that structure (yes the graph), and from this perspective people's complaints about the historical inconsistencies we have to put up with in git porcelain are moot.


I was going to write a blog post conveying my mental model of what git is (having had one too many conversations along the lines of "no, git is not a ledger of diffs").

So, I started reading through <https://git-scm.com/book/en/v2/Git-Internals-Git-Objects> again to make sure I didn't have anything wrong.

But now there's no point in writing a blog post. Maybe I'll write one that just links to <https://git-scm.com/book/en/v2/Git-Internals-Git-Objects>.

It even has nice diagrams, which I think are essential for this kind of thing.



The tax code is a completely inscrutable mess but git's internal model is one of the most simple and elegant structures in modern computer science. It's just covered over with utterly stupid commands and terminology that obscures the beauty of the underlying architecture.

I used to despise git because it was so hard to learn. Then as an exercise I started writing my own code to read and write its underlying files and it finally dawned on me how simple the whole thing was.

Git's a very unusual piece of software; it's mind-bogglingly useful, the basic data structures and algorithms are perfectly matched to its job, and it has a UI that's a train wreck.


In a literal sense, sure, git _the tool_ doesn't do much, though I think this is slowly improving as it evolves. For example, there is an experimental `git switch` command[1] under development to provide a simpler interface for changing branches. For me, the biggest leap in developing my own mental model was reading Scott Chacon's book Pro Git, and that is now available online for free on the official Git website[2].

[1]: http://git-scm.com/docs/git-switch

[2]: http://git-scm.com/book/en/v2


idk man, if you're a software engineer I think the onus is on you. There are plenty of great and free resources, like the pro git book. Every month there's a thread where a bunch of people come in and bemoan how git is complicated blah blah. Every month lots of people point out that git is much easier to use if you just bother to conceptually learn about its internals.

it's like coming into a forum for accountants where people bitch about having to learn tax code. please...


I use this when I want to teach someone Git: https://gist.github.com/nicowilliams/a6e5c9131767364ce2f4b39...


I find that a good introduction.

All operations on a repository involve adding commits and/or manipulating the name resolution table.

It may be simplified, but that statement alone, taken in context, is worth its weight in gold.


Thanks!

It's simplified, but really, not that much.


>Some people want to dig deep and understand how the system works

I'd say that's definitely the case but also a problem.

Sophisticated users mixed with people who just want to do a few simple things is a bad combination. I seem to remember that ClearCase had the same issues.


Git isn't going away, so we might as well master it.

If you would like to know more about how to manipulate the git graph, take this excellent (and free) training:

https://learngitbranching.js.org/

To slowly level up, you can watch video demonstrations from Dan's git school. Dan provides 48 thirty-minute training videos:

https://www.youtube.com/watch?v=OZEGnam2M9s&list=PLu-nSsOS6F...


Related: A video[0] from The Missing Semester, a course by CSAIL MIT, which covers Git in one of its lectures. Personally, the entire series is a must watch, but if time is limited, the first 20 odd minutes give an absolutely fantastic introduction to Git.

[0]: https://missing.csail.mit.edu/2020/version-control/


That's the blog post I always wanted to write. So many people spend little to no time to actually learn Git, because "it's just a tool to help you do your 'real' work" (= coding). Or because "Git is too difficult/confusing/broken". Or because "I don't need anything except commit/push/pull". I find those arguments somewhat true, but I still feel that people are missing out when they don't properly learn a tool they use daily. In the best case, it makes them less efficient. In the worst case, they get into "unsolvable" issues and/or pollute the Git history with useless commits, making blames more troublesome for the whole team.


It is just a tool to help you do your real work and famously gets in the way. You use SVN and it covers 99% of use cases much more simply than Git manages.


If svn covers 99% of your use cases, then you need more experience with distributed version control systems.

Being able to commit locally, examine changes, work with them and then push is something you might not think you need if you think about version control in terms of a system like SVN.

But if you have learned Git or Mercurial or some other distributed system you would never go back to svn.


I have lots of experience with git (5 years of usage, 1 of those years was writing tooling in and around git) and pretty much the same experience with perforce, and I much prefer the centralized model of perforce to all the extra fun that comes with git


I was molded in the "git" way of thinking about version control, but I was forced to use SVN in a job. Not having local branches is not that bad. Like, people find ways to work around it: e.g. maintaining changesets in patch-files, or just keeping multiple different checkouts.


that's just laughable considering SVN needs a central repo to work and I can just do "git init ." to create a git repo in any directory at any time I want, and it is entirely self-contained.

Once an SVN user discovers the magic of a staging area, stashes, or "git add -p" I don't know how they could claim SVN does anything better. All I remember from those days was how slow everything in SVN was. It felt like every command was backed by some horrible O(n^2) operation or really slow network connection.

git isn't hard. FFS, we shouldn't keep seeing these posts hitting HN every week. iptables? That's tough. DNS? No thanks. Managing package.json and keeping an app up-to-date? Git is nothing in comparison to the real challenges I face every day.


I've been using the staging area for a long time. It only makes me wish I could turn it off.


git is FUCKING HARD due to its tons of poorly designed misfeatures. The index/cache/staging area is the worst of them. If git wants to make any progress towards actual usability, this mess needs to be untangled with prejudice.

Other tools have none of this overly stateful bullshit. When I want a file to be included in the next commit, I don't want a silent, implicit copy to be whisked away into some internal storage the moment I flag it. There is NO reason why merges need to destroy the contents of that same storage area. The totally confusing semantics of the reset command with the three nonsensically named modes soft, mixed and hard are also created solely by this particular misfeature of a magical hidden storage area.

There is also no reason to even support destructive history editing. Immutable history is the correct choice. The mercurial evolve extension, for example, supports rebasing without destroying history.

Also, the combination of not being able to close branches instead of deleting them and not storing branch names in commits makes git history completely undecipherable when you have to go back more than a few merges. You might just as well throw it into the garbage bin.

Just take a look at competing tools (free and commercial) to see what's being innovated in this space and how this widespread obsession with git is in fact preventing much needed progress.


tl;dr; a monkey is given a hammer and is baffled by utility.

rofl. git is perfectly usable. proven by the hundreds of thousands of people using it daily. most of your 'complaints' here are not actually how git works in reality, and are just user error due to not bothering to learn the tool.

1. git never destroys your work inside a repository unless you told it to.

2. git doesn't 'whisk' things away implicitly. you committed them, that's pretty fucking explicit.

3. ummm every version control is stateful. wtf do you think any given version is? its fucking state. SVN has internal state, git, hg, fossil....

4. the commands, sure naming is fucking HARD we know this. we also know git gained different features over time to accommodate different workflows. perfect? absolutely not but a rose by any other name would smell as sweet.


From dissecting your comment, I am not sure how well you know git.

1. a) git's garbage collector runs without asking first. b) git's UI is so bad that users regularly end up in a state where they effectively asked git to destroy data without realizing it. It's often too implicit.

2. Ever modified a file between git add and git commit? Did these extra changes get committed? Some graphical clients try to hide aspects like this.

3. Unlike SVN, hg etc. git has superfluous extra state with complicated rules. The fact that the cache/index/staging area is inconsistently named three different things just illustrates how byzantine these rules are. And they can be done away with completely.

4. A rose by another name wouldn't smell as sweet - our senses of taste and smell are easily influenced by our other observations about an object. Human perception is weird. In the same vein, bad naming of features invites more usage errors.
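
Point 2 above can be sketched with a toy model in Python. This is an illustration only: `ToyRepo` and its methods are hypothetical and are not how git is implemented. The single assumption it encodes is the one the comment relies on, namely that `add` captures the content as it is at that moment and `commit` records what was captured.

```python
# Toy model, not git's implementation: the index stores a copy of the
# content at the moment of add(), and commit() records the index.

class ToyRepo:
    def __init__(self):
        self.working_tree = {}  # path -> current file content
        self.index = {}         # path -> content captured at add time
        self.commits = []       # list of committed snapshots

    def add(self, path):
        # Copy the content as it is right now into the index.
        self.index[path] = self.working_tree[path]

    def commit(self):
        # A commit records the index, not the working tree.
        snapshot = dict(self.index)
        self.commits.append(snapshot)
        return snapshot


repo = ToyRepo()
repo.working_tree["notes.txt"] = "first draft"
repo.add("notes.txt")
repo.working_tree["notes.txt"] = "second draft"  # edited after add
committed = repo.commit()
print(committed["notes.txt"])  # first draft
```

The later edit stays in the working tree and is not part of the commit, which is the behaviour the question is pointing at.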


Early in my career I was a junior developer on a team using SVN. I knew git and used a git plugin that let it work with SVN. I would often talk about various branches and things I was doing locally in my repository and the other developers would get concerned, because making a branch in SVN is a big deal, and here I am, it seems, making tons of branches. It was fun because when I would finally upstream my work, it would dump a dozen commits into the SVN history all within the same second. It was kind of nice having the linear history forced by SVN though.


> that's just laughable considering SVN needs a central repo to work…

Do you use GitHub, GitLab, Bitbucket, gitea, ...? There's your central repo. If nothing else, it represents the “backup” facet of using an SCM.


>Once an SVN user discovers the magic of a staging area

You can already do that with TortoiseSVN or just through the command line

>stashes

SVN uses shelves

>git isn't hard

This feels like the "lisp is magic" argument that no one can seem to prove, despite how universal its proponents claim the law to be.


When you write things like that, you come across as trolling. Their point wasn't git vs. another tool, their point was the importance of knowing intricate details about your tools in certain circumstances.

If SVN is wonderful for you: Great! But that's not really relevant to the issue of using git effectively.


I've never used SVN, but I assume that because git uniformly won as the VCS tool, it suits most people better than SVN..?


The reason git won was in part due to the success of GitHub. It was the first code sharing platform that wasn't crap.

Once open source projects started moving to it from older platforms (Sourceforge, Trac, etc), developers who might not have cared so much about VCS flavours followed.


In turn, I think GitHub won because of Git.

Centralized version control systems like Subversion are oriented around "committer permissions". Giving someone commit access incurs friction (waiting for a human to respond) and risk (potentially unwanted changes).

None of the old platforms like SourceForge implemented the concept of cloning a whole repository. Even if you made your own clone (manually or otherwise), what good would that do? Your copy would diverge from the upstream and could never be automatically reconciled. You would be forever doomed to sending patches to upstream, and calculating diffs from upstream to apply to your repo. Or just replacing your clone with the upstream's history after your patch gets accepted.

Git natively supports multiple asynchronous repos representing the same "project" because its history is designed around a directed acyclic graph (DAG) structure. You don't need the upstream repo's permission to make your own clone and commit some changes. After you diverge from the upstream, if you solicit them with your branch and they accept, then both of you can converge once again - whereas this is impossible in a linear history model like SVN.
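
A small Python sketch of the diverge-then-converge idea, under stated assumptions: the commit ids and the `ancestors` and `merge_bases` helpers are made up for illustration, and this is a naive search, not the algorithm git itself uses. It shows that two histories which split at a common commit can be rejoined by a commit with two parents.

```python
# Toy history: commit id -> list of parent ids (hypothetical ids).
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],       # upstream continues from B
    "D": ["B"],       # the clone diverges from B
    "M": ["C", "D"],  # merge commit: two parents, histories converge
}


def ancestors(commit, parents):
    """Return the set holding commit and everything reachable via parents."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def merge_bases(left, right, parents):
    """Common ancestors that are not ancestors of another common ancestor."""
    common = ancestors(left, parents) & ancestors(right, parents)
    return {
        c for c in common
        if not any(c != other and c in ancestors(other, parents)
                   for other in common)
    }


print(merge_bases("C", "D", parents))  # {'B'}
```

Because every commit records its parents, the point where the two sides split can be found from the graph alone, without either side asking the other for permission first.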


hm. vexing.

I feel like this is mostly accurate, to my knowledge, but reading this:

> I do not claim that this way of looking at Git represents absolute “facts” in any hard and fast or literal sense. But I contend that if you conceive of Git in the way that I’m going to suggest, if you substitute these conceptions of Git for any misconceptions you might have now, you’ll be a much happier and more fluid Git user.

…vexes me.

“Think of git like bowl of peanuts and marshmallows” and other pointless, wrong, metaphors about how git works are a dime a dozen.

Yet, here is someone who is clearly quite familiar with git, and they go to pains to point out they are simplifying and may not be correct in their explanations.

It's good to be humble, but ffs, git is too frigging complicated if the best you can get is a “probably wrong simplified mental model of how it works so you can be a bit more productive with it”.

I don't care:

- a simple meaningless metaphor that lets you be more productive? OK.

- an accurate description of how things actually work? OK.

…but pick one.

What I do not want is a possibly wrong complicated explanation of how git maybe works.


This article presents an accurate picture of how things work at a high conceptual level. It glosses over certain details, because git is very complex. For example, git has, if I recall correctly, 4 staging areas, which represent different sources when it comes to a merge conflict. However, this detail can mostly be ignored because it’s not relevant to the high level conceptual ideas this article is trying to present.

I would argue most things in technology are complex, and mental models are intentional ways to take something complex and turn it into something more simple. This article does not create meaningless metaphors.


> However, this detail can mostly be ignored because it’s not relevant to the high level conceptual ideas this article is trying to present.

Personally speaking, I find knowing and distinguishing among the 4 indexes to be essential to understanding git. Not including and really exploring that detail gives people an incorrect mental model of what's happening.

Marvelous, if the metaphors of the article helped you, but I empathize with the upstream poster's frustration. I believe that the content of the article is not medicine for the malaise it describes.


The official documentation for `git add` refers to "the index". I'm not seeing any reference to multiple indexes. I've been using git for years, and I've never heard of it. I read a book about it. But parts of it are definitely still mysterious to me. Anyway, where can I find any evidence of these 4 indexes?


Sorry, category error. I was using index to refer to the category of which index is a member. Maybe areas is better. These are the 4 ... things...:

working directory - this is the project directory in the OS file structure

index - a.k.a the staging area

repository - in the .git directory

stash - a kind of scratch pad or clipboard for the developer

Understanding these different areas and how and why to move data into each is essential to understanding git


While you are correct, this is not what I was referring to.

To be exactly clear, there are actually 4 staging areas within what you referenced as the index. This is indeed a detail most people do not worry about.


You see this in the comments here too. Everyone has to add a disclaimer to every statement about how (they think) git works: “As far as I know”, etc.


And it's not even necessary -- the git data model is simple. It's simple enough that you can generate valid commits in about a page lines of python, with no libraries. The rest is just packing files for efficiency, and finding the difference between hash lists for syncing and pulling.

The command line interface to git is insanely complicated, confusing, and unnecessarily difficult to use, but this isn't a result of the git data model. It's definitely possible to give a complete and accurate description of the data model, even using examples from `git cat-file` to walk through the commit history by hand.

I've also got a simple demo that generates a complete repo with a commit. You can manipulate the resulting repo from git. There are 65 non-comment lines of code.

Here it is: https://orib.dev/ugit.py
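
To give a flavor of how small the object format is, here is a minimal sketch (not taken from ugit.py) of how a loose object gets its ID and its path under .git/. The function names `encode_object`, `object_id`, and `loose_object` are made up for illustration; only the header layout, the SHA-1 hashing, and the zlib compression reflect git's classic object format.

```python
import hashlib
import zlib


def encode_object(kind: str, payload: bytes) -> bytes:
    """Prefix the payload with git's object header: b"<kind> <size>" plus a NUL byte."""
    return f"{kind} {len(payload)}".encode() + b"\x00" + payload


def object_id(kind: str, payload: bytes) -> str:
    """SHA-1 over header plus payload; this is the ID `git hash-object` prints."""
    return hashlib.sha1(encode_object(kind, payload)).hexdigest()


def loose_object(kind: str, payload: bytes) -> tuple[str, bytes]:
    """Return the path relative to .git/ and the zlib-compressed bytes stored there."""
    oid = object_id(kind, payload)
    path = f"objects/{oid[:2]}/{oid[2:]}"
    return path, zlib.compress(encode_object(kind, payload))


# Same ID as `echo 'hello world' | git hash-object --stdin`
print(object_id("blob", b"hello world\n"))  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

Trees and commits use the same envelope with a different `kind` and payload, which is why a whole commit generator fits on a page.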


> Picturing Git: Conceptions and Misconceptions

Based on the title, I was expecting a more in-depth study of user misconceptions about git, similar to the famous CogSci paper "Two Theories of Home Heat Control." Except with like, diagrams.

And now I want someone to make that happen.


While you wait, you can read the excellent hg init https://hginit.github.io/ for inspiration.


I avoid using words like "simple" when writing about technical topics. "It is simple!" is not inclusive language and, love git or hate git, it is not inherently simple. It takes effort to understand, and experience to avoid its pitfalls.


Ugh. So many concepts. So many things to remember. Why? Git is simple. SIMPLE. But only, IMO, if you go bottom-up and not top-down. There are only 6 critical concepts in Git and each is simple enough to be described in a single sentence.

1. Commits are immutable blobs that have one or more parents. Graphs, not trees. Anyone who uses trees for git commits misses the whole point and makes their (and their collaborators') lives complicated.

2. Tags are (mostly, best practice) immutable pointers to commits. Tags are "this is this thing FOREVER*."

3. Branches are named, mutable (by design) pointers to commits. Branches are "this is this thing FOR NOW. Later it'll be something else."

4. HEAD is a special "branch" that moves around automatically.

5. Origin is the local snapshot of the remote. Origin is "what did it look like when I last looked."

6. (fundamental but not critical) Remote is the current remote state (queried by RPC).

7. Index (aka stage) is where you put changes you want to make into commits. (this is somewhat simplified). Index is "My current and immediate plan. Scrub as needed."

That's (mostly, for non-advanced use cases) it. Everything else is commands to query or manipulate the various state. Every action (until it becomes instinctual knowledge) should follow the same recipe: 1. Figure out the current state (current commit graph, relevant branches). 2. Figure out the target state (desired commit graph, new branch positions). 3. Mutate using ANY command you want.

I think that's the issue really. Inexperienced devs / people who don't understand git look at commands as "this is how to do a thing". No. In Git there isn't "how to do the thing". It's exactly like writing code - so many ways to achieve the goal, just choose your own. It might be efficient and elegant, or bumbling and ugly, but it'll get there.
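
A rough sketch of that bottom-up model, as a toy and not how git stores anything: the `Commit` and `Repo` classes, the hashing scheme, and the names `main` and `v1.0` are all invented for illustration. Commits are frozen values that point at their parents, a merge simply has two parents (hence graph, not tree), and branches and tags are plain name-to-ID entries.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Immutable toy commit: a message plus the IDs of its parents."""
    message: str
    parents: tuple[str, ...] = ()

    @property
    def oid(self) -> str:
        # Toy ID derived from parents and message (real git hashes more fields).
        data = "\n".join((*self.parents, self.message)).encode()
        return hashlib.sha1(data).hexdigest()


class Repo:
    def __init__(self) -> None:
        self.commits: dict[str, Commit] = {}
        self.branches: dict[str, str] = {}  # mutable pointers: "this thing FOR NOW"
        self.tags: dict[str, str] = {}      # by convention left alone once set

    def add(self, message: str, *parents: str) -> str:
        commit = Commit(message, tuple(parents))
        self.commits[commit.oid] = commit
        return commit.oid

    def ancestors(self, oid: str) -> set[str]:
        """Every commit reachable from oid by following parent links."""
        seen: set[str] = set()
        stack = [oid]
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.commits[current].parents)
        return seen


repo = Repo()
root = repo.add("initial")
feature = repo.add("feature work", root)
hotfix = repo.add("hotfix", root)
merge = repo.add("merge feature", hotfix, feature)  # two parents: a graph, not a tree

repo.tags["v1.0"] = root
repo.branches["main"] = merge
repo.branches["main"] = repo.add("follow-up", merge)  # the branch moves, the tag stays
```

In this picture the "figure out current state, figure out target state, mutate" recipe amounts to comparing two versions of these dictionaries.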


Sorry, but what? Specifically: what about your writing justifies the assertion that git is simple? Git is a horrible convoluted set of commands to make a lot of different data structures [1] interact. And if you do it wrong you can get into very weird states. This is not simple in any meaningful sense of the word!

Heck, "a monoid in the category of endofunctors" is simpler.

[1] Off the top of my head: the working tree, the index, the stash, the repo DAG, the local remote repo DAG, the remote repo DAG. Of course the branch labels are further state, and working with the commits directly is discouraged. Oh and files can be either tracked or not, and they can either be ignored or not. And one isn't a subset of the other. And that also interacts with the various state transitions.


Any system with a limited amount of concepts is simple. Emergent properties are easy to predict and explore. Physics of "perfect friction-less sphere in vacuum" is so easy to understand we teach it in grade schools and toddlers grasp it by instinct.

I can't (yet) reason about monoids easily. But I can reason about Git, even if I can't figure out the single command to change the state the way I want it and have to resort to multiple commands. I guess it's easier for me to think in graphs.


But it's not a limited amount of concepts. One sphere in a vacuum is easy, three spheres is hard. Git has half a dozen subtle interacting data structures. But because people have built up a lot of experience working with them (and don't coach beginners and non-programmers) they shout "it's just a DAG, so simple!" and pretend like everything is fine...


Agreed. I have a couple of things to make my life slightly easier.

I could never understand what kind of twilight zone stashes go into or remember which stash is which when I had too many of them. So I never use stashes any more, I just make a branch instead.

I largely use git add -A, so I can pretend that the index does not exist.


2. Tags are named, mutable pointers to commits.

3. Branches are named, mutable pointers to commits, that you can "ride". While you "ride" a branch it keeps moving to always point to your latest commit.

4. HEAD is an implicit branch that you "ride" at all times.
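
A small sketch of what "riding" could mean, using a made-up `Refs` class and fake commit IDs like `c1`. It is a toy model of the behavior described above, not git's implementation: a commit moves whichever branch HEAD is attached to, never moves a tag, and moves only HEAD itself when HEAD is detached.

```python
class Refs:
    """Toy model: branch and tag pointers plus a HEAD that is attached or detached."""

    def __init__(self) -> None:
        self.branches: dict[str, str] = {}
        self.tags: dict[str, str] = {}
        # HEAD starts attached to "main", even before that branch has a commit.
        self.head: tuple[str, str] = ("branch", "main")

    def resolve(self):
        """Commit ID that HEAD currently points at (None on an unborn branch)."""
        kind, value = self.head
        return self.branches.get(value) if kind == "branch" else value

    def commit(self, new_oid: str) -> None:
        kind, value = self.head
        if kind == "branch":
            self.branches[value] = new_oid     # the ridden branch follows along
        else:
            self.head = ("detached", new_oid)  # only HEAD moves

    def switch(self, branch: str) -> None:
        if branch not in self.branches:
            raise KeyError(branch)
        self.head = ("branch", branch)

    def detach(self, oid: str) -> None:
        self.head = ("detached", oid)


refs = Refs()
refs.commit("c1")             # main -> c1
refs.tags["v1"] = "c1"
refs.branches["topic"] = "c1"
refs.switch("topic")
refs.commit("c2")             # topic -> c2; main and v1 stay at c1
refs.detach("c1")
refs.commit("c3")             # no branch or tag moves, only HEAD
```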


afaik, the commit a tag points to is immutable unless you delete and recreate (and then nothing is immutable really).

re "ride" - that's exactly what I'm trying to avoid. It's an additional concept that isn't needed to understand Git. You need to understand the model. The "ride" is an emergent property of the model and commands that you eventually understand, but not a core part.


Without the concept of "riding", terms "tag" and "branch" would become exact synonyms. In that case you can just remove point 2 (consider it just a syntactic sugar) and thus simplify your list.

If a tag has any attempt at immutability at the data structure level, I know nothing of it.


It's all about how other clients treat branches and tags.

Once you've pushed a tag, no other clients will be willing to update their definition of that tag unless the users on those other devices force the issue.

So operationally, "tags are immutable once pushed" is a pretty reasonable way to look at things.

Remotely pushed branches of course also won't allow you to do anything but append without forcing on remote clients, so mutable and immutable isn't quite right, here.

So I guess I agree with your original contention, branches are mutable-and-you-can-ride where ride means "the remote client's porcelain will be happy with append mutations".


As far as I know, tags don't have an identity beyond their name. The CLI tries to steer you away from replacing tags (by naming the option --force rather than, e.g., --modify), but that doesn't make them immutable.


See git tag -f.

Sometimes it's reasonable to consider a tag immutable, though you should always checksum if you do.


Reminds me of:

    Bad programmers worry about the code. Good programmers worry about data structures and their relationships.

    -Linus Torvalds

Anyway, it's not for everyone to get to understand git this way, I guess. Some people will just react "just tell me how to do X in git!"


Sometimes you just want your tool to get out of your way and get the job done, instead of deeply understanding it. No shame in that. There are limited hours in the day and sometimes other things are more important.


I get what you mean, but git is basically a tool for manipulating the .git directory (and a working directory checkout).

I think understanding what .git dir contains and represents is as important as understanding the tool.

It's not like you need to understand how the tool works internally, or how it's built.

Analogy would be that you want to understand how to use a hammer, sure, but also the characteristics of material you manipulate with it. You don't need to understand how the hammer is built.
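
As an illustration of how readable the material is, here is a hedged sketch that resolves HEAD by reading files directly. `resolve_head` is a made-up helper, it only handles loose refs (it assumes nothing has been moved into packed-refs), and the demo builds a fake .git directory in a temp folder with placeholder IDs instead of touching a real repo.

```python
import tempfile
from pathlib import Path


def resolve_head(git_dir):
    """Return (ref_name, commit_id) for HEAD; ref_name is None when detached.

    Assumption: loose refs only. A ref stored in packed-refs, or a branch
    with no commits yet, yields commit_id None here.
    """
    git_dir = Path(git_dir)
    head = (git_dir / "HEAD").read_text().strip()
    prefix = "ref: "
    if not head.startswith(prefix):
        return None, head  # detached HEAD stores a commit ID directly
    ref = head[len(prefix):]
    ref_file = git_dir / ref
    oid = ref_file.read_text().strip() if ref_file.exists() else None
    return ref, oid


# Demo against a fake layout; the 40-character IDs are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    fake_git = Path(tmp) / ".git"
    (fake_git / "refs" / "heads").mkdir(parents=True)
    (fake_git / "HEAD").write_text("ref: refs/heads/main\n")
    (fake_git / "refs" / "heads" / "main").write_text("a" * 40 + "\n")
    attached = resolve_head(fake_git)

    (fake_git / "HEAD").write_text("b" * 40 + "\n")
    detached = resolve_head(fake_git)
```

On a real repository, `git rev-parse HEAD` is the reliable way to get the same answer.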


I work with some very good programmers who don't worry about code. Past a certain point it bites. You have to be thinking about scalability early so someone else doesn't have to refactor someone else's pride and joy because it's become full of overly concrete logic.


Like a lot of things Linus, it's very pretentious and aloof but right at the core of it. Code matters a lot and bad code can tank performance, stop evolution and introduce security issues. But with Git, this is a mostly truthful statement.


I remember reading a quote where he states that when looking at new code he starts with data structures, to get an understanding of what's going on. Or something to that effect.

That would be more applicable here. But I couldn't immediately find it, so I pasted this one instead, which is somewhat close but not perfectly related to the OP.


Git is definitely not simple. It's simple if you have a solid understanding of data structures (trees, graphs), and know the concept of a pointer. Not everyone has that background. The concepts are learnable, but the commands have complex behavior that often requires a reference to use properly. The commands aren't simple by any measure because of all of the edge cases that exist.


it's almost as if git is a tool that was designed for software engineers and not accountants


We do have regular humans using git internally - godsend for remote work. They manage fairly well because they don't really need to do anything complicated.

There are some funny neologisms like "check it out on the git" since they don't know what git actually is vs. how to use it, but still.


Complexity should not be excused solely because the target audience is “smart” people.


This is the way I understand git. For me it's dead simple. I've tried to teach others over the years. Not a single person has got it so far.


So what's your conclusion from this failure? That everyone else is stupid? You're a bad teacher? Or it's not actually simple and the above explanation includes a ton of implicit understanding of the subtle interactions of the various moving parts?


For the majority of programmers I think it's lack of experience with data structures of any kind. C programmers have to understand pointers and most (I assume) would have implemented at the very least their own linked list at some point and maybe even a tree. But there are so many programmers who simply lack this experience so talk of pointers, links, graphs etc. is unfamiliar.

Then there are those whom I'm sure should have the necessary experience (because they are C programmers, for example), but still don't seem to get it. These people I think just don't care. They don't care about version control and therefore it's irrelevant what git is trying to represent. They just want to get their code merged.


That reminds me about the old joke about monads: At the moment you finally understand them, you lose the ability to explain them.

Seriously, I too find the basic concepts of git quite simple. But whenever I want to do anything slightly out of the ordinary, I find myself wasting a lot of time searching the docs. In fact, I find the naming of commands and their options almost the opposite of intuitive, given my understanding of the basic model.


I can't use the command line at all. It's horrendous and makes no sense. I use magit for everything if I can. If I can't then, like you, I have to spend ages searching the docs.


The way I tried to understand Git at first was like Subversion. Horrible. I almost deleted everything my team worked on for weeks.

Then I read "git inside out" [1] (not to be confused with "git from the bottom up", which I think is not as good), had an "aha!" moment, my view changed and everything became clear and easy. Transformation from graph to graph is something I do every day, so why not in Git?

[1] https://www.slideshare.net/MichaelNadel/git-inside-out-57904...


More directed acyclic graph (DAG), which I suppose is still a type of graph. That said I'm not sure if "graph" is conceptually better than "tree with cross connections". People that struggle with git may not have a good enough grasp on the differences between these structures that insisting on using the "proper" names is immediately helpful.



> The problem with how people use Git, I’m suggesting, is that their analogical or metaphorical conception of Git doesn’t work — it doesn’t fit the way Git actually behaves — if, indeed, the conception exists at all.

No, the problem is not with "how people use Git". The problem is with git. We've known for years how to make clear, concise interfaces that help people understand what's going to happen. Git does not have a clear, concise interface. That is its biggest problem and will continue to be until it is changed to have a clear, concise interface.


Fwiw, I'm not sure if it's recent or not, but the git CLI has pretty good suggestions for what commands to run next. git status gives a decent amount of info. It also suggests the newer commands like switch or restore.


then feel free to build that tool on top of git. nothing is stopping you, given the existence of such tools today.


git has a very clear, concise and stable interface if you understand how git works. it's designed this way, intentionally. people should stop complaining about it and either learn how to use it, switch to another tool or just write their own interface


It's not possible to switch to another tool unless you only ever work on code that you wrote, and never need to collaborate with anyone else.


No, it's totally possible to use your own VCS and sync changes to another VCS. I don't know which tool you'd prefer, but I used a git repository for my own work and synced it to a TFVC repository until my company switched to git. It sucked, because TFVC sucks, but it's totally doable.


no, it's absolutely possible, as demonstrated by the numerous repositories that switched from cvs and svn to git. what you describe is a lack of desire to switch to another tool by your coworkers because you've been unable to make an adequate case for doing so.


Complaining about git is like complaining about any other unix tool, if you don't read the docs you're in for trouble. Sometimes a lot of it.



