The Elements of Git (cuddly-octo-palm-tree.com)
139 points by todsacerdoti on Sept 19, 2021 | 39 comments



I got pretty fed up with learning git from the internet when I had to (about 8 years ago). Every resource I could find was written in a way that only people who already knew git could really understand.

Now that I finally feel confident enough to rely on what I know and am in a position to mentor others, I made what I felt was missing: a set of sandbox git repository challenges, built only from real-life examples [1] I met in professional repos.

I hope the dark age of git education passes soon (and is already fading).

[1]: https://github.com/operatorequals/git-course


I believe this is generally true, but git happens to sit in a nexus where it is particularly true: You can't hand someone the solution to a problem they've never had.

I have a git training I developed for work that focuses not on telling everyone what the solutions to their problems are, but walking them into those problems, getting into detached head, getting into branching and rebasing problems, and then explaining how to get back out of them. I've gotten generally positive feedback from this.

If nothing else, at the very least it convinces them that when they see the "detached head" message, they can come get help and I'll be very sympathetic to what is going on, because I told them up front that I expect this sort of thing to happen and it doesn't mean they're bad people for getting into this state, and we can at least have a sane conversation about what they did to get there and what they want to happen.

It's probably particularly hard to try to tell people the solutions to problems they've never had when they also don't even know what's going on around them and what a problem even is.


This is exactly what I felt was missing as well. Git is the kind of "skill" you can only master on the job, doing lots of case studies, not just reading documentation and tutorials.

There are lots of other programs like that, GDB for example, each with its own challenge of synthesizing case studies. I hope this kind of learning continues, not only with git, but across other programs as well.


This is a symptom of the extremely poor design of its user interface (conceptually, I mean, not how it looks), notwithstanding its origins in Linux royalty.


Compared to what came before it, I really don't see how the interface is any less intuitive or documented compared to svn or cvs. For example, see the sections for the checkout command for svn[1] and cvs[2]. And then there's the man page[3] for the git-checkout command. The git documentation is a bit more detailed.

[1] https://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.checkout.h...

[2] https://linux.die.net/man/1/cvs

[3] https://git-scm.com/docs/git-checkout


Git checkout is overloaded with too many behaviors. If it was naturally intuitive they wouldn't have needed to add the restore command. A better comparison would be Mercurial which has a much more coherent command interface.


Given the widespread use of git now, it's probably the first VCS that people are exposed to. But for several years after it was released, it was the first distributed VCS that people used (if they hadn't had a chance to work with other DVCSs like Mercurial or Bazaar). So it's likely most people at that time were used to using svn or cvs.

As for overloading the checkout command, the git man page lists the following

1. checkout a branch

2. checkout and create a new branch (combining git checkout and git branch)

3. checkout a path at a certain commit

In contrast, the svn co/checkout command has the following options

1. Checkout a directory

2. Checkout a file

3. Checkout a directory or file at a certain revision

Which really isn't that different. Given svn's concept of branches really being copies of the repository, one would switch branches by just checking out a different branch by using the switch command. In this case, I think git's interface is better since you don't need a different command to just change between branches.


I'm speaking about the concepts that underlie the interface, not the details of options and so on. svn was orders of magnitude better: you checked out a repo, you made local changes, you got to see the changes you made before committing, you committed changes, unless there were conflicting changes made in the meantime in which case you had to assert that you'd resolved those conflicts, possibly with help from merge tooling.

That's it.

Now, I admit that much of the conceptual complexity with git is to do with it being a decentralized system. svn was not. However, I think it does a tremendously bad job of abstracting and conceptualizing that complexity.


> svn was orders of magnitude better: you checked out a repo, you made local changes, you got to see the changes you made before committing, you committed changes,

The only extra step that git introduced was pushing the changes up to the remote.

- svn checkout -> git clone

- make changes (same for both)

- see changes (svn diff -> git diff)

> unless there were conflicting changes made in the meantime in which case you had to assert that you'd resolved those conflicts, possibly with help from merge tooling.

Which is what you would have to do with git as well if upstream was updated and you pulled it down to your working copy with git.

> However, I think it does a tremendously bad job of abstracting and conceptualizing that complexity.

I think part of the problem is that people want to make getting changes from the repository and sending changes to the repository a single step, whereas git only did that for getting changes with the git pull command. There's no corresponding git push command that handled committing as well as pushing the changes to the remote.

What should have been done was to never give that option. That is, the git pull command should not have been a thing and people would have to learn that there are two steps. One is the network operation of getting the changes or sending the changes and the other is to apply those changes to the working copy or staging those changes in the working copy before sending them over the network.


The secret is that git is, first and foremost, an on-disk data structure (hereafter "git repo"). It's very well-defined and documented, which means that anyone can write a new tool that processes a git repo.

Go for it!
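
To make that concrete, here is a minimal sketch of such a tool in Python. It assumes a SHA-1 repository and only handles loose objects (no packfiles); the function names are made up for illustration and are not part of any git library. It hashes an object the way git does (a `<type> <size>` header, a NUL byte, then the payload) and stores it zlib-compressed under `objects/`.

```python
import hashlib
import os
import tempfile
import zlib


def git_object_id(obj_type: str, payload: bytes) -> str:
    """Return the SHA-1 id git assigns to an object of this type and content."""
    header = f"{obj_type} {len(payload)}".encode() + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


def write_loose_object(git_dir: str, obj_type: str, payload: bytes) -> str:
    """Write a loose object under <git_dir>/objects/<first 2 hex>/<other 38 hex>."""
    oid = git_object_id(obj_type, payload)
    header = f"{obj_type} {len(payload)}".encode() + b"\0"
    directory = os.path.join(git_dir, "objects", oid[:2])
    os.makedirs(directory, exist_ok=True)
    with open(os.path.join(directory, oid[2:]), "wb") as fh:
        # Loose objects are the zlib-compressed header plus payload.
        fh.write(zlib.compress(header + payload))
    return oid


if __name__ == "__main__":
    # Write into a scratch directory rather than a real repository.
    with tempfile.TemporaryDirectory() as scratch:
        print(write_loose_object(scratch, "blob", b"hello\n"))
```

If this matches git's format, `git_object_id("blob", b"hello\n")` should agree with `echo hello | git hash-object --stdin`.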


No.

We had better solutions, but the combination of "FREE!" and "network effects" meant that the GitHub virus won and the Git host came along for the ride.

Even Linus hates the way GitHub does things, but he can't do anything about it at this point except complain.


This is not a compelling argument against GitHub (and, as others note, it doesn't really have anything to do with the article, nor does it really add anything to the discussion.)

Why does Linus hate "the way GitHub does things"? Are you referring to this [1] (which, I'll note, is ambivalent on the value of GitHub as a whole, and is only negative specifically on the matter of GitHub merges?) Or is there some other argument here that we're all supposed to be familiar with?

What are these "better solutions"? Where are they? Where can I find and use them? Are they, in fact, better thought-out and more usable than GitHub? How can I test your assertions here?

[1] https://lore.kernel.org/lkml/CAHk-=wjbtip559HcMG9VQLGPmkurh5...


We had Monotone, Darcs and others. We still have Mercurial.

They all coexisted and were being developed. Even Github wasn't a big deal in the beginning and was just one of many providers who were all bootstrapped.

However, once GitHub hit the field with $100 million in VC cash it squashed further development cold--bootstrapping and quality doesn't matter much against someone burning VC cash against you.

Distributed source control is now in a path dependent local minimum and we are stuck with git because we are stuck with GitHub--better is no longer sufficient to break the logjam.

Until Microsoft turns the screws enough this will all remain status quo.


The discussion is about the 'git' command itself, not about Github.


Except that we are talking about Github because something better than git can't exist because it also has to displace Github.


What's to displace; who "needs" GitHub, and why? Just stop using it and use git (or whatever you want to use instead).


Shouldn’t it be feasible to build an interoperable version control system so you can still collaborate on GitHub?


Good idea, but I've already filled all available slots for working on projects with no upside for me or my company.


I'm a huge fan of git education. From a computer science standpoint, git is very clever but not particularly hard to understand. From a usage standpoint, although it's gotten better over the years, I still wouldn't call it user-friendly. I'd like to see more competing implementations out there. I'd like to see better git libraries for application support. I'd like to see libraries that support alternatives to flat file repo representations.


I kind of view git like vim, awk, or Emacs. It has a complex set of features, and it takes time and practice to learn to use them sufficiently well.

The quality of the introductory documentation has gotten better over the years, thank goodness. I know a great many developers who are terrified of git, and stick to the couple of commands they get taught by the team lead.


I must say I found this, personally, to be a clearer and more concise explanation of git's internals than the tens of other explanations I've read over the last couple of years. Kudos to whoever wrote this.


I think all sophisticated frameworks need a document like this. What are the basic components of the system, what purpose do they serve, and how do they interact?


What I don't understand is why we need so many articles explaining git and its elements. It seems like these posts are very popular and occur frequently on HN. Either people love revising their knowledge of git or, as I suspect, git is a tricky and unintuitive piece of software. I'm a bit bewildered that so far there have been no successful efforts to condense/revise git into a more developer-friendly experience from the outset (and that doesn't include writing more 'secrets of git' posts). Just an observation.


For anyone new to git, or puzzled by its fundamental structure, I recommend starting with this:

https://medium.com/girl-writes-code/git-is-a-directed-acycli...

While reading the Medium post, it may be helpful to bear in mind that arcs shown in the drawings — called “pointers” in the text — are exactly hashes to git.
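
A toy sketch of that idea in Python, in case it helps. This is a simplified model and not git's real commit format: `STORE`, `commit`, `parents_of` and `history` are names invented here. Each node's id is the SHA-1 of its content, and the content names its parents by id, so the "arcs" are nothing more than hashes.

```python
import hashlib
from typing import Dict, List

# Toy content-addressed store: object id -> raw bytes.
STORE: Dict[str, bytes] = {}


def commit(message: str, parents: List[str]) -> str:
    """Store a toy commit and return its id (the SHA-1 of its content)."""
    content = "".join(f"parent {p}\n" for p in parents) + "\n" + message
    raw = content.encode()
    oid = hashlib.sha1(raw).hexdigest()
    STORE[oid] = raw
    return oid


def parents_of(oid: str) -> List[str]:
    """Read the parent ids (the 'pointers') out of a stored toy commit."""
    lines = STORE[oid].decode().split("\n")
    blank = lines.index("")  # the first empty line ends the header
    return [line[len("parent "):] for line in lines[:blank]]


def history(oid: str) -> List[str]:
    """Follow parent hashes from oid and return each reachable message once."""
    seen, messages, stack = set(), [], [oid]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        lines = STORE[current].decode().split("\n")
        messages.append("\n".join(lines[lines.index("") + 1:]))
        stack.extend(parents_of(current))
    return messages
```

Because a parent's id is part of the child's content, changing any ancestor changes every descendant's id, which is why the graph can only grow forward and stays acyclic.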


My favourite video explaining Git: https://missing.csail.mit.edu/2020/version-control/

Explains nearly everything you need concisely and effectively, in about 20 minutes (the rest of it is primarily demonstration).


Understanding Git internals can actually be useful for more than just building intuition or as an academic exercise!

For example, where I work (https://graphite.dev) we use some of these plumbing commands to extend git to store additional information about pull requests. My teammate did a writeup of this here: https://dev.to/foster/how-to-use-native-git-as-a-key-value-s...


I was lucky enough to learn git from the "git for ages 4 and up" talk by Michael Schwern:

https://youtu.be/1ffBJ4sVUb4


this is my favorite too! brilliant


I have a basic question: if git forwards its first argument to git-$arg, then how would you produce a list of all possible git commands?

Is there something like ls, but for possible programs on your PATH? Or is there a way to just get a dump of all executables on the current PATH?

Even still, it seems like you’d have to decide which of those programs to show first in the documentation. And to get a short description of each of them, you’d have to invoke them.

I guess the solution is “use manpages,” but it’s kind of unsatisfying — I’d like to build something like git, but using CLI --help.

EDIT: Seems ... strange.

- https://github.com/git/git/blob/master/help.c#L260-L294

- https://github.com/git/git/blob/master/command-list.txt

- https://github.com/git/git/blob/master/generate-cmdlist.sh

I'm not sure `git help` supports the concept of adding third-party `git` commands as first-class citizens, but I'm still reading.


"git help -a" specifically performs that same discovery process and lists what it finds (including aliases!), so one need not reinvent that wheel


It seems like `git help -a` doesn't allow third-party commands to have docstrings at all. It seems to call getenv("PATH"), then calls readdir on each folder, keeps only the executables prefixed with "git-", and then lists those under the "External commands" category with no further help.

I was hoping there was a way to attach a short help descriptor to each of the third-party commands. But the help descriptors seem to be hardcoded for the builtin commands, with no support for third party commands. Darn.
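
For anyone who wants to play with the same discovery idea, here is a rough Python sketch of the process as I described it above. It is not a port of git's help.c, and `external_git_commands` is a name I made up. It only scans PATH (git itself also looks in its own exec path), and it assumes a Unix-like system where the executable bit is meaningful.

```python
import os
from typing import List, Optional


def external_git_commands(path: Optional[str] = None) -> List[str]:
    """List subcommand names for every executable file named git-* on the path."""
    if path is None:
        path = os.environ.get("PATH", "")
    found = set()
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        try:
            entries = os.listdir(directory)
        except OSError:
            # Missing or unreadable PATH entries are simply skipped.
            continue
        for name in entries:
            full = os.path.join(directory, name)
            if (
                name.startswith("git-")
                and os.path.isfile(full)
                and os.access(full, os.X_OK)
            ):
                found.add(name[len("git-"):])
    return sorted(found)


if __name__ == "__main__":
    for command in external_git_commands():
        print(command)
```

Note that nothing here executes the commands it finds, so there is nowhere for a description to come from; that would need a separate convention.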


Kind of scary to execute all git- prefixed commands when you run ‘git help’. Maybe have a location where git stores descriptions for external commands?


A couple of years ago I saw a video where Linus Torvalds explains his creation, git, especially how everything is based on the SHA1 hash. It was very clear and understandable (at least to me), but I have not been able to find it again. Does anyone know which video I'm referring to?


Maybe this one?

Tech Talk: Linus Torvalds on git (2007)

"Linus Torvalds visits Google to share his thoughts on git, the source control management system he created two years ago."

https://www.youtube.com/watch?v=4XpnKHJAok8


Thank you, but I don’t think this was it. The one I remember was more technical. However, I did discover Gilfoyle at 21:40 ;).


well you have to give credit to something so well-written and clear. I have read and listened to git knowledge like the rest here, and I have favorites. This exposition shows the actual files on disk, in trees, in a way I had never seen. The opening paragraphs emphasizing that the data structures are stable, while the tools are interoperable, may be written in other languages, and rely on the stable data structures, is also said here in a way I had never seen before.

However, this is not the tutorial I would send anyone towards, as a user of git. The programming design logic is not enough to get better at the workflows. Mercifully, branching patterns in team projects are omitted entirely.

enjoyable and glad to read this for a second or third layer to understanding git


I like to use GitGraph for vscode to have a better visual on everything going on in my repos


Git is a circle jerk for programmers who would rather talk about work than actually do work.


I don’t follow. I use git for real work all the time.



