Git for Computer Scientists (2010)

umvi · on June 13, 2021

I just think of git as a graph and branches/tags as text pointers to nodes in the graph. Doesn't seem that complex to me...

Maybe I "got gud" though and can no longer empathize with git beginners
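
The "pointers into a graph" picture can be checked directly. The sketch below is only an illustration in a throwaway repository: the temporary path, the branch name `main`, and the tag `v1` are made up for the example, and reading `.git/refs/heads/main` assumes the ref is stored as a loose file (refs can also be packed into `.git/packed-refs`).

```shell
# Throwaway repository so nothing real is touched (path is hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # call the first branch "main" whatever the local default is
git config user.email "demo@example.com"
git config user.name "Demo"

git commit -q --allow-empty -m "first"
git tag v1                               # lightweight tag: a second pointer to the same commit
git commit -q --allow-empty -m "second"  # the branch pointer moves, the tag stays put

branch_target=$(git rev-parse main)
tag_target=$(git rev-parse v1)
parent_of_branch=$(git rev-parse main~1)

# Assumption: the branch is stored as a loose ref, i.e. a small text file
# whose content is the hash of the commit it points to.
ref_file_contents=$(cat .git/refs/heads/main)

echo "main -> $branch_target"
echo "v1   -> $tag_target"
echo "file -> $ref_file_contents"
```

The branch file and `git rev-parse main` should report the same hash, and the tag should still name the parent of the branch tip.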

CountSessine · on June 13, 2021

The git internals and plumbing commands are straightforward and easy enough to understand once you read about them a bit (I recommend the free Pro Git book online).

The original git porcelain commands - git branch, git reset, git pull - are execrable. They’re filled with implementation details (index/cache vs staging), weird and suggestive syntax that seems like it should be extensible and widely applicable but isn’t (localbranch:remotebranch), and nuclear-powered self-destruct functionality hidden amongst playthings (git reset vs git reset --hard).

taberiand · on June 13, 2021

It sounds like git in general isn't necessarily the problem (at least after getting the basic model down), it's specifically the interface and associated foot-guns it sticks in there for beginners (and tired experts) to trip over.

Most people most of the time will get by if they grab a decent git GUI, figure out the minimal set of operations they need, and just Google the rest when necessary.

My stupidest git mistake was when I was cleaning out a directory of bin and obj folders and included git's own objects folder as well. And of course, after crying in the corner for a bit, I took a little time to look into git commands and found I could have just run 'git clean'.
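
For anyone in the same spot, here is a rough sketch of that `git clean` route. It assumes the build folders are listed in `.gitignore`; the file names (`main.c`, `bin/app`, `obj/main.o`) are invented for the demo. `-n` is a dry run, `-d` includes directories, and `-X` limits the deletion to ignored files, so tracked sources and the `.git` directory are left alone.

```shell
# Throwaway repository (hypothetical path and file names).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

printf 'bin/\nobj/\n' > .gitignore
echo 'int main(void) { return 0; }' > main.c
git add .gitignore main.c
git commit -q -m "initial"

# Simulate build output that is ignored by git.
mkdir bin obj
echo "binary" > bin/app
echo "object" > obj/main.o

# Dry run first: show what would be deleted without deleting anything.
would_remove=$(git clean -n -d -X)
echo "$would_remove"

# Then delete for real. -f is required by default as a safety catch.
git clean -q -f -d -X
```

Swapping `-X` for `-x` would also delete untracked files that are not ignored, so the dry run is worth the extra second.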

pjc50 · on June 13, 2021

Two additional bad defaults: crlf handling on Windows, and pull not defaulting to rebase.

The message "up to date with origin/master" is also misleading, because it doesn't check the remote itself.

nerdponx · on June 13, 2021

Pull defaulting to rebase could be a dangerous and chaotic default. If you want to argue that pulling should be fast-forward-only, then I'd say maybe you have a case.

yencabulator · on June 13, 2021

I have aliases ff for fast-forward-only merges and puff for fast-forward-only pulls. I never type `git pull` anymore, and it's much harder to shoot your foot off with the aliases.
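
The comment does not say how those aliases are defined, so the following is just one plausible way to set them up, using the real `--ff-only` flag of `git merge` and `git pull`. The repository, branch names, and commit messages are invented, and the aliases are set per-repository here rather than with `--global` to keep the demo contained. Only `ff` is exercised, since `puff` would need a remote.

```shell
# Throwaway repository (hypothetical path and branch names).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "demo@example.com"
git config user.name "Demo"

# Assumed alias definitions; the commenter's own may differ.
git config alias.ff 'merge --ff-only'
git config alias.puff 'pull --ff-only'

git commit -q --allow-empty -m "base"
git checkout -q -b feature
git commit -q --allow-empty -m "feature work"
git checkout -q main

# main is strictly behind feature, so a fast-forward is possible.
git ff -q feature
ff_result=$(git rev-parse main)
feature_tip=$(git rev-parse feature)

# Now make the two branches diverge.
git commit -q --allow-empty -m "main work"
git checkout -q feature
git commit -q --allow-empty -m "more feature work"
git checkout -q main

before=$(git rev-parse main)
if git ff feature 2>/dev/null; then
  diverged_status="merged"
else
  diverged_status="refused"   # no merge commit is created behind your back
fi
after=$(git rev-parse main)
echo "diverged merge was $diverged_status"
```

In the diverged case the alias refuses and leaves `main` where it was, which is the "harder to shoot your foot off" part.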

sleepychu · on June 13, 2021

`git checkout`: great for switching to a different commit and for throwing away local unstaged changes!

nerdponx · on June 13, 2021

Using `git checkout` with a filename really really really should have a yes/no prompt by default, or at least an `-i` option like `rm`.

sleepychu · on June 14, 2021

I think `git switch` is an attempt to resolve this
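
A small sketch of that split, assuming Git 2.23 or newer, where `git switch` handles branches and the companion `git restore` handles files. The repository, the branch `experiment`, and the file `notes.txt` are made up. Note that `git restore` still discards changes without asking, so this separates the two jobs rather than adding the prompt requested above.

```shell
# Throwaway repository (hypothetical path, branch, and file names).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "demo@example.com"
git config user.name "Demo"

echo "v1" > notes.txt
git add notes.txt
git commit -q -m "add notes"

# Branch operations go through git switch.
git switch -q -c experiment

# File operations go through git restore: this throws away the unstaged edit.
echo "scratch edit" >> notes.txt
git restore notes.txt
restored=$(cat notes.txt)

# git switch does not accept a file path, so a typo cannot clobber a file.
if git switch notes.txt 2>/dev/null; then
  switch_on_file="accepted"
else
  switch_on_file="refused"
fi

git switch -q main
current=$(git symbolic-ref --short HEAD)
echo "on $current, notes.txt contains: $restored"
```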

Izkata · on June 13, 2021

Yeah, something along these lines is how I've explained it to co-workers as well. Tossing in "git log --all --oneline --decorate --graph" so they can actually see the graph also helps a lot.

usr1106 · on June 14, 2021

    gitk --all

is even better. Most people don't work in front of VT100s today.

Actually I work most of the day remotely in `screen` on my office machine. But in case of complicated/confusing/buggy history/branching I fetch the repo so I can run gitk locally.

rubyist5eva · on June 13, 2021

Because that's exactly what it is, and it's not complex. When I explain git to newbies in this way, it's like something clicks in their brain and they just start to "get it" as well.

Spivak · on June 13, 2021

But then once you get the mental model you spend the other 90% of the time figuring out what magic incantations are needed to transform the graph in the way you want.

crispyambulance · on June 13, 2021

The sad thing is you can't just "figure it out." Most folks do "good-enough" after some coaching and memorizing a limited set of commands needed for their workflow-- until something unexpected happens.

It could be a typo, or trying something new, or forgetting/misunderstanding the intent of some counterintuitive command, or maybe cleaning up an existing problem. All those things can easily put someone in a deep rabbit hole inside the git book or, worse, a Google search.

nkozyra · on June 13, 2021

Well it's understandable. Moving nodes in a graph (a tree, really) has a lot of side effects. Couple that with multiple people trying to keep things in sync(ish) and it gets super complex.

rubyist5eva · on June 14, 2021

90% of the time you can achieve what you want with `git rebase -i`

lanstin · on June 13, 2021

Especially if they took a maths course on graph theory.

Igelau · on June 13, 2021

It could be hard if you never took Discrete or another course that introduces graph theory. Or if you cheated your way through or barely scraped by. I can see how a CS freshman or someone from another field might struggle, but even then it's more comprehensible than any of the alternatives.

crispyambulance · on June 13, 2021

The hard part of git has never been understanding its graph model.

The hard part HAS ALWAYS BEEN memorizing all those badly named and counterintuitive commands.

Igelau · on June 14, 2021

Even that is only true if you're coming from a system where branch, tag, and checkout mean something else.

IshKebab · on June 14, 2021

I think the hard part of Git isn't the bit you've just described. It's the terrible CLI and terminology.

Give any Git newbie a decent GUI and a translation for Git terminology into sane terminology and they will have no problems.

actually_a_dog · on June 14, 2021

I think that's a fine perspective for a computer scientist or graph theorist, myself. Fortunately, since the article title is "Git for Computer Scientists," that's essentially the approach the article takes. :-)

jollybean · on June 13, 2021

The 'complex' part usually relates to how to manage the graph in terms of what you want to do, and all the odd states that might exist otherwise, especially when syncing with 'other graphs'.

nerdponx · on June 13, 2021

That's pretty much the gist of the article, no?

cerved · on June 13, 2021

That's what makes it good :)

I had no idea why it took my brain so long to wrap my head around it. Maybe it was blissful ignorance, never sitting down and making a mental model of it. Always looking for a tldr version of doing things. I don't know exactly at which point things clicked but it went from bewildering to just makes sense

cerved · on June 13, 2021

I think all that is needed is an aha moment

RobRivera · on June 13, 2021

I honestly feel people are allergic to rtfm

auggierose · on June 13, 2021

A friend of mine just posted this today, and I totally agree:

https://weisser-zwerg.dev/posts/software-engineering-vcs/

IshKebab · on June 14, 2021

> in my opinion, the majority of projects developed in-house in an organization by a dedicated in-house software engineering team, would be better off following the guiding principles in Why Google Stores Billions of Lines of Code in a Single Repository and rather using something like SVN rather than git.

Mmm no thanks! In any case there's no reason you can't use Git itself as a monorepo! You don't have to inflict SVN on people for that.

Very weird opinion.

auggierose · on June 15, 2021

Well, the one reason would be to avoid having to deal with Git apostles who insist on using git "the right way" instead of how you tell them to. If they cannot learn the 3 or 4 ways of calling svn, for sure they cannot use git the way you want them to.

rapjr9 · on June 14, 2021

I worked on computer science projects for 20 years. At first we had no source management, everyone did whatever they wanted. Then we used CMS, and we occasionally stepped on each other's toes but things were better. Then we switched to SVN and nothing much changed except we established a way to hold locks on files while they were being changed. Then we switched to git because the students wanted to learn the cool thing. We started having meetings to teach people git. Meetings about the best way to use git. Meetings to deal with common problems in our use of git. Productivity dropped because everyone now had to deal with git problems. It made my life hell because usually I just wanted to check in my code so it would not get lost, but instead I would continually get forced into reconciling git issues. I would have to resolve issues to get my code checked in because other people had changed something entirely unrelated to my work. I stopped using git and kept my own backups and only checked in code very occasionally so I could get work done. I noticed other people doing the same thing.

The main problem I had with using git is it did not match the way we worked. Git assumes there is one person who is the gatekeeper, who decides what gets into the source, and who does some integration and testing. In research there usually is no one in charge of that, instead everyone is responsible for their own code, testing it, and integrating it. The git model was wrong for us, we never used pull requests at all, because there was no one person who understood everything well enough to approve them. Students don't have the experience or time to be the integrator, the profs don't even write code, and I had multiple other things that I had to do. So using git made a mess of what had been a simple process previously. Git was designed by Linus to make his life easier in managing changes to a kernel. It does not work well in other scenarios and should not be used in many circumstances. Yes, you can make it work, but at a cost.

andi999 · on June 13, 2021

Will read it. Until now, for me, git feels like it tries to get in my way (probably because I think differently). I heard about fossil (https://fossil-scm.org/home/doc/trunk/www/fossil-v-git.wiki); does anybody have experience with it? Does it suck less?

booleandilemma · on June 13, 2021

I think there are 2 core principles governing fossil:

1) It wants to be the only tool you need to bring with you if you and your fellow developers are going to be stuck on a desert island. It’s not only version control, but also an issue tracker, and more recently a chat app.

2) It preserves everything you commit to it. Whereas git lets you polish your commit history before pushing, fossil keeps everything. You can’t alter your local history. All your messy scratch work can’t be cleaned up. It’s visible forever.

That second point is what turns me off to it, so I’ve stuck with git for personal projects. When I push up my local code, I like to have a very clean history.

andi999 · on June 13, 2021

Thanks. They really seem to have a bias against deleting (and not so good rationalizations): https://fossil-scm.org/home/doc/trunk/www/shunning.wiki

Apart from that, did you try it, and was it smooth?

booleandilemma · on June 13, 2021

I tried it just for an afternoon, but I did find it easy to work with.

I find git easy to work with too though, as long as everyone sticks to a well-defined workflow and doesn’t do anything weird.

andi999 · on June 14, 2021

Maybe that is the problem here, we do not have a defined workflow. Do you know of any good workflow sheet for git we can lift ideas from?

stingraycharles · on June 13, 2021

So if I understand correctly there’s no equivalent to git squash merging branches?

hyperman1 · on June 13, 2021

My experience with git is it's organically grown, and regularly a mess. It works reasonably well and in fact is better than a lot of alternatives, but can become a monster quickly and unexpectedly. My experience with mercurial was better than with git.

But none of this matters, as git/github/gitlab is today the industry standard, or even the category killer for version control. You will have to deal with it in one capacity or another. So my advice is to deal with git, learn at least up to medium level. As industry standards go, things could be a lot worse than git.

lanstin · on June 13, 2021

The main thing to know for newbies is as long as you don’t force push to a remote branch, it is safe. You are creating new state only, not destroying. All errors are recoverable.

usr1106 · on June 14, 2021

Even force pushing is not really a problem. If you don't want to keep every typo and braindead approach in history, force push is a required tool.

Naturally things go wrong occasionally, but garbage collection is not run often. You only need to know the SHA-1 and you can fetch "lost" commits again.

Old SHA-1s can be found in the reflog. We also have all pushes automatically announced in chat, so you can look up previously pushed SHA-1s in chat history (we use gitlab and zulip and those support it out of the box).

Of course you still need a mental model of how git history (including history rewriting) works, otherwise you cannot understand what exactly went wrong. And without knowing what went wrong, fixing it becomes awkward trial and error, which unfortunately many less experienced git users seem to do.
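
The local half of that recovery story can be sketched as follows. This only shows a commit "lost" by a hard reset in a throwaway repository and found again through the reflog; whether a force-pushed commit can still be fetched from a server depends on the host and its garbage collection, which this does not demonstrate. The branch name `rescued` and the commit messages are invented.

```shell
# Throwaway repository (hypothetical path and names).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "demo@example.com"
git config user.name "Demo"

git commit -q --allow-empty -m "keep"
git commit -q --allow-empty -m "oops, still wanted this"
lost=$(git rev-parse HEAD)

# Move the branch back one commit. Nothing points at $lost any more,
# but the object is still in the repository until garbage collection.
git reset -q --hard HEAD~1

# The reflog records where HEAD has been; HEAD@{1} is its previous position.
git --no-pager reflog -n 3
previous=$(git rev-parse 'HEAD@{1}')

# Give the commit a name again so it is reachable and safe from gc.
git branch rescued "$previous"
rescued_tip=$(git rev-parse rescued)
```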

lanstin · on June 13, 2021

And don’t expect the command details to make sense. What you want to do is some simple thing in terms of a graph of states, and just google the command if you aren’t sure if it is origin/branch or origin space branch.

pjc50 · on June 13, 2021

You can still easily lose uncommitted local state, which is unrecoverable, and also put the checkout into a state from which a newbie finds it hard to escape.

jjice · on June 13, 2021

Definitely helped a bit. I just graduated from university and am working full time as a developer now. I thought I knew how to use git because I knew how to do feature branching and merging. Boy was I wrong. Within a few weeks at my new job, I've realized that I'm missing so much useful git knowledge. When I learned about cherry-pick, my mind was blown.

My goal right now is to develop a better mental model of git than what I have right now. If anyone has recommendations for resources, please let me know!

guhidalg · on June 13, 2021

https://learngitbranching.js.org/

Go through every lesson, understand it, and find yourself more knowledgeable about git usage than 95% of developers.

lanstin · on June 13, 2021

So true and so worth the extra knowledge to understand your tools. You should also read about the various knobs on your compiler or interpreter from time to time. I used to reread the gcc manual every five years, and now I search on the env variables that affect the Python runtime. Getting ready to do that for the Go build chain now that I have 3 or 4 production Go things. Similar for my editor and libc and language stdlib and kernel APIs, though they are more diffuse than the gcc manual.

Forge36 · on June 13, 2021

This was a good way to pass some time :)

I look forward to sharing this with others on my team.

jlokier · on June 13, 2021

This was the best resource for me:

http://ftp.newartisans.com/pub/git.from.bottom.up.pdf

leephillips · on June 13, 2021

That is a good, clear exposition. Thanks!

macintux · on June 13, 2021

Jessica Kerr (jessitron) gives a good git internals talk that you can find on YouTube if that’s a helpful learning style.

dang · on June 13, 2021

Some past threads:

Git for Computer Scientists - https://news.ycombinator.com/item?id=3146466 - Oct 2011 (15 comments)

Git for Computer Scientists - https://news.ycombinator.com/item?id=1485612 - July 2010 (17 comments)

JJMcJ · on June 13, 2021

As always: https://xkcd.com/1597/

lanstin · on June 13, 2021

This is never not funny.

LAC-Tech · on June 13, 2021

I'm so sick of git.

Yes I know what a directed acyclic graph is. No I don't know what 'checkout' will actually do this time when I run the command.

It's been 10 years. It's still confusing people. There's still article after article, book after book written on a tool that should be getting out of a programmer's way.

Let's try something else.

belter · on June 13, 2021

For the more knowledgeable on Git: what is the current status of the transition from SHA-1?

beermonster · on June 13, 2021

This [1] is useful to read. SHA-256 has been supported (experimentally at least) in Git since 2.29 [2].

[1] https://lwn.net/Articles/811068

[2] https://lore.kernel.org/lkml/xmqqy2k2t77l.fsf@gitster.c.goog...

cerved · on June 13, 2021

https://thenewstack.io/git-transitioning-away-from-the-aging...

throw0101a · on June 13, 2021

I ran across this little gem recently:

> Git gets easier once you get the basic idea that branches are homeomorphic endofunctors mapping submanifolds of a Hilbert space.

* Isaac Wolkerstorfer, https://twitter.com/agnoster/status/44636629423497217

question000 · on June 13, 2021

Richard Feynman had a joke theory that any purely theoretical mathematical concept when expressed in layman's terms devolves into something completely obvious. So does this actually mean something like "git uses branches."?

jeltz · on June 13, 2021

I have never got all these jokes. When my job switched from Subversion to git it took me about one week plus reading a couple of articles to become more productive in git than I ever was in Subversion. Yes, version control is a bit tricky but git is not that hard to understand and was much easier than contemporary Subversion versions.

avalys · on June 13, 2021

Things have gotten a little better. But, try to do something off the beaten path in Git, and you may ultimately get the joke.

For example: “two weeks ago an intern accidentally committed a file containing IP we’re not allowed to use, we need to erase it from the repository and all developer machines.”

Have fun with that one!

EDIT: I mean, try to figure this out from the official Git documentation (https://git-scm.com/docs). No, Stack Overflow and Github are not the official Git documentation. Believe it or not, the idea that "Git is hard to use" predates Stack Overflow.

jeltz · on June 13, 2021

I have had to do it once during the 12 years I have used git, so I seriously doubt that this is why people think git is hard. And I think that googling it would be fine in that case. That said: since I have done it once I could easily figure out how to do it again and it wasn't hard, just a bit cumbersome due to git's distributed nature.

aardshark · on June 13, 2021

Is Googling not allowed? This situation is pretty common, so there are plenty of SO answers and articles on how to accomplish rewriting history to erase it from the repo.

Removing from developer machines is a separate issue and requires you to be able to coordinate your Devs.

If you meant that it's not simple to work out from scratch what you should do without lots of reading and trial and error...that kinda goes for a lot of tools, no?

Spivak · on June 13, 2021

Yes but git seems to be one of those tools where laypeople seem to genuinely not be able to derive how to do complex tasks from first principles. Lord knows I can’t. If your Googling doesn’t turn up someone who’s had your exact problem you will have to burn a long time figuring out how to do what you want.

yawaramin · on June 13, 2021

It's all documented https://docs.github.com/en/github/authenticating-to-github/k...

u801e · on June 13, 2021

> For example: “two weeks ago an intern accidentally committed a file containing IP we’re not allowed to use, we need to erase it from the repository and all developer machines.”

Technically, the issue was actually pushing that commit to the remote repository.

I think the best advice one can give people when using it is to run:

  git log -p origin/master..HEAD

and look at the commit messages and associated diffs to see if there's anything there that shouldn't be there before they actually run git push.

LAC-Tech · on June 13, 2021

> git log -p origin/master..HEAD

See THIS is the problem. Ugly, inconsistent, clumsy use of the english language, and confusing.

This will go on my git sheet, with a comment as to what it actually does because I don't have the time to actually unpack that from first principles. I've got better things to do than become an expert on needlessly complicated software.

u801e · on June 13, 2021

> See THIS is the problem. Ugly, inconsistent, clumsy use of the english language, and confusing.

It's a command line interface, not plain English. What's ugly and inconsistent about the git log command as was quoted in your reply?

> I've got better things to do than become an expert on needlessly complicated software.

As a software developer, I have to read through a lot of documentation to be able to use programming languages, SQL, data stores, unix utils, etc. I don't see why it would be any different for a VCS.

I think the actual issue is that people aren't willing to read through the documentation to understand what a command does and what options are available.

As for the command itself, the -p switch shows the diff associated with each commit listed by git-log. origin/master is the remote-tracking branch for master (most likely the base branch that the person is working on). .. is a range operator, and HEAD represents the latest commit on the current branch on the local machine. Together, origin/master..HEAD selects the commits reachable from HEAD but not from origin/master, i.e. the ones origin/master does not have yet.
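
To see the range in action without a real server, here is a sketch in a throwaway repository. The remote-tracking ref `origin/master` is faked with `git update-ref` purely for the demo (normally `git fetch` or `git push` maintains it), and the file and commit messages are invented.

```shell
# Throwaway repository (hypothetical path and names).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "demo@example.com"
git config user.name "Demo"

git commit -q --allow-empty -m "already pushed"

# Assumption for the demo: pretend this commit is what the remote has,
# by writing the remote-tracking ref by hand.
git update-ref refs/remotes/origin/master HEAD

echo "draft" > feature.txt
git add feature.txt
git commit -q -m "not pushed yet"

# Commits reachable from HEAD but not from origin/master, with their diffs.
git --no-pager log -p origin/master..HEAD

unpushed_count=$(git rev-list --count origin/master..HEAD)
unpushed_subject=$(git log --format=%s origin/master..HEAD)
echo "$unpushed_count commit(s) would be pushed: $unpushed_subject"
```

Only the second commit falls inside the range, so its message and diff are the ones to read before pushing.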

LAC-Tech · on June 14, 2021

I use plenty of command line interfaces everyday. Most of them are pleasant, predictable, and easy to remember. None of them consistently confuse me like git does. (How many different things does 'checkout' do?)

SQL is not only much more intuitive than git, it gives me amazing leverage to deliver value to clients. By comparison, git wastes my time. There's zero or minimal competitive advantage to using it over any other VCS.

onei · on June 13, 2021

Erase from the repo, a little non-standard, but fine. Being asked to remove it from all developer machines sounds like someone misunderstood how version control works. Was that a real life example you hit?

karatinversion · on June 13, 2021

They might have a model of version control in their head that predates distributed version control systems - I never used one myself, but the code base I work on still has scars here and there from the era when only one developer could have any single file checked out.

pjc50 · on June 13, 2021

Not a misunderstanding, a requirement. If the developers cannot have that data (legal reasons? Secrets?) it must be deleted.

Probably has to be done outside git, though. Maybe one of the corporate virus scanners will let you define a local signature.

yawaramin · on June 13, 2021

It's rather simple: remove it from the origin repo using BFG Cleaner or whatever, then ask devs to delete and re-clone the repo. Not everything needs a complex technical solution.

avalys · on June 13, 2021

Git clones the entire remote repository to each developer's machine. So, if you accidentally committed something you shouldn't have two weeks ago, everyone will now have a copy in their local repo. And you can't always just tell people to delete their local repos and start again, since they might have local branches they're working on, etc.

edp · on June 13, 2021

I don't think this is even possible with SVN or CVS, is it?

MarkSweep · on June 13, 2021

At least with SVN, there is one option that is pretty similar to git’s filter-branch: svndumpfilter. You dump the entire history of the SVN repo to a file, edit it, and then load it into a new SVN repo. I used this technique to pre-process a repo to remove large files before migrating to Git. The file format is simple enough that you can easily write a program to edit the stream.

wrycoder · on June 13, 2021

It’s very easy in CVS, which is why some people prefer CVS to any distributed solution.

sigjuice · on June 13, 2021

Curious why the IP address has to be obliterated from history instead of just correcting it in a new commit? An IP does not seem sensitive like a secret or private key.

EDIT: Sorry, my bad. I misread.

LegionMammal978 · on June 14, 2021

Presumably they did not mean an Internet Protocol address, but instead data containing disallowed intellectual property.

ojbyrne · on June 14, 2021

I believe in this case, IP == Intellectual Property.

SkyMarshal · on June 13, 2021

>I have never got all these jokes.

If you mean that literally, this joke is referencing another joke meme in the functional programming community, explained here: https://blog.merovius.de/2018/01/08/monads-are-just-monoids....

If you mean, you don't get why people joke about Git being hard, it probably isn't for most professional programmers already accustomed to some kind of source control, just perhaps to anyone new to it.

jollybean · on June 13, 2021

I often wonder about this as well, but I think there must be a difference of opinion of what 'hard' is.

I think that people who think they are confident about Git in 'a few weeks' basically have a lack of self-awareness. They 'don't know what they don't know'.

I've seen countless times, developers huddled around someone else's computer, trying to figure out what went wrong.

I saw a git animation/visualisation tool once that animated operations, and I saw things happening I had never seen before, basically a lot of 'loose end' things that I didn't even know existed.

I also wonder if that has something to do with the fact that such things are maybe not suited to 'command line' and are inherently structured.

GuB-42 · on June 15, 2021

It is not hard to be more productive in git than in subversion... (ok, I hate subversion). I think a better point of comparison would be mercurial, which is based on the same principles as git but with different opinions.

But I think the trouble with git is that it is very flexible, there are many ways to do things (ex: merge vs rebase) and the command line interface is not particularly intuitive. I messed up with git much more than I messed up with mercurial, but git also makes it easier to fix the mess.

beermonster · on June 13, 2021