Why I Don't Hate Git: Hidden Consistency (pocoo.org)
157 points by kozlovsky on Feb 17, 2015 | hide | past | favorite | 48 comments



(copied over from lobsters, with links to more material there: https://lobste.rs/s/odj5y1/why_i_don_t_hate_git_hidden_consi... )

Haha, I spurred this debate with mitsuhiko over IRC yesterday; I was arguing with him over git vs hg, and this blog post is obviously his retort. Here is my defence of Mercurial:

Mercurial’s design aims to welcome people who come from other VCSes. It started by being welcoming to CVS and SVN users, and its CLI mimicked theirs, as well as a few ideas from BitKeeper. Git’s initial design was very BitKeeper-like too, with branching-by-cloning being the only way to branch. Nowadays, Mercurial also makes some concessions to git users.

Despite its various sources of inspiration, Mercurial works hard to keep all of these ideas consistent. Commands rarely grow new options. Many deviations from core functionality are first tested in optional extensions for a long time before going into the core. Lots of time is spent bikeshedding what a command’s name should be, and what language the documentation should use. Consistency is very important. Backwards compatibility is paramount. The CLI is the UI and the API.

A thing git is often lauded for is the simplicity of its internals, which are frequently deemed to be as simple as to not be internal at all. Despite being a binary format, Mercurial’s revlogs are also approximately simple, which is why people sometimes write parsers in other languages.

But Mercurial is a lot more than just git-with-a-nicer-UI. There are many exciting features in Mercurial, features that I don’t think will ever make it into git because they are just too different from the way git works. Mercurial Evolve really changes the way we collaboratively edit commits. Templates and revsets can be combined to program interesting extensions. New extensions can scale Mercurial into gigantic repos.

And because I think these ideas are so great and must be explored and improved, I will keep using Mercurial, teaching Mercurial, and improving Mercurial


No, Mercurial revlogs are not approximately simple. They were designed as an append-only log for individual files, much the way CVS and RCS kept per-file change histories. This choice causes problems when doing file renames and history merging, because you need to either duplicate data or maintain complex pointers between filenames which may not even exist at the tip.

I wrote a post comparing the file layouts a few years ago:

http://alblue.bandlem.com/2011/03/mercurial-and-git-technica...

The TLDR is that Mercurial revlogs were designed from lofty architectural goals (append-only formats, designed to be parsed forwards) whereas Git is just an object soup with pointers to other objects in the same soup. As a result, different file storage mechanisms have been created (direct file, push to remote HTTP/S3, BigTable etc.) and new features (bitmaps, packed archives, delta compression between the same and different filenames) have been grafted on over time. The format is also versioned, with feature versions being added at a later stage to the transport protocols that ship packs of these deltas between versions.
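You can poke at the object soup directly from the command line; a quick sketch (repo and file names are invented):

```shell
# Git's "object soup": every object is addressed by its hash and has a type;
# commits point at trees, and trees point at blobs and other trees.
git init --quiet soup && cd soup
git config user.email you@example.com && git config user.name you
echo hello > greeting.txt
git add greeting.txt && git commit --quiet -m "first"
git cat-file -t HEAD                                # -> commit
git cat-file -p HEAD                                # shows the tree hash, author, message
git cat-file -p "$(git rev-parse 'HEAD^{tree}')"    # the tree lists greeting.txt's blob
```

Nothing in the on-disk layout knows about filenames across history; a rename is just a tree pointing at the same blob under a new name.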

Frankly, the only valid criticism of Git appears to be "the command line flags are a little funky", and given the extensive Git tooling that has been built (it's provided by default in Visual Studio, Eclipse, IntelliJ, Xcode and others), the fact that the command line takes a bit of getting used to is such a non-issue that I'm not going to waste further time on it here.

Mercurial is dead, but its fans just haven't noticed yet.


There are problems with Git's "object soup" approach, too. For example:

(1) The performance issues with "git blame" appear to be essentially unfixable. That tradeoff is by design [1]. Note that by "performance issues" I mean that git blame can take several minutes on some repositories (e.g. src/xdisp.c in Emacs).

(2) Git is pretty much tied to a 1:1 repository:directory model and cannot safely support a 1:1 branch:directory model with multiple branch checkouts sharing a repository (git-new-workdir as the closest approximation is not safe).

In general, a lot of operations on Git have an O(branch history size) or even O(repo size) complexity; that is not a problem if you do not need them, but it puts limits on what you can do efficiently with Git (at least without adequate caching and the necessary porcelain to use it).

That, of course, on top of the other criticism typically targeted at Git (poorly designed command line interface, difficult to understand internal model [2], possibility of data loss [3], etc.).

> Mercurial is dead, but its fans just haven't noticed yet.

Not everybody here is concerned with the silly Git vs. Mercurial war (the modern version of Emacs vs. Vi). My personal concern is that the result of widespread Git adoption is that VCS development has stalled and has settled for a "good enough" that I don't really consider good enough; Mercurial is interesting not because they're doing things better (though they do some things better, and other things worse), but because they appear to be the one team still actually experimenting with new things (e.g., changeset evolution). I really wish there were more going on with Bazaar (which is basically in maintenance mode now, and Canonical doesn't even seem to put a whole lot of effort into that, with lots of bugs still outstanding) or Fossil (which is mostly trying to be conservative rather than innovative [4]).

In short: Competition is good, monoculture is bad. The desire to have one VCS to "rule them all" worries me.

[1] http://marc.info/?l=git&m=116991865311836

[2] http://people.csail.mit.edu/sperezde/onward13.pdf

[3] http://jenkins-ci.org/content/summary-report-git-repository-...

[4] Which is a worthy goal in itself, but it also means that they aren't really moving VCS development forward.


Your data loss example [3] is precisely not data loss, like virtually every other example I've seen regarding git. The commits and history were definitively not lost, and although the article doesn't say, the repositories affected by this tool should have even still contained the history record of the old head(s), making rollback to the latest commits a fairly straightforward affair.

This is honestly no different than someone misconfiguring a tool for any other VCS and overwriting the repo's head with old junk. In the (other VCS) case, the fix operation is to go into VCS history and resurrect the correct head commits (e.g. via a 'revert' style operation). In git's case, the fix operation is ... go into VCS history and resurrect the correct head(s), traditional revert not being relevant to how git functions in such exceptional cases. It's either ignorance or disingenuousness to call git's behavior here, though different from that of SVN, hg, etc., "data loss".


First, how do you know that no data was lost? There is no way to even verify that all data was recovered. They are pretty confident, but there's really no guarantee, is there?

More generally, yes, the case you are worried about is typically not when there are lots of repository clones in circulation (though the KDE case [1] shows that data loss is quite possible even then and that replication is no real alternative for proper backups [2]).

And, of course, the reflog will keep commits alive for a while, and garbage collection will not occur until the grace period is over.

The situation where this doesn't work so well is personal/small group repositories or branches that only experience intermittent commits and that aren't being mirrored by a large number of users. In that case, user errors can easily translate into data loss when garbage collection finally catches up with you.
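A sketch of exactly that failure mode, with the expiry flags standing in for the grace period actually elapsing (repo name invented):

```shell
# how an orphaned commit eventually becomes unrecoverable
git init --quiet demo && cd demo
git config user.email you@example.com && git config user.name you
git commit --quiet --allow-empty -m "keep"
git commit --quiet --allow-empty -m "lose me"
lost=$(git rev-parse HEAD)
git reset --quiet --hard HEAD^       # commit now unreachable, guarded only by the reflog
git reflog expire --expire=now --expire-unreachable=now --all   # grace period "ends"
git gc --quiet --prune=now           # garbage collection deletes the loose object
git cat-file -e "$lost" || echo "commit is gone for good"
```

With the default settings the expiry takes weeks, which is exactly why intermittently used repositories are the dangerous case: by the time you notice the mistake, gc may already have run.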

[1] http://jefferai.org/2013/03/29/distillation/

[2] A practical example would be where a repository is so large that many contributors prefer to use shallow clones.


Git won, but Mercurial seems to have found a niche in organisations with huge repositories. https://code.facebook.com/posts/218678814984400/scaling-merc...


FWIW, I think that the tendency to push things into extensions is a huge net negative for hg consistency and learnability.

VCSes are complex tools and it takes a little while for people to understand the data model and how it maps to the command line UI. This means that people new to a tool — or just new to a particular project's workflow — will often have "how do I do X?" type questions. With git helping people is pretty easy; you just sit down with them and go through the set of commands they need to achieve what they want, explaining the operations on the tree along the way (if necessary). It's even the sort of thing that you can do over irc without too much difficulty.

When using hg there is a whole extra level of complexity because the default setup isn't actually suitable for use in real projects; to get something useful you first have to enable a bunch of extensions. So given a random user with unknown configuration it's hard to know what commands are even available to them without diving into a configuration file, possibly downloading some random python scripts from the internet, and so on. For example the Mozilla source tree has a whole mercurial setup script that configures about a dozen extensions, a few of which are providing useful Mozilla-specific functionality (e.g. bugzilla integration), but most of which are plugging holes in the hg featureset.
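For reference, the kind of configuration involved looks something like this (the extension choices below are illustrative, not Mozilla's actual list; all of these ship with Mercurial but stay disabled until listed):

```ini
# ~/.hgrc -- a common baseline configuration
[extensions]
rebase =
histedit =
shelve =
color =
pager =
```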

I also find it baffling that, given git exists, when hg developers integrate ideas from git wholesale (something that is on the whole positive, e.g. bookmarks as an implementation of local branching), they conspire to do so in such a way that the experience is jarring for people moving from git. To me it seems obvious that if you are coming second in a space with strong network effects, you don't go out of your way to make it painful for people to switch.

I get the impression that at least some mercurial fans expect it to win in the enterprise, citing organizations like Facebook optimising it to work with their giant repositories. But it isn't clear to me that there's much trickle-down effect from there; most smaller projects simply don't have that much source to control, or the ability to enforce the kind of homogenous environment that forgives many of the shortcomings of mercurial's UX.


Basically this says, "git's UI was so bad that it forced me to learn the internals, and once I grokked the internals git made a whole lot of sense." That summary might sound like I'm trolling but I don't think that's a bad thing (else, why is git so popular?). I've been the mercurial "expert" on my team at work for the past 5 years and I can't count the number of little DAG diagrams I've drawn trying to get people to understand what was really going on with their repository. I'm pretty sure there are a few people who still don't really understand how mercurial works under the hood. Maybe that's a positive aspect of mercurial...but maybe it's not :-)

Now, I will say that the internals of mercurial that you need to understand to gain enlightenment seem simpler to me than what you have to know for git. Mercurial has commits and commits have parent reference(s) that link commits together into a DAG. Commits might have a branch label. You move commits from one repo to another with push and pull. You create a commit with two parents by doing a merge. And that's it!

Git has those same basic concepts, but you also have to know about the index, and branches (which are really pointers to commits that may or may not be a branch in the DAG), and remote branches, and merges that aren't really merges because they just move the branch (which is a pointer, remember, that's why a "branch" can move) to the head, and all kinds of other interesting (and useful, no doubt) concepts.
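The "branches are really pointers" part is easy to verify yourself; a sketch, assuming the loose-refs layout fresh repositories use (refs can also migrate into packed-refs later):

```shell
# a git branch is literally a tiny file containing one commit hash
git init --quiet demo && cd demo
git config user.email you@example.com && git config user.name you
git commit --quiet --allow-empty -m "first"
branch=$(git rev-parse --abbrev-ref HEAD)   # master or main, depending on git version
cat ".git/refs/heads/$branch"               # the hash this branch points at
git branch topic                            # "creating a branch" just writes another file
diff ".git/refs/heads/$branch" .git/refs/heads/topic && echo "both point at the same commit"
```

No history is copied; "moving" a branch means overwriting one 41-byte file, which is why fast-forward "merges" are possible at all.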


Good article. That old tutorial is scary :)

> I screwed up really badly before, merging wrong things together, accidentally deleting data and much more, yet I never lost any data or felt left abandoned by my tool.

I can relate for the most part. I can only think of one instance (in over half a decade) where I felt git's shortcomings: there is no way to get a deleted non-gc'd object from a remote to your local repository, even if you try to reference it by its sha1.

This happened to me when some bad changes were force-pushed to a repository on GitHub and I did not have access to a machine which had the latest changes. My repository on GitHub still knew about the old commits, but they were unreachable by git itself.


> I can relate for the most part. I can only think of one instance (in over half a decade) where I felt git's shortcomings: there is no way to get a deleted non-gc'd object from a remote to your local repository, even if you try to reference it by its sha1.

For better or worse (and I've wanted it to work too) it's an intentional security feature that you can only pull objects from a git remote that are reachable by its refs; that way deleted branches (e.g. containing data that wasn't intended for release) are instantly unavailable rather than needing to wait for GC.


Seems like a shaky justification. I understand not offering things that are up for deletion but there wasn't even a way to do git pull --i-really-want-everything or some such.

If you push passwords and keys to your git server, then force-push those things out, you most definitely want to run a gc. Git is a flimsy security layer around this.


Of course you can't run "git pull --i-really-want-everything", you're the remote attacker this feature is meant to protect against!

The use-case for this is that you're pushing to some shared hosting like GitHub where you can overwrite and delete refs, but you can't force a gc.

You don't want someone to scour your Git commit announcements and see "oops, deleted password!" and go and fetch the deleted SHA1.


Well if you don't have enough access to the remote machine to locally get the objects or run "git branch oops <sha1>", you probably also don't have access to run a gc to prune the objects. In that case the "permissive" alternative would mean that you could not remove access to the objects at all once they'd been pushed. Given that, I can see the justification for the behavior they chose.


It is possible with Github at least - using a combination of the list repo events API & the create commit API.


Couldn't you just have (by hand) created a tree/commit/branch referencing the object, pushed it to the server and then fetched it (possibly to a new clone)?


> That old tutorial is scary :)

And funny!

    Creating a new git archive couldn't be easier
O RLY!


Three words against git: Detached Head State

I view git as a sort of shibboleth.

You can't really understand how git works unless you understand trees as a data structure. That excludes all but the hardcore types.

Some designers and CSS experts need to use source control, but Git is too complicated for them.

Once you get a detached head state or corrupted repo, then you need a git expert to clean things up. I once committed while in a detached head state, and so git ate my changes and I had to reflog to recover them. That is just insulting.
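For anyone bitten the same way, the rescue is mechanical once you know it; a throwaway-repo sketch (names invented):

```shell
# recovering a commit made on a detached HEAD
git init --quiet demo && cd demo
git config user.email you@example.com && git config user.name you
git commit --quiet --allow-empty -m "first"
git checkout --quiet --detach            # wander into detached-HEAD state
git commit --allow-empty -m "oops, committed while detached"
git checkout --quiet -                   # back on the branch; the commit is now orphaned
git reflog                               # the "oops" hash is still recorded here
git branch rescued "HEAD@{1}"            # give it a branch name so gc can't collect it
git log --oneline rescued                # the "lost" commit, safe and sound
```

Which rather proves the point: the data was never gone, but a designer has no business needing to know any of this.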

At my job, I work with some designers now, and they always leave the test server in a detached head state.

But when I switched from the GitHub client (yuck) to the SourceTree client, most of my concerns went away.


>You can't really understand how git works unless you understand trees as a data structure. That excludes all but the hardcore types.

DAGs and BSTs are taught in second year undergraduate classes where I'm at. What's hardcore about them? Serious question.


Non-programmers tend not to have taken undergraduate computer science classes!


Yes, that was my point. There are people who don't have a CS degree, that should use source control. For example, designers should use source control for their work. Explaining git to them is not feasible.


My bad, I was thinking from a CS centric POV


> unless you understand trees as a data structure

Well unless you understand DAGs, more like? Which is somewhat more unhelpful. :P

(given that merge commits have multiple parents)


Do others have examples of software "where the way it works is a crucial part of the user experience"? The one that comes to my mind is lisp macros; if you mess with them for a month or two you can't help but have a pretty good understanding of how they work. Clean internals can colonize your brain in a way that merely clean interfaces can never compete with.


I like Redis as an example in this category.

I also think this ties in with the law of leaky abstractions, in that a good understanding of that law will make ui designers choose lesser/thinner abstractions over bigger/leaky ones.


This article begs the question - what might a sane/consistent UI on top of git look like?


Last year, I started work on an attempt to basically create a Bzr-like [1] porcelain. I still think this would be the way to go for a serious attempt to provide a consistent, but reasonably powerful UI for Git (especially given that Bzr and Git actually have a fair amount of commonalities under the hood).

I recently gave up for a number of reasons, though:

(1) It turned out to be hugely more complex than I had initially thought.

(2) There would have been the need to store additional meta-data in the Git repository. Combined with the need to intercept a lot of basic Git commands, that would have largely relegated Git to a transactional key/value store, which would have destroyed much of the purpose (interoperability with the Git ecosystem).

(3) Mercurial 3.2 introduced changes to bookmark management that emulate the behavior of Git branches better, which (in conjunction with hg-git) made it a viable frontend for Git, and in the end it was easier to fix my remaining issues with Mercurial via extensions than to fix my issues with Git via porcelain.

That said, I think the approach is still viable and in principle a good one, especially if you give up on some of Bzr's features that yield relatively little for the amount of work involved (such as putting empty directories under version control).

[1] Bzr suffers primarily from abandonment issues, its basic design and UI (modulo some needed clean-up) is pretty good, especially the model it exposes to users.




Yep. SourceTree with BeyondCompare hooked up works rather well, been using it for a couple years now.


I was thinking a bit about this the other day. I think that it would probably be geared around the way that Git is actually used.

The thing about DVCS is that it has allowed a whole lot of experimentation around different workflows, and two in particular have come to the forefront: pull requests and git-flow. A porcelain designed around these particular workflows would be quite effective IMO.

Also, I think it would have some better, less jargonistic terminology. For example, something like TFS's "get-latest" as an alias for "git pull origin master" would be a lot clearer.
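You can get part of the way there today with aliases; a sketch (the verb choices are mine, and note this writes to your global config):

```shell
# friendlier verbs as git aliases
git config --global alias.get-latest 'pull origin master'
git config --global alias.unstage 'reset HEAD --'
# "git get-latest" and "git unstage <file>" now read closer to TFS's vocabulary
```

Of course, aliases only rename commands; a real porcelain would also reshape the model underneath them.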


hgit seems like such a proof of concept: a mercurial UI over a git store:

https://bitbucket.org/durin42/hgit


I'm surprised a facade hasn't emerged yet. Personally I think it's nigh impossible to sanely do merges with a GUI. I know people do, but I think they're crazy. Using git via any JetBrains IDE is the best I've seen.


I'm confused. Isn't the JetBrains IDE a gui? How can it be the best you've ever seen and impossible to sanely do merges?

FWIW, I've only ever done git merges in a gui, and never had any difficulty with it.


He might mean that he uses Jetbrains' IDEs with git for nearly everything, but feels that using it to merge things is madness.

It seems strange, but I really like having a console-based workflow with my VCS. I can leverage grep, shell aliases, and so on to customize my interface a fair deal, and the forced interaction helps keep me mindful of what I am doing.


Typo. I meant it's impossible to do without a GUI


I agree - I don't think it's possible to put a sound visual representation over the top of git, it's too abstract. A coherent command line interface should be possible, though.



The conceptual model matches the implementation - this is what Don Norman's "Design of Everyday Things" says is good design. When this is the case, a user can interact with the object's interface using logic and intuition and get predictable results.

I agree, git really embodies this.


> The conceptual model matches the implementation

It depends where on the conceptual stack you think you lie. I use fossil[1] because I don't have detached heads, don't need to understand the backing store to reason about "where am I?", and have "porcelain" that seems to leave every single operational aspect laid out for me, instead of abstracting it away. When I work w/ version control, I work with files, and putting them away. Detached heads shouldn't be my concern, nor the conflicting ways of describing how to sync[2], etc., etc., etc.

[1] http://fossil-scm.org/index.html/doc/trunk/www/index.wiki

[2] http://stackoverflow.com/questions/15316601/in-what-cases-co...

EDIT: spell 'porcelain' correctly.


Having grown up w/ git (svn was on the way out when I started programming) and then having had to use svn at a previous employer (I migrated it to git, eventually), I can't fathom a reason to prefer svn or to "hate" git other than ignorance.

You can spot an ex-svn user from a mile away by their commit history. We need rehabilitation clinics.


There are more VCSes in the world than just those two. Many people seem to credit git with all the virtues of distributed version control in general, without noticing that its interface is terrible and that the same ideas have been implemented in a much more friendly fashion by other tools. Git is popular because it is the official vcs of the Linux kernel, and the network effect did the rest.


> in a much more friendly fashion by other tools.

Examples?


Mercurial, Darcs, and so on...


As the article puts it, the internals of Git are brilliant. The UI (the standard CLI) is kind of abysmal. Its primary saving grace, in my opinion, is that for daily work the number of command variations you need to know is small enough to simply memorize. 95% of the rest of the time, the StackOverflow answer you're looking for is just a Google search away.


Large hand-edited binary files such as PSDs benefit from svn's lock functionality.


I am getting a message when I push to GitHub (or any remote) because there is a git config value that I need to set. I just did a search to try to understand the effect of each option for this value, so I could pick the one that best fits how I want git to act. I am approximately 90% sure I understand what each one does, but all of the explanations have some element of confusion or lack of clarity. This is the problem with using git.


Could your "problem of using git" be a result of "doing a search" instead of reading the documentation that Git provides? `git help config` gives documentation on all of its config values.
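For what it's worth, the message described sounds like the `push.default` warning (I'm inferring from the description); a sketch of the usual fix:

```shell
# the warning git prints on push is typically about this setting; the candidate
# values (simple, current, upstream, matching, nothing) are described in `git help config`
git config --global push.default simple      # push only the current branch, to its upstream
git config --global --get push.default       # -> simple
```

"simple" is the conservative choice: it refuses to push anything other than the branch you are on, which is what most people coming from centralized VCSes expect.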


tangentially, that's a really lovely header font ("jim nightshade"). one of the very few times i've seen a "fancy" font work well in a post like this.



