Why I Don't Hate Git: Hidden Consistency (pocoo.org)
157 points by kozlovsky on Feb 17, 2015 | hide | past | favorite | 48 comments



(copied over from lobsters, with links to more material there: https://lobste.rs/s/odj5y1/why_i_don_t_hate_git_hidden_consi... )

Haha, I spurred this debate with mitsuhiko over IRC yesterday; I was arguing with him over git vs hg, and this blog post is obviously his retort. Here is my defence of Mercurial:

Mercurial’s design aims to welcome people who come from other VCSes. It started by being welcoming to CVS and SVN users, and its CLI mimicked theirs, as well as a few ideas from BitKeeper. Git’s initial design was very BitKeeper-like too, with branching-by-cloning being the only way to branch. Nowadays, Mercurial also makes some concessions to git users.

Despite its various sources of inspiration, Mercurial works hard to keep all of these ideas consistent. Commands rarely grow new options. Many deviations from core functionality are first tested in optional extensions for a long time before going into the core. Lots of time is spent bikeshedding what a command’s name should be, and what language the documentation should use. Consistency is very important. Backwards compatibility is paramount. The CLI is the UI and the API.

A thing git is often lauded for is the simplicity of its internals, which are frequently deemed to be as simple as to not be internal at all. Despite being a binary format, Mercurial’s revlogs are also approximately simple, which is why people sometimes write parsers in other languages.

But Mercurial is a lot more than just git-with-a-nicer-UI. There are many exciting features in Mercurial, features that I don’t think will ever make it into git because they are just too different from the way git works. Mercurial Evolve really changes the way we collaboratively edit commits. Templates and revsets can be combined to program interesting extensions. New extensions can scale Mercurial into gigantic repos.

And because I think these ideas are so great and must be explored and improved, I will keep using Mercurial, teaching Mercurial, and improving Mercurial


No, Mercurial revlogs are not approximately simple. They were designed as an append-only log for individual files, much the way CVS and RCS kept per-file change histories. This choice causes problems when doing file renames and history merging, because you need to either duplicate data or maintain complex pointers between filenames which may not even exist at the tip.

I wrote a post comparing the file layouts a few years ago:

http://alblue.bandlem.com/2011/03/mercurial-and-git-technica...

The TLDR is that Mercurial revlogs were designed from lofty architectural goals (append-only formats, designed to be parsed forwards) whereas Git is just an object soup with pointers to other objects in the same soup. As a result, different file storage mechanisms have been created (direct file, push to remote HTTP/S3, BigTable etc.) and new features (bitmaps, packed archives, delta compression between the same and different filenames) have been grafted on over time. The format is also versioned, with feature versions being added at a later stage to the transport protocols that ship packs of these deltas between versions.
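You can poke at the object soup directly from the command line; a quick sketch (repo and file names are invented):

```shell
# Git's "object soup": every object is addressed by its hash and has a type;
# commits point at trees, and trees point at blobs and other trees.
git init --quiet soup && cd soup
git config user.email you@example.com && git config user.name you
echo hello > greeting.txt
git add greeting.txt && git commit --quiet -m "first"
git cat-file -t HEAD                                # -> commit
git cat-file -p HEAD                                # shows the tree hash, author, message
git cat-file -p "$(git rev-parse 'HEAD^{tree}')"    # the tree lists greeting.txt's blob
```

Nothing in the on-disk layout knows about filenames across history; a rename is just a tree pointing at the same blob under a new name.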

Frankly, the only valid criticism of Git appears to be "the command line flags are a little funky", and given the extensive Git tooling that has been built (it's provided by default in Visual Studio, Eclipse, IntelliJ, Xcode and others), the fact that the command line takes a bit of getting used to is such a non-issue that I'm not going to waste further time on it here.

Mercurial is dead, but its fans just haven't noticed yet.


There are problems with Git's "object soup" approach, too. For example:

(1) The performance issues with "git blame" appear to be essentially unfixable. That tradeoff is by design [1]. Note that by "performance issues" I mean that git blame can take several minutes on some repositories (e.g. src/xdisp.c in Emacs).

(2) Git is pretty much tied to a 1:1 repository:directory model and cannot safely support a 1:1 branch:directory model with multiple branch checkouts sharing a repository (git-new-workdir as the closest approximation is not safe).

In general, a lot of operations on Git have an O(branch history size) or even O(repo size) complexity; that is not a problem if you do not need them, but it puts limits on what you can do efficiently with Git (at least without adequate caching and the necessary porcelain to use it).

That, of course, on top of the other criticism typically targeted at Git (poorly designed command line interface, difficult to understand internal model [2], possibility of data loss [3], etc.).

> Mercurial is dead, but its fans just haven't noticed yet.

Not everybody here is concerned with the silly Git vs. Mercurial war (the modern version of Emacs vs. Vi). My personal concern is that the result of widespread Git adoption is that VCS development has stalled and has settled for a "good enough" that I don't really consider good enough; Mercurial is interesting not because they're doing things better (though they do some things better, and other things worse), but because they appear to be the one team still actually experimenting with new things (e.g., changeset evolution). I really wish there were more going on with Bazaar (which is basically in maintenance mode now, and Canonical doesn't even seem to put a whole lot of effort into that, with lots of bugs still outstanding) or Fossil (which is mostly trying to be conservative rather than innovative [4]).

In short: Competition is good, monoculture is bad. The desire to have one VCS to "rule them all" worries me.

[1] http://marc.info/?l=git&m=116991865311836

[2] http://people.csail.mit.edu/sperezde/onward13.pdf

[3] http://jenkins-ci.org/content/summary-report-git-repository-...

[4] Which is a worthy goal in itself, but it also means that they aren't really moving VCS development forward.


Your data loss example [3] is precisely not data loss, like virtually every other example I've seen regarding git. The commits and history were definitively not lost, and although the article doesn't say, the repositories affected by this tool should have even still contained the history record of the old head(s), making rollback to the latest commits a fairly straightforward affair.

This is honestly no different than someone misconfiguring a tool for any other VCS and overwriting the repo's head with old junk. In the (other VCS) case, the fix operation is to go into VCS history and resurrect the correct head commits (e.g. via a 'revert' style operation). In git's case, the fix operation is ... go into VCS history and resurrect the correct head(s), traditional revert not being relevant to how git functions in such exceptional cases. It's either ignorance or disingenuousness to call git's behavior here, though different from that of SVN, hg, etc., "data loss".


First, how do you know that no data was lost? There is no way to even verify that all data was recovered. They are pretty confident, but there's really no guarantee, is there?

More generally, yes, the case you are worried about is typically not when there are lots of repository clones in circulation (though the KDE case [1] shows that data loss is quite possible even then and that replication is no real alternative for proper backups [2]).

And, of course, the reflog will keep commits alive for a while, and garbage collection will not occur until the grace period is over.

The situation where this doesn't work so well is personal/small group repositories or branches that only experience intermittent commits and that aren't being mirrored by a large number of users. In that case, user errors can easily translate into data loss when garbage collection finally catches up with you.
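A sketch of exactly that failure mode, with the expiry flags standing in for the grace period actually elapsing (repo name invented):

```shell
# how an orphaned commit eventually becomes unrecoverable
git init --quiet demo && cd demo
git config user.email you@example.com && git config user.name you
git commit --quiet --allow-empty -m "keep"
git commit --quiet --allow-empty -m "lose me"
lost=$(git rev-parse HEAD)
git reset --quiet --hard HEAD^       # commit now unreachable, guarded only by the reflog
git reflog expire --expire=now --expire-unreachable=now --all   # grace period "ends"
git gc --quiet --prune=now           # garbage collection deletes the loose object
git cat-file -e "$lost" || echo "commit is gone for good"
```

With the default settings the expiry takes weeks, which is exactly why intermittently used repositories are the dangerous case: by the time you notice the mistake, gc may already have run.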

[1] http://jefferai.org/2013/03/29/distillation/

[2] A practical example would be where a repository is so large that many contributors prefer to use shallow clones.


Git won, but Mercurial seems to have found a niche in organisations with huge repositories. https://code.facebook.com/posts/218678814984400/scaling-merc...


FWIW, I think that the tendency to push things into extensions is a huge net negative for hg consistency and learnability.

VCSes are complex tools and it takes a little while for people to understand the data model and how it maps to the command line UI. This means that people new to a tool — or just new to a particular project's workflow — will often have "how do I do X?" type questions. With git helping people is pretty easy; you just sit down with them and go through the set of commands they need to achieve what they want, explaining the operations on the tree along the way (if necessary). It's even the sort of thing that you can do over irc without too much difficulty.

When using hg there is a whole extra level of complexity because the default setup isn't actually suitable for use in real projects; to get something useful you first have to enable a bunch of extensions. So given a random user with unknown configuration it's hard to know what commands are even available to them without diving into a configuration file, possibly downloading some random python scripts from the internet, and so on. For example the Mozilla source tree has a whole mercurial setup script that configures about a dozen extensions, a few of which are providing useful Mozilla-specific functionality (e.g. bugzilla integration), but most of which are plugging holes in the hg featureset.
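For reference, the kind of configuration involved looks something like this (the extension choices below are illustrative, not Mozilla's actual list; all of these ship with Mercurial but stay disabled until listed):

```ini
# ~/.hgrc -- a common baseline configuration
[extensions]
rebase =
histedit =
shelve =
color =
pager =
```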

I also find it baffling that, given git exists, when hg developers integrate ideas from git wholesale (something that is on the whole positive, e.g. bookmarks as an implementation of local branching), they conspire to do so in such a way that the experience is jarring for people moving from git. To me it seems obvious that if you are coming second in a space with strong network effects, you don't go out of your way to make it painful for people to switch.

I get the impression that at least some mercurial fans expect it to win in the enterprise, citing organizations like Facebook optimising it to work with their giant repositories. But it isn't clear to me that there's much trickle-down effect from there; most smaller projects simply don't have that much source to control, or the ability to enforce the kind of homogenous environment that forgives many of the shortcomings of mercurial's UX.


Basically this says, "git's UI was so bad that it forced me to learn the internals, and once I grokked the internals git made a whole lot of sense." That summary might sound like I'm trolling but I don't think that's a bad thing (else, why is git so popular?). I've been the mercurial "expert" on my team at work for the past 5 years and I can't count the number of little DAG diagrams I've drawn trying to get people to understand what was really going on with their repository. I'm pretty sure there are a few people who still don't really understand how mercurial works under the hood. Maybe that's a positive aspect of mercurial...but maybe it's not :-)

Now, I will say that the internals of mercurial that you need to understand to gain enlightenment seem simpler to me than what you have to know for git. Mercurial has commits and commits have parent reference(s) that link commits together into a DAG. Commits might have a branch label. You move commits from one repo to another with push and pull. You create a commit with two parents by doing a merge. And that's it!

Git has those same basic concepts, but you also have to know about the index, and branches (which are really pointers to commits that may or may not be a branch in the DAG), and remote branches, and merges that aren't really merges because they just move the branch (which is a pointer, remember, that's why a "branch" can move) to the head, and all kinds of other interesting (and useful, no doubt) concepts.
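The "branches are really pointers" part is easy to verify yourself; a sketch, assuming the loose-refs layout fresh repositories use (refs can also migrate into packed-refs later):

```shell
# a git branch is literally a tiny file containing one commit hash
git init --quiet demo && cd demo
git config user.email you@example.com && git config user.name you
git commit --quiet --allow-empty -m "first"
branch=$(git rev-parse --abbrev-ref HEAD)   # master or main, depending on git version
cat ".git/refs/heads/$branch"               # the hash this branch points at
git branch topic                            # "creating a branch" just writes another file
diff ".git/refs/heads/$branch" .git/refs/heads/topic && echo "both point at the same commit"
```

No history is copied; "moving" a branch means overwriting one 41-byte file, which is why fast-forward "merges" are possible at all.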


Good article. That old tutorial is scary :)

> I screwed up really badly before, merging wrong things together, accidentally deleting data and much more, yet I never lost any data or felt left abandoned by my tool.

I can relate for the most part. I can only think of one instance (in over half a decade) where I felt git's shortcomings: there is no way to get a deleted non-gc'd object from a remote to your local repository, even if you try to reference it by its sha1.

This happened to me when some bad changes were force-pushed to a repository on GitHub and I did not have access to a machine which had the latest changes. My repository on GitHub still knew about the old commits, but they were unreachable by git itself.


> I can relate for the most part. I can only think of one instance (in over half a decade) where I felt git's shortcomings: there is no way to get a deleted non-gc'd object from a remote to your local repository, even if you try to reference it by its sha1.

For better or worse (and I've wanted it to work too) it's an intentional security feature that you can only pull objects from a git remote that are reachable by its refs; that way deleted branches (e.g. containing data that wasn't intended for release) are instantly unavailable rather than needing to wait for GC.


Seems like a shaky justification. I understand not offering things that are up for deletion but there wasn't even a way to do git pull --i-really-want-everything or some such.

If you push passwords and keys to your git server, then force-push those things out, you most definitely want to run a gc. Git is a flimsy security layer around this.


Of course you can't run "git pull --i-really-want-everything", you're the remote attacker this feature is meant to protect against!

The use-case for this is that you're pushing to some shared hosting like GitHub where you can overwrite and delete refs, but you can't force a gc.

You don't want someone to scour your Git commit announcements and see "oops, deleted password!" and go and fetch the deleted SHA1.


Well if you don't have enough access to the remote machine to locally get the objects or run "git branch oops <sha1>", you probably also don't have access to run a gc to prune the objects. In that case the "permissive" alternative would mean that you could not remove access to the objects at all once they'd been pushed. Given that, I can see the justification for the behavior they chose.


It is possible with Github at least - using a combination of the list repo events API & the create commit API.


Couldn't you just have (by hand) created a tree/commit/branch referencing the object, pushed it to the server and then fetched it (possibly to a new clone)?


> That old tutorial is scary :)

And funny!

    Creating a new git archive couldn't be easier
O RLY!


Three words against git: Detached Head State

I view git as a sort of shibboleth.

You can't really understand how git works unless you understand trees as a data structure. That excludes all but the hardcore types.

Some designers and CSS experts need to use source control, but Git is too complicated for them.

Once you get a detached head state or corrupted repo, then you need a git expert to clean things up. I once committed while in a detached head state, and so git ate my changes and I had to reflog to recover them. That is just insulting.
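For anyone bitten the same way, the rescue is mechanical once you know it; a throwaway-repo sketch (names invented):

```shell
# recovering a commit made on a detached HEAD
git init --quiet demo && cd demo
git config user.email you@example.com && git config user.name you
git commit --quiet --allow-empty -m "first"
git checkout --quiet --detach            # wander into detached-HEAD state
git commit --allow-empty -m "oops, committed while detached"
git checkout --quiet -                   # back on the branch; the commit is now orphaned
git reflog                               # the "oops" hash is still recorded here
git branch rescued "HEAD@{1}"            # give it a branch name so gc can't collect it
git log --oneline rescued                # the "lost" commit, safe and sound
```

Which rather proves the point: the data was never gone, but a designer has no business needing to know any of this.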

At my job, I work with some designers now, and they always leave the test server in a detached head state.

But when I switched from the GitHub client (yuck) to the SourceTree client, most of my concerns went away.


>You can't really understand how git works unless you understand trees as a data structure. That excludes all but the hardcore types.

DAGs and BSTs are taught in second year undergraduate classes where I'm at. What's hardcore about them? Serious question.


Non-programmers tend not to have taken undergraduate computer science classes!


Yes, that was my point. There are people who don't have a CS degree, that should use source control. For example, designers should use source control for their work. Explaining git to them is not feasible.


My bad, I was thinking from a CS centric POV


> unless you understand trees as a data structure

Well unless you understand DAGs, more like? Which is somewhat more unhelpful. :P

(given that merge commits have multiple parents)


Do others have examples of software "where the way it works is a crucial part of the user experience"? The one that comes to my mind is lisp macros; if you mess with them for a month or two you can't help but have a pretty good understanding of how they work. Clean internals can colonize your brain in a way that merely clean interfaces can never compete with.


I like Redis as an example in this category.

I also think this ties in with the law of leaky abstractions, in that a good understanding of that law will make ui designers choose lesser/thinner abstractions over bigger/leaky ones.


This article begs the question - what might a sane/consistent UI on top of git look like?


Last year, I started work on an attempt to basically create a Bzr-like [1] porcelain. I still think this would be the way to go for a serious attempt to provide a consistent, but reasonably powerful UI for Git (especially given that Bzr and Git actually have a fair amount of commonalities under the hood).

I recently gave up for a number of reasons, though:

(1) It turned out to be hugely more complex than I had initially thought.

(2) There would have been the need to store additional meta-data in the Git repository. Combined with the need to intercept a lot of basic Git commands, that would have largely relegated Git to a transactional key/value store, which would have destroyed much of the purpose (interoperability with the Git ecosystem).

(3) Mercurial 3.2 introduced changes to bookmark management that emulate the behavior of Git branches better, which (in conjunction with hg-git) made it a viable frontend for Git, and in the end it was easier to fix my remaining issues with Mercurial via extensions than to fix my issues with Git via porcelain.

That said, I think the approach is still viable and in principle a good one, especially if you give up on some of Bzr's features that yield relatively little for the amount of work involved (such as putting empty directories under version control).

[1] Bzr suffers primarily from abandonment issues, its basic design and UI (modulo some needed clean-up) is pretty good, especially the model it exposes to users.




Yep. SourceTree with BeyondCompare hooked up works rather well, been using it for a couple years now.


I was thinking a bit about this the other day. I think that it would probably be geared around the way that Git is actually used.

The thing about DVCS is that it has allowed a whole lot of experimentation around different workflows, and two in particular have come to the forefront: pull requests and git-flow. A porcelain designed around these particular workflows would be quite effective IMO.

Also, I think it would have some better, less jargonistic terminology. For example, something like TFS's "get-latest" as an alias for "git pull origin master" would be a lot clearer.
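You can get part of the way there today with aliases; a sketch (the verb choices are mine, and note this writes to your global config):

```shell
# friendlier verbs as git aliases
git config --global alias.get-latest 'pull origin master'
git config --global alias.unstage 'reset HEAD --'
# "git get-latest" and "git unstage <file>" now read closer to TFS's vocabulary
```

Of course, aliases only rename commands; a real porcelain would also reshape the model underneath them.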


hgit seems like such a proof of concept: a mercurial UI over a git store:

https://bitbucket.org/durin42/hgit


I'm surprised a facade hasn't emerged yet. Personally I think it's nigh impossible to sanely do merges with a GUI. I know people do, but I think they're crazy. Using git via any JetBrains IDE is the best I've seen.


I'm confused. Isn't the JetBrains IDE a gui? How can it be the best you've ever seen and impossible to sanely do merges?

FWIW, I've only ever done git merges in a gui, and never had any difficulty with it.


He might mean that he uses Jetbrains' IDEs with git for nearly everything, but feels that using it to merge things is madness.

It seems strange, but I really like having a console-based workflow with my VCS. I can leverage grep, shell aliases, and so on to customize my interface a fair deal, and the forced interaction helps keep me mindful of what I am doing.


Typo. I meant it's impossible to do without a GUI


I agree - I don't think it's possible to put a sound visual representation over the top of git, it's too abstract. A coherent command line interface should be possible, though.



The conceptual model matches the implementation - this is what Don Norman's "Design of Everyday Things" says is good design. When this is the case, a user can interact with the object's interface using logic and intuition and get predictable results.

I agree, git really embodies this.


> The conceptual model matches the implementation

It depends where on the conceptual stack you think you lie. I use fossil[1] because I don't have detached heads, don't need to understand the backing store to reason about "where am I?", and have "porcelain" that seems to leave every single operational aspect laid out for me, instead of abstracting it away. When I work w/ version control, I work with files, and putting them away. Detached heads shouldn't be my concern, nor the conflicting ways of describing how to sync[2], etc., etc., etc.

[1] http://fossil-scm.org/index.html/doc/trunk/www/index.wiki

[2] http://stackoverflow.com/questions/15316601/in-what-cases-co...

EDIT: spell 'porcelain' correctly.


Having grown up w/ git (svn was on the way out when I started programming) and then having had to use svn at a previous employer (I migrated it to git, eventually), I can't fathom a reason to prefer svn or to "hate" git other than ignorance.

You can spot an ex-svn user from a mile away by their commit history. We need rehabilitation clinics.


There are more VCSes in the world than just those two. Many people seem to credit git with all the virtues of distributed version control in general, without noticing that its interface is terrible and that the same ideas have been implemented in a much more friendly fashion by other tools. Git is popular because it is the official vcs of the Linux kernel, and the network effect did the rest.


> in a much more friendly fashion by other tools.

Examples?


Mercurial, Darcs, and so on...


As the article puts it, the internals of Git are brilliant. The UI (the standard CLI) is kind of abysmal. Its primary saving grace, in my opinion, is that for daily work the number of command variations you need to know is small enough to simply memorize. 95% of the rest of the time, the StackOverflow answer you're looking for is just a Google search away.


Large hand-edited binary files such as PSDs benefit from svn's lock functionality.


I am getting a message when I push to GitHub (or any remote) because there is a git config value that I need to set. I just did a search to try to understand the effect of each option for this value, so I could pick the one that best fits how I want git to act. I am approximately 90% sure I understand what each one does, but all of the explanations have some element of confusion or lack of clarity. This is the problem with using git.


Could your "problem of using git" be a result of "doing a search" instead of reading the documentation that Git provides? `git help config` gives documentation on all of its config values.
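For what it's worth, the message described sounds like the `push.default` warning (I'm inferring from the description); a sketch of the usual fix:

```shell
# the warning git prints on push is typically about this setting; the candidate
# values (simple, current, upstream, matching, nothing) are described in `git help config`
git config --global push.default simple      # push only the current branch, to its upstream
git config --global --get push.default       # -> simple
```

"simple" is the conservative choice: it refuses to push anything other than the branch you are on, which is what most people coming from centralized VCSes expect.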


tangentially, that's a really lovely header font ("jim nightshade"). one of the very few times i've seen a "fancy" font work well in a post like this.



