"In my time with Mercurial, I have seen it grow in fascinating ways. These include the concept of changeset evolution coming to life and the announcement of both Facebook and Google choosing Mercurial over Git. The future of Mercurial is that of scalability and because of that, I believe the best days of Mercurial are ahead."
Is it time to give Mercurial another shot? I first migrated from SVN to Mercurial way back when - but after the massive increase in Git's popularity I bit the bullet and switched.
There is no easy tutorial that brings a beginner up to speed with git. You need to read almost a book's worth of material, familiarize yourself with some of git's internals, and to some extent build your own mental model of git by experimentation before you can use git without regular wtf-moments.
Well, there is also no tutorial that would provide the same level of understanding of the internals of Mercurial, but by some magic it seems that beginners do not need to acquire the same depth of understanding of Mercurial that they need to acquire about git, before they can start to work with Mercurial without wtf's and frustration.
Once you have a solid understanding of how DVCS's work, you can use both Mercurial and git without problems. But if you need to bring a bunch of beginners up to speed, somehow it goes easier with Mercurial.
At least, this is my experience. If someone else has experience in training teams to start using a DVCS, and has compared both git and Mercurial, and has opposite experiences, I'd be interested to hear.
I used git for a couple of years and definitely had regular wtf moments. Switched to Mercurial and they just went away.
The complexity and counter-intuitive (to me, at least) nature of git's model was a major barrier to entry.
I still use git now and then because github is so popular, but for a small team on a moderate-sized project I really prefer Mercurial for its ease of use.
I simply don't agree with this. There's nothing wtf about git add, git commit, git push. Those are the commands that normal developers use. Maybe git reflog has issues but 90% of people are never going to come across them.
1) So you can revert the merge very easily.
2) Merging without a commit.
3) A branch on a remote server that you have a copy of locally and you want to keep track of the status of remote against the local copy (who's behind and who is ahead?).
This is my basic understanding in terms of how I use git. I like Mercurial too and don't have a horse in this race.
I use git every day, all day at work. It is absolutely integral to my workflow, and the only thing I know about its internals comes from people posting about them here.
Needing to know anything about git's internals to be able to use git is FUD.
It has greater extensibility (both in protocol and the code itself), which is why Facebook and Google are looking at it in the first place: it's possible to hook into it in such a way that you can make it scale to ridiculous sizes (see e.g. https://bitbucket.org/facebook/remotefilelog) cleanly, in a fully backwards-compatible way, without forking the product. So it's worth looking at for at least that reason alone.
Beyond that, though, the team itself is leveraging that extensibility to continue to evolve Mercurial in new directions. Years ago, I was able to ship a third-party largefiles extension that worked similarly to the GitHub external files thing launched a few weeks ago, and was able to get it integrated into Mercurial easily, where it was heavily used for years by people in the games industry. The team took "bookmarks" (Git-style branches) under its wing when they proved valuable, and has now added a ton of behavior to improve bookmark workflows.
They can even add huge new features. The big one that Git lacks is changeset evolution. Evolution allows you to get all of the benefits of rebasing and squashing, and all the benefits of a traditional DVCS merge-based workflow. Specifically, you can see all the previous versions of a given changeset, but those versions don't need to clutter up the history unless you're specifically interested. This means you're dramatically less likely to lose critical forensics information over the course of developing a new feature, and much more likely to be able to sanely see where, over the course of several rebases, you mucked up how a chunk of code actually worked.
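To make the "previous versions stay around but hidden" idea concrete, here is a toy Python sketch. It is only an illustration under my own assumptions: the `ToyRepo` class, its method names, and the changeset ids are all made up, and it does not reflect how Mercurial actually stores this information. The only idea borrowed from evolution is that each rewrite records which changeset replaced which.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    """Toy model of rewrite tracking; not Mercurial's real storage format."""
    # changeset id -> description
    changesets: dict = field(default_factory=dict)
    # successor id -> predecessor id (one marker per rewrite)
    markers: dict = field(default_factory=dict)

    def commit(self, cid, description):
        self.changesets[cid] = description

    def rewrite(self, old, new, description):
        """Record `new` as the replacement for `old` (e.g. after an amend or rebase)."""
        self.changesets[new] = description
        self.markers[new] = old

    def visible(self):
        """Changesets shown in normal history: those that have not been replaced."""
        replaced = set(self.markers.values())
        return [c for c in self.changesets if c not in replaced]

    def previous_versions(self, cid):
        """Walk the markers backwards to list earlier versions of `cid`."""
        versions = []
        while cid in self.markers:
            cid = self.markers[cid]
            versions.append(cid)
        return versions


repo = ToyRepo()
repo.commit("a1", "add parser")
repo.rewrite("a1", "a2", "add parser (rebased)")
repo.rewrite("a2", "a3", "add parser (typo fixed)")

print(repo.visible())                # ['a3']
print(repo.previous_versions("a3"))  # ['a2', 'a1']
```

Normal history only shows the latest version, yet the older versions remain reachable through the recorded rewrites when you go looking for them.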
> it's possible to hook into it in such a way that you can make it scale to ridiculous sizes (see e.g. https://bitbucket.org/facebook/remotefilelog) cleanly, in a fully backwards-compatible way, without forking the product.
That's not entirely true. For instance, some of the scaling is going to require a new manifest format that is not going to be backwards compatible.
Oh god, revsets. When i was able to use Mercurial (via hgsubversion and hg-git), i used revsets constantly, to ferret out all sorts of useful knowledge from the repository. My Git-using colleagues thought i was some kind of wizard. I switched jobs, and now i just use Git. I miss revsets so much.
In that lightning talk, were you actually typing or were you playing an accelerated recording of you typing previously? Cause it seems like you type at something like 300 wpm or something insane like that!
Performance on huge monolithic repositories like the ones Facebook, Google, and Mozilla work with is actively considered by the mercurial developers when integrating new features.
Aside from what others have mentioned, the biggest thing for me personally is hg share, i.e. the ability to have two or more independent checkouts of the same repository. Git does not really support this. Your options are (1) git-new-workdir, which is not safe and can lose data and (2) doing a complete clone, which is cumbersome, because you can't directly diff/merge against branches in the other repository, but have to push/fetch first.
Disclaimer: I'm one of the strange people who (by preference) still uses Bazaar over Git and Mercurial whenever possible, so that may skew my preferences.
To add to the Google bit, from Facebook's F8 there was this interesting tidbit (source[0]):
> Quote from Google: "We're excited about the work Facebook is doing with Mercurial and glad to be collaborating with Facebook on Mercurial development." (Well, I guess the cat is finally out of the bag: Google is working on Mercurial. This was kind of an open secret for months. But I guess now it is official.)
As well, if you search the Mercurial-devel mailing list[1], you'll see that Google engineers have been committing to Mercurial for a bit now. (Looks like they started in Aug 2014 and really picked up in Nov 2014.)
TortoiseHG. SourceTree is getting to be OK, at least on Windows, but git GUI apps suck. And if you are doing anything non-trivial and non-automated then gui SCM clients are just fine kids.
I heard that Facebook picked Mercurial because it was easier for them to make it work at their scale.
On the other hand, the company I currently work for also uses Mercurial, but all the developers besides maybe the founder (I haven't asked) prefer Git. I don't know the specifics about why everyone feels that way but from what I gather it has to do with branching being easier.
I've come to the conclusion that git got branches right. The idea of permanent metadata for branches is appealing, but a repository-wide single namespace is a bit of a PITA in some circumstances. Mercurial's bookmarks are the same as git's branches, but Bitbucket support is lacking.
The other big thing git has is startup speed. Python is slow. Something like chg helps, but it's a workaround.
I am not really happy with either Git's or Mercurial's solution.
Git's branches introduce too much fragility in the system for end user consumption (because branches are responsible for both naming and keeping commits alive) and do not allow you to describe sets of commits (nor is there an alternative feature for that). Mercurial's branches are too permanent in that they cannot even be renamed (well, there's an extension for that, but that's not without issues). Mercurial bookmarks still do not allow you to label sets of commits, but at least aren't responsible for the liveness of a commit.
In practice, I like both Fossil's and Bazaar's model better.
What do you mean by a "set of commits"? Why not just have topic branches?
Branches aren't the only thing that keeps commits alive; you also have the reflog and tags. I have a hard time understanding why you would want to keep something alive that isn't reachable from the history.
Branches as sets of commits are useful for understanding history. I.e., which commits conceptually belong together and with a quick descriptor of what their purpose and/or who their primary author is (or whatever else your chosen policy for naming them is).
That branches are not the only refs that keep commits alive is irrelevant, because the other refs by themselves are not sufficient; not every commit is reachable from a tag and reflog entries expire. Branches remain a necessary ingredient.
Git needs this because it requires a named head for each commit because (1) it has a garbage collector and (2) it doesn't allow for multiple checkouts. In version control systems with multiple checkouts, a new branch will typically end up in a checkout of its own. Because there's no GC, it will not spontaneously disappear, and because each checkout keeps track of its own head, there's no need to name it. Git branches are an artifact of Git's odd architectural choices.
But my bigger point is not about wanting to keep something alive that isn't part of the history, but that branches may not always accurately reflect what's part of the history (because Git allows you to alter them, and that's often even part of the normal workflow, e.g. deleting a temporary branch once you don't need it anymore). This usually does reflect human error, but, well, a major point of using a VCS is to protect against the results of human error. If you messed up your branches, then you can lose data either because (1) it eventually gets garbage-collected [1] or (2) it doesn't get pushed anymore and some day your computer or hard drive breaks. This is what I meant by Git introducing fragility.
[1] Yes, once the grace period expires and it disappears from the reflog, but that only makes it less likely, not impossible, especially for stuff that sees only intermittent work.
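A toy Python sketch of the fragility argument above, under loud assumptions: this is a bare reachability model with made-up commit ids and helper names, it ignores the reflog and the grace period entirely, and it is not how Git implements its garbage collector. It only shows the principle that a commit survives only while some ref can reach it.

```python
def reachable(parents, refs):
    """Return every commit reachable from any ref by following parent links."""
    seen = set()
    stack = list(refs.values())
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents[commit])
    return seen


def collect_garbage(parents, refs):
    """Drop commits that no ref can reach; return the set that was dropped."""
    live = reachable(parents, refs)
    dead = set(parents) - live
    for commit in dead:
        del parents[commit]
    return dead


# commit -> list of parent commits (hypothetical ids)
parents = {
    "c1": [],
    "c2": ["c1"],
    "f1": ["c2"],   # work on a feature branch
    "f2": ["f1"],
}
refs = {"master": "c2", "feature": "f2"}

print(sorted(collect_garbage(parents, refs)))  # [] : everything is reachable

del refs["feature"]                            # branch deleted by mistake
print(sorted(collect_garbage(parents, refs)))  # ['f1', 'f2']
```

In this model the branch name is doing two jobs at once: it labels the work and it is the only thing keeping that work from being collected.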
Well, you can set the expiration period for the reflog. If I remember correctly you can set it to "never". You can also disable alteration and removal of branches on a remote git repo. It's really all about knowing your tools.
Then again, in my experience, if you manage to remove data or alter data which should not be removed or altered, and don't pick up on that until after the reflog expiration period, then you have bigger problems.
> Well, you can set the expiration period for the reflog
Much easier, you can turn off garbage collection (if you set the expiration time on the reflog, you have to make sure to set gc.reflogexpireunreachable, not just gc.reflogexpire). This is indeed what I do whenever I work with Git as the frontend (though not when I use Git programmatically as basically a versioned DB backend). But that's not about me. I'm fully conversant in Git, up to and including having hacked Git repositories using dulwich. It's about users in general, all of whom have to deal with this.
Of course, this has its own problems, because lots of Git usage patterns have evolved around creating garbage and then throwing it away rather than having an explicit and safe delete operation that is rarely used (or at least to hide revisions rather than to delete them), and because it may not be possible on hosted repositories, such as on GitHub (Bitbucket allows you to prevent force pushes to master branches, GitHub doesn't [1]).
And even when you've made it safe, you still cannot have unnamed branches or multiple checkouts [2].
The more general problem is that Git's GC is not a user interface design decision. Git having a GC is ultimately a result of it being easier to hack together your own transactions without using an actual DB backend such as SQLite when you use (mostly) functional data structures with a GC. While this makes Git a fairly robust versioned database, these and other implementation decisions spilling over into user space is something that has plagued Git for a long time: for full proficiency with Git you need a fairly deep understanding of implementation details.
> Then again, in my experience, if you manage to remove data or alter data which should not be removed or altered, and don't pick up on that until after the reflog expiration period, then you have bigger problems.
This can easily happen on branches that see only intermittent changes or on personal repositories. Git was originally designed for heavily distributed work, and it shows; it's relatively safe when commits are regularly mirrored to a network of contributors, not so much when they aren't.
And, again, this is a problem that should not even exist. One of the primary tenets of source control management -- a major reason why we even have it -- is to protect developers against their mistakes. In particular, there are many cases where VCSs are being used by non-technical people (such as technical writers working on documentation) and they cannot be expected to have a deep technical understanding of the tools they are using. VCS tools must therefore be error-resistant.
[1] GitHub in many ways is still shockingly primitive, such as how it still doesn't support attaching files to issues. Referencing gists via URL is a sort-of workaround, but even that becomes painful with binary files. But that's another story.
[2] The reason for the latter is subtle, but basically boils down to this: either the GC needs to have global knowledge of all checkouts in order to trace checkout-local refs such as HEAD properly (which may be lost by a simple mv), or multiple checkouts that use the same branch can get in a battle over who gets to point it to which commit (e.g., commit to master in two checkouts, and only one commit operation can "win", because master cannot point to two commits at once).
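The "battle over master" in [2] can be sketched with a toy Python model. Everything here is hypothetical (the `SharedRepo` and `Checkout` classes, the commit ids), and it is deliberately naive: it assumes a checkout commits on top of whatever it has on disk and then moves the shared branch, with no locking or staleness check.

```python
class SharedRepo:
    """Toy repository: one set of commits and branch refs shared by all checkouts."""

    def __init__(self):
        self.parents = {"c1": None}
        self.branches = {"master": "c1"}


class Checkout:
    """Toy working tree that remembers which commit it was checked out at."""

    def __init__(self, repo, branch):
        self.repo = repo
        self.branch = branch
        self.base = repo.branches[branch]

    def commit(self, new_id):
        # The new commit's parent is what this checkout has on disk,
        # which may no longer be where the shared branch points.
        self.repo.parents[new_id] = self.base
        self.repo.branches[self.branch] = new_id
        self.base = new_id


repo = SharedRepo()
first = Checkout(repo, "master")
second = Checkout(repo, "master")

first.commit("a")    # master -> a
second.commit("b")   # master -> b, parent is c1, so "a" is bypassed

print(repo.branches["master"])  # b
print(repo.parents["b"])        # c1
```

After both commits, master points at "b" and nothing named points at "a" any more, which is exactly the situation where a reachability-based GC would eventually be free to drop it.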
> ... lots of Git usage patterns have evolved around creating garbage and then throwing it away rather than having an explicit and safe delete operation that is rarely used
Any delete operation in Git is safe. At least compared to delete operations in other vcs's (hg strip comes to mind).
> may not be possible on hosted repositories
That isn't Git's fault though, but the hosted repository. You could say the same about Mercurial bookmarks.
> And even when you've made it safe, you still cannot have unnamed branches or multiple checkouts.
I don't understand why this would be nice to have. Mercurial had the former, which to me was completely useless. Should I ever want to go back to a previous unnamed branch, I would have to actively search for it, which on a large project would take forever. I always keep my branches named, after the issue number, for this particular reason.
I don't really understand what multiple checkouts are for (a quick google gave me nothing). I would assume it would be something like having multiple local repositories, but I assume that is not what you're getting at?
> a fairly deep understanding of implementation details
Which takes literally fifteen minutes to teach (at least the way I did it), and after that it becomes a lot easier to explain what the different commands actually do.
> it's relatively safe when commits are regularly mirrored to a network of contributors, not so much when they aren't.
A network doesn't provide any more safety (except from a computer crash), as you don't push garbage.
> is to protect developers against their mistakes
To a point. To be completely safe we would have to auto-commit on fixed intervals, like a traditional backup system. Instead, the user is empowered with the decision to decide what gets stored. If the user wants to remove something entirely, or re-write history to make it more readable or whatever, he/she should have that power too. A default 30 days of full history is, IMHO, quite reasonable given what the user can do. If that doesn't meet your requirements, then there are other, better options out there.
> they cannot be expected to have a deep technical understanding of the tools they are using
They don't need one. Have someone teach them git add, commit, push, pull and merge. Then teach them to NEVER rebase. Done.
We could probably go back and forth on this a lot, but as we're unlikely to change each other's position, I'll call it quits. Thanks for a fruitful discussion :)
> Any delete operation in Git is safe. At least compared to delete operations in other vcs's (hg strip comes to mind).
While I'm not a big fan of Mercurial's strip implementation (because the management of strip backups is cumbersome), in order to permanently lose data there you'd have to do hg strip and then also expressly delete the backup in a separate step. Git does the deletion on its own.
More importantly, you seem to be mistaking this for a Git vs. Mercurial argument. I am not making a Git vs. Mercurial argument; if anything, my position is that "Mercurial is less bad than Git, except in some areas where Git is less bad." And even that's not what I'm getting at, because I'm not interested in comparing them against each other, but in pointing out that both are lacking.
In general, SCM development has stalled since the late aughts, and people seem content with a relatively subpar state of affairs. That too many people seem to think that Git and Mercurial are the only DVCSs ever invented and that centralized version control means either CVS or SVN doesn't help, either. My interest is in getting less primitive version control than the existing systems, not defending the antiquated status quo (antiquated to the point that there are even older, now forgotten systems that did some things better than the current crop).
> That isn't Git's fault though, but the hosted repository. You could say the same about Mercurial bookmarks.
Mercurial bookmarks, if misplaced, do not result in revisions on a server becoming inaccessible.
About unnamed branches:
> I don't understand why this would be nice to have. Mercurial had the former, which to me was completely useless.
They go hand-in-hand with having multiple checkouts. Create a temporary branch in a second checkout, and you don't need any label for it (because it's the tip of that checkout).
> I don't really understand what multiple checkouts are for (a quick google gave me nothing). I would assume it would be something like having multiple local repositories, but I assume that is not what you're getting at?
For example, what (as I noted) git-new-workdir attempts to do (again, that's unsafe, use at your own risk). If you aren't familiar with this concept, you can't know very many VCSs, though. It's supported by all DVCSes other than Git in some form or another (though arguably a bit indirectly in Darcs, and Mercurial's version doesn't have good commandline support). It is also obviously supported by all centralized VCSs.
It simply means that you can have two or more checkouts (working trees, in Git parlance) of the same or different revisions, but only one shared repository. Multiple clones can only crudely approximate the behavior, since you can't directly diff or merge between branches in different clones, say. A simple example of the usefulness is to be able to compare the behavior of different versions side-by-side (particularly when you're dealing with slow builds). Another is to be able to create a separate checkout for an urgent bugfix without interrupting your current work, then easily integrating that bugfix into your work.
> A network doesn't provide any more safety (except from a computer crash), as you don't push garbage.
My point is that the likelihood is higher that accidentally deleted material has already been shared with other developers, so it's more likely to be recoverable if you misplace a branch.
> To a point. To be completely safe we would have to auto-commit on fixed intervals, like a traditional backup system. Instead, the user is empowered with the decision to decide what gets stored.
Agreed. But my point is that Git unnecessarily trades away security, based on an implementation decision (to achieve transactional properties by using functional data structures plus garbage collection). Nothing is ever going to be perfectly safe, but that was entirely avoidable.
> We could probably go back and forth on this a lot, but as we're unlikely to change each other's position, I'll call it quits.
I don't ever expect to change anybody's opinion; I am merely explaining.
The reason I use Mercurial as an example is because that's the other DVCS I'm most familiar with.
> A simple example of the usefulness is to be able to compare the behavior of different versions side-by-side (particularly when you're dealing with slow builds).
That use case I can understand. The other two you mention, though, could easily be solved by branches.
Not related to the core product but still very important for the experience: a decent Windows GUI, TortoiseHg. I have tried all the GUIs for git, both commercial and free, and they all suck rats ass. Either they are slow, they only support github, they look like shit, they only support the most basic commit operations, they can't show the diff between selected revisions, or they destroy your merges and simply make everything related to source control a pita. There are many, many of them available that can do some of the things I listed above, but none that does all of it.
What does Mercurial have over Git today?