The author states three reasons for switching. 1. Problems with storing big file...

vtail · on May 17, 2010

> I don't know how you could expect a delete not to be a delete unless you were familiar with something like git already.

The ability to control and reverse any changes you make to the source code tree is an essential feature of a version control system, precisely because you don't know in advance what will work and what won't. The fact that people are not used to that simple idea just shows how broken most of the other tools out there are.

There is no reason not to want an unlimited undo for almost everything, especially today when disk space is so cheap.

DrJokepu · on May 17, 2010

Maybe this is not that relevant for DVCSs, but in 'classic' VCSs such as SVN an 'obliterate' command would be quite useful. Think about the case when someone accidentally commits something highly sensitive (such as private keys) to version control. People make mistakes all the time, either way.

charlesmarshall · on May 17, 2010

you can use commands like git filter-branch & git rebase to change your history - http://www-cs-students.stanford.edu/~blynn/gitmagic/ch05.htm...

Xurinos · on May 17, 2010

"git rebase" on its own does not permanently alter history. It creates a new branch that looks like a new history, but you can get back to the original history.

"git gc" will prune the unused old histories for that permanent effect.

Vitaly · on May 17, 2010

even "git gc" will not do it for a while. the old history will still be referenced from the reflog for some time (I think default is 30 days)
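A toy model of the point made in this sub-thread, sketched in Python. It is not how git is implemented; `objects`, `branches`, `reflog`, and the helper functions are made-up names, and the "expiry" is simulated by deleting a log entry by hand. It only shows why a rewritten commit survives `gc` for as long as something in the reflog still points at it.

```python
# Toy model, not git's real data structures: commits are entries in a dict,
# a branch is a name pointing at a commit id, and the reflog is a list of
# every commit id the branch tip has held.
objects = {}          # commit id -> parent id (None for the root)
branches = {}         # branch name -> commit id
reflog = []           # commit ids the branch tip has held, oldest first


def commit(branch, cid):
    """Add commit `cid` on top of `branch` and log the new tip."""
    objects[cid] = branches.get(branch)
    branches[branch] = cid
    reflog.append(cid)


def move_branch(branch, cid):
    """Point `branch` at another commit (what a rebase does at the end)."""
    branches[branch] = cid
    reflog.append(cid)


def reachable(starts):
    """Every commit reachable by following parent links from `starts`."""
    seen = set()
    for cid in starts:
        while cid is not None and cid not in seen:
            seen.add(cid)
            cid = objects[cid]
    return seen


def gc():
    """Drop objects reachable from neither a branch nor the reflog."""
    keep = reachable(list(branches.values()) + reflog)
    for cid in list(objects):
        if cid not in keep:
            del objects[cid]


commit("master", "A")
commit("master", "B")          # original history: A <- B

# "Rebase": write a rewritten copy B2 and move the branch to it.
objects["B2"] = "A"
move_branch("master", "B2")

gc()
print("B" in objects)          # True: the reflog still references B

reflog.remove("B")             # simulate the reflog entry expiring
gc()
print("B" in objects)          # False: now B is really gone
```

In this model the old commit only disappears once both conditions hold: no branch reaches it and no reflog entry names it.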

_ivvf · on May 17, 2010

Most destructive commands in mercurial do create a backup first. However, some command, which for all we know could have been written by a third party, didn't. That's why I suggest that you read the docs for a command and test it out on non-important data before applying it to your only copy of a repository. Someone could just as easily create an extension in git that forgets to create a backup as well. What happened to the author is entirely the author's fault.

pilif · on May 17, 2010

I don't know about Mercurial, but in your IDE / Editor, don't you like having multi-step undo no matter what command you execute?

Or do you prefer to manually backup the files of your project before doing any operation in your editor that is potentially destroying your files? I'd much rather hit ctrl-z as many times as needed once I noticed my last refactoring has shredded all the files it touched.

Being able to easily undo whatever I do in whatever application I can think of is a nice feature - even more so if the application is what I entrust all my work to.

_ivvf · on May 17, 2010

When I run sed on a file, I don't expect it to make a backup copy of my file. Similarly, when I run rm, it doesn't create a backup of what I'm deleting. Destructive operations are, well, destructive! I'm amazed at the response I got to my post. Read the docs on a mercurial command. Chances are if it is destructive it creates a backup for you. On the off chance that someone writes an extension that fails to do so, you can create the backup yourself. There's nothing magic about git's operation here. Someone simply took the time to create a backup in git for a command for which the equivalent mercurial command didn't. Big deal.

joeyh · on May 17, 2010

Git's repository structure, plus the built-in reflog prevents committed data being lost by any command, unless it explicitly erases the repository or reflog.

stonemetal · on May 17, 2010

git stash drop

git gc

oopsie data loss. It is about what he did in hg, except in hg it is one command.

epall · on May 17, 2010

The stash has its own reflog

git gc only prunes nodes unreachable from branches AND the reflog, which I think by default is 30 days long

joeyh · on May 17, 2010

I explicitly said "committed data" would not be lost, because data only staged in the index certainly can be lost, and I think stashed data is semi-susceptible to loss as well.

git stash drop does print the sha1 of the stash, and as long as you know that sha1, and it's not been gc'd, you can get a dropped stash back. (git fsck will also show the sha1s of dangling commits left from dropped stashes.) But while a separate reflog records existing stashes, that information is not retained when they're dropped. It's perhaps best to think of git stash as a more convenient form of git diff > patch, and you wouldn't expect that to keep a log of the patch file either.
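A rough Python sketch of that behaviour, under heavy simplification: `objects`, `branches`, `stash`, and the three helpers are hypothetical names, not git internals. Dropping a stash only removes the pointer; the object stays readable by its id until a collection pass deletes everything that nothing points at.

```python
# Toy model: commit objects live in `objects`; `stash` is just a list of ids
# pointing into it. Hypothetical names, not git internals.
objects = {"c1": "work on master", "s1": "half-finished experiment"}
branches = {"master": "c1"}
stash = ["s1"]


def stash_drop():
    """Remove the newest stash entry and return its id."""
    return stash.pop()


def dangling():
    """Ids that nothing points at any more."""
    referenced = set(branches.values()) | set(stash)
    return sorted(set(objects) - referenced)


def gc():
    """Delete every dangling object for good."""
    for oid in dangling():
        del objects[oid]


dropped = stash_drop()
print(dropped, dangling())     # s1 ['s1']
print(objects[dropped])        # still readable by id: half-finished experiment

gc()
print(dropped in objects)      # False: after gc the id is useless
```

The id returned by `stash_drop()` plays the role of the sha1 printed on drop, and `dangling()` stands in for scanning the object store for unreferenced commits.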

stonemetal · on May 18, 2010

Yes, but the original author in this very thread has said that the data loss was through MQ (Mercurial Queues) delete. Mercurial queues are an optionally controlled patch queue, so he had some code stashed in an uncontrolled queue, deleted it, and was surprised that it was really gone. This is rather analogous to what I posted for git. It was a bit of bad luck on his part, but switching to git doesn't close that hole; his code is still vulnerable to the same set of actions.

So his whole reason for switching is I shot myself in the foot. And instead of learning more about his weapon of choice to avoid doing that in the future, he traded in his gun for one that is slightly more complicated.

_ivvf · on May 17, 2010

Kind of like git branch -d, which, as http://www.kernel.org/pub/software/scm/git/docs/git-branch.h... states, deletes the reflog as well? If it doesn't delete the reflog, then in git world, at least, it probably isn't considered to be fully destructive!

joeyh · on May 17, 2010

git branch -[dD] does not delete reflog entries for commits made to the branch.

A branch may have its own, separate reflog which would be deleted, but that is only a convenience feature as the man page you linked to documents WRT the -l option; the primary reflog still records all operations made on the branch.

You might find it useful to actually test stuff before posting links to man pages that you don't fully understand. I know I do. :P

randallsquared · on May 17, 2010

Destructive commands actually, well, destroying stuff.

Part of the point of using a versioning system is to avoid ever destroying stuff. Therefore, in a versioning system, you'd expect to have to work pretty hard to make a change you couldn't roll back from, wouldn't you?

sid0 · on May 17, 2010

Yes, and hg and git both make you work hard. I'm not sure what his point is. He used a quite advanced command without understanding its consequences, and he paid the price. A non-advanced user would really have no need to use hg strip.

technomancy · on May 17, 2010

> I don't know how you could expect a delete not to be a delete unless you were familiar with something like git already.

Unless they were also familiar with functional programming and immutable data structures; it's the exact same mental model.
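A minimal Python illustration of that mental model, using an immutable linked list as a stand-in for a file tree. `Node`, `remove`, and `to_list` are invented for this example. "Deleting" an entry builds a new version and leaves the old one untouched, so going back is just a matter of keeping a reference to it.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    """One immutable cell of a singly linked list."""
    value: str
    rest: Optional["Node"]


def remove(lst, value):
    """Return a new list without the first `value`; `lst` is never modified."""
    if lst is None:
        return None
    if lst.value == value:
        return lst.rest
    return Node(lst.value, remove(lst.rest, value))


def to_list(lst):
    """Flatten the linked list into a plain Python list for printing."""
    out = []
    while lst is not None:
        out.append(lst.value)
        lst = lst.rest
    return out


v1 = Node("a.txt", Node("b.txt", Node("c.txt", None)))
v2 = remove(v1, "b.txt")        # the "delete" produces a new version

print(to_list(v2))              # ['a.txt', 'c.txt']
print(to_list(v1))              # ['a.txt', 'b.txt', 'c.txt'], old version intact
print(v2.rest is v1.rest.rest)  # True: the unchanged tail is shared, not copied
```

The old version only becomes unrecoverable once nothing holds a reference to `v1` any more, which is the same shape as the branch, reflog, and gc discussion elsewhere in the thread.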

steveklabnik · on May 17, 2010

> I don't know how you could expect a delete not to be a delete unless you were familiar with something like git already.

Because it's version control; isn't the whole point that I can change things and return to a previous state?

I don't know anything about hg other than it being distributed, but if it destroys things, well, it's not a VCS as far as I'm concerned. That's crazy!

dschobel · on May 17, 2010

I'm not aware of any VCS which meets your standard of not allowing destructive commands (I'm sure you realize you can destroy history/data in git as well, right? It may take a --force, and it'll yell at you, but it's definitely doable)

Shorel · on May 17, 2010

Subversion, the most used VCS in the world, has no totally destructive commands per se.

Once you commit something, it stays in the repository in the committed version forever.

In SVN, if you really want to delete a file, you have to stop the service, dump the repository to a text file, edit the dumped file to delete the data (or alternatively, do not export the latest revision), restore the repository from the file, and restart the service. SVN has no native way to do that; you have to use some external utility to totally delete something.

Now, it can be argued that this is good or bad design, or that centralized VCSs are bad/distributed are good. But you are making a broad statement about VCSs and ignoring the existence of SVN, and this is misleading to anyone who reads this thread.

kd5bjo · on May 17, 2010

It's a lot harder than that, actually. You have to do --force, clear the reflog, and then run the garbage collector. Your data isn't actually removed from disk until the last step.

dschobel · on May 17, 2010

Yep, and the beauty of the system is that anyone who is a sophisticated enough user to do that would never do it accidentally.

Although, I imagine quite a few users --force a command and, unaware of the aforementioned facilities, write up a nasty blog post about switching to hg from git because git lost their data :)

MichaelSalib · on May 17, 2010

Sophisticated users make stupid mistakes all the time. They misread output, get confused about the context in which they're operating, etc. Human beings are not machines. You cannot assume that just because a user is sophisticated, they will not make mistakes that lead to data loss. Even the best drivers still have accidents.

steveklabnik · on May 17, 2010

If I git rm something, and then git reset, my file is back. Are we talking about something different, or am I remembering git wrong? I don't actually git rm very often.

And it makes sense that there's a permanent delete feature, but I'd expect it to be outside of the normal workflow.

sid0 · on May 17, 2010

Yes, hg has a non-destructive rm too. And yes, what he's talking about is outside the normal workflow. Way, way outside.

steveklabnik · on May 17, 2010

Okay. Good. I thought it had to be that way.

So why do you speculate he's using the wrong delete?

dschobel · on May 17, 2010

Yeah, it's not clear what he did, all he references is a twitter posting saying he lost data: http://twitter.com/garybernhardt/status/9368728335 , not how.

I think it's fair to surmise that he must have been messing with commands he didn't fully understand.

The bottom line is that in either git or hg you have to step past the screaming sirens and big red warning signs to truly lose data.

sid0 · on May 17, 2010

Because he lost data? hg rm wouldn't ever lose data. He probably used an hg strip (which is an advanced command and must specifically be enabled in .hgrc through the mq extension) but didn't realize what he was doing...

garybernhardt · on May 17, 2010

The command that lost data was hg qdel. Strip won't lose data; it dumps bundles (although they're a pain in the ass to restore from). Like I said, I used Mercurial for three years. And not "I play with it sometimes at night" kind of used, but rather "it was my main version control system all day, every day" kind of used. I was using MQ for most of that time.

The fact that so many people consider "hg strip" an advanced command is part of the problem. Modifying history should not be considered advanced. Being able to recover from ANY command, including destructive ones, should not be considered optional.

sid0 · on May 17, 2010

> The command that lost data was hg qdel

Yes, so that deletes a patch. Anything in patches is basically in flux, and I wouldn't call losing a patch "data loss". If you call hg qdel "data loss", you'd call any sort of modification to a patch "data loss", since patches aren't versioned. If you want versioning with patches, use pbranches.

> Modifying history should not be considered advanced.

Maybe, but the only way I modify history in practice is through rebasing. I've never ever felt the need to modify history any other way. What use case do you have for modifying history in potentially destructive ways?

garybernhardt · on May 17, 2010

Git gives me everything I needed MQ for, but with the complete safety of the reflog. There is no such thing as a change I can't undo. The fact that patch management is "special" in Mercurial is the problem!

With respect to history modification:

First, I rebase a couple dozen times per day. I'm on a team that doesn't use merges unless we have a reason to (this makes it easier to bisect and think about history). I also create, destroy, and rebase many of my own branches every day.

Second, I amend commits a lot. I'll often spike some little piece of code I don't understand, then start amending the commit as I rewrite it with TDD, until the commit no longer contains any traces of the original spiked version. For more complex spikes and TDD rewrites, I'll do it over many commits, rebasing the spike over the rewritten version until the spike commit is empty and gets skipped by the rebase. Doing that in Mercurial would be... arduous. I can easily do multiple history rewrites per minute while doing this.

Third, I amend commit messages a lot, usually with "git rebase -i". Maybe I forgot the ticket number, or maybe the meaning of the commit changed (see the next point).

Fourth, I sometimes do drastic commit rearranging. This is harder to explain, but it usually involves splitting commits (in the simple case) or moving sets of related changes from one commit to another (in the complex case). These are sometimes at the file level, sometimes at the hunk level, and sometimes within a hunk. This is rarer than the others; I probably do it once or twice per week.

Fifth, I "git reset <ref>" a lot. It took me longer to start doing this, but it's useful in a lot of situations. For example, "oops, I accidentally created a merge bubble."

utx00 · on May 18, 2010

the patches directory that mq uses can be put under hg control.

i don't know about git reset, but everything you mentioned before can be done with mercurial (mq, histedit, ...)

garybernhardt · on May 18, 2010

Yep – and I did it all, using those tools, for a couple of years. :) Now, when I go back to Mercurial (which I know better than Git, mind you), I get frustrated. Those tools are much more blunt than their Git equivalents.

utx00 · on May 22, 2010

so i tried it for a couple of days, and the speed was the thing that impressed me the most actually. the rest is not that dissimilar, but i didn't dislike it as much as i thought i would :) - our workflow assumes mq already so a git add is a qnew, or qref, a git diff --cached is a qdiff, a git diff is an hg diff ... so on. most commits i make are usually a qfinish, which is comparable to a git commit (no -a).

how do you manage patch queues in git? we need them because we are constantly backporting to different versions of our app. branches will mean n merges for n versions. stacked git? or is there a git native way to manage the same?

what about something like tortoisehg? gitk is pretty crude in comparison.

i assume one can glue a diff/merge tool like meld. is the experience similar when resolving conflicts?

thanks for any feedback.

garybernhardt · on May 18, 2010

Also, MQ's "qdel" command is the one that led to the data loss mentioned in the original blog post. So it doesn't really answer my concerns. :)