On Git's lack of respect for immutability and the Best Practices for a DVCS

Locke · on Feb 23, 2009

Here's, I think, the big philosphic difference: Some people only want good code in the repo. Some people want every change in the repo.

I'm in the second camp. I'll gladly commit broken code and then fix it later. I don't push broken code, of course, but sometimes it's useful to have a snapshot of the broken code before you start fixing it.

For example, let's say I'm accepting a patch from someone or I found a useful snipit of code on the internet. I integrate the code and it breaks a unit test in some non-obvious way. I like to commit the code as it was from the original source before I start trying to fix it. That way I can quickly revert if my fix attempt is way off base.

Further, I have in my history who / where the code came from and how it was fixed (in a later commit, complete with an easy to get diff).

I'm not saying Eric's best practice is wrong. Some people prefer to only commit good code, some prefer to commit early and often. I don't think either practice is inherently better than the other.

Edit: I realize I didn't really respond to the immutability issue... I don't really have an opinion on that.

gecko · on Feb 24, 2009

I kind of think you did respond, actually. Once you're committed to allowing broken code into the repository, as long as you don't push it, then you're de facto in the immutability camp, since rebasing not only doesn't offer you anything, but destroys what you're trying to deliver.

jerf · on Feb 23, 2009

Git is actually as immutable as the next DVCS. Check out "git reflog"; it records the state of the repository for most, if not all, cases where it changes something around. Pragmatically, this allows you to recover from a botched rebase -i or something.

Where git differs is not in its mutablity, but in how it allows you to move around the head of a branch in a more free fashion than most other DVCSs. (I don't mean "more free" as a value judgment here, just an observation that it is a superset of typical capabilities.)

It should be pointed out this only address about a half of the objections you can raise for this re-writing; it is not true that you lose things because of the re-writing, because you can get them back from the reflog. But it is of course still true that history rewriting in the sense that most people mean it is true.

Basically, from a technical point of view this is a non-issue, but from a human point of view it still can be.

Personally, I'm willing to concede this is a powertool and perhaps isn't for everyone, and by the definition given it's not a "best practice", but I don't think I'm willing to accept that definition in general. I'd prefer something more relative to the local user set, not go for a global definition. Perhaps some teams would do nothing but shoot themselves, but in my experience with my team and the local related teams, history re-writing is a net gain in terms of how it impacts how we work. And it's hard to convince me of a point my personal experience doesn't correspond to.

gecko · on Feb 23, 2009

I don't mean this personally, but I'm getting a bit tired of git users pointing to the reflog as protection against data loss. The reflog does serve as a way to access no-longer-extent commits--until those reflog entries expire, at which point they are garbage collected and no longer exist anywhere, period. By default, this expiration time is two weeks.

Yes, the reflog can help recover from certain classes of errors, but its band-aid over git's emphasis on destructive history rewriting is incomplete, and can, and does, crack in real-world cases.

The correct solution, I think--which none of the distributed version systems implement--is to provide a way to group related changesets into an übercommit. This übercommit may still be sliced and diced back down to the buggy micro-commits that detail its dirty development, but, by default, will be presented as a single monolithic changeset to the user. No history is lost, and none is rewritten; you're simply altering its presentation.

jerf · on Feb 24, 2009

"I don't mean this personally, but I'm getting a bit tired of git users pointing to the reflog as protection against data loss.... By default, this expiration time is two weeks."

So, set it to ten years, and lose nothing of interest. (It's not a Y2K-like problem because the value of a commit decays over time. Nobody will need to go back in time ten years to recover a commit that wasn't in the main line, because nobody will even know enough to ask a question that could be an answer to.) Does that answer your objection?

"This übercommit may still be sliced and diced back down to the buggy micro-commits that detail its dirty development, but, by default, will be presented as a single monolithic changeset to the user. No history is lost, and none is rewritten; you're simply altering its presentation."

You could do that with git now. Branch for every commit, and squash it back down onto the main branch. Tag the final HEAD of the branch and include the tag in your squash commit record, and you can use it to recover all the component commits. If it bothers you that git still won't understand what that means, you're just a couple of shell scripts away from having the functionality fairly well supported.

(git, thanks to its heritage, shell scripts fairly well, and of course anything else that can do "shell-script-like" things (Python, perl, etc.) can handle it too. We have several simple scripts that sit on top of git and help us impose policy on our branch management. So, I'm not terribly sympathetic to criticisms of git that are one config change or a quick shell script away from being fixed. Although, probably not for the reason you think; it's not because I think those who don't customize their VCS is automatically a lazy developer, it's because everybody has their own unique needs.)

gecko · on Feb 24, 2009

> So, set it to ten years, and lose nothing of interest... Does that answer your objection?

No. It's still lossy. To turn the question around: if Subversion expired changesets after a given length of time, I think you would complain. Likewise, if git expired mainline commits after a certain length of time, you would also complain (I sure hope). git is lossy, and I happen to think that's entirely the wrong thing for a VCS to be. Making it "less" lossy is kind of like trying to keep your teenage daughter "less" pregnant.

> You could do that with git now. [Lengthy explanation follows]

I can also do that right now in Mercurial using the group extension (http://www.selenic.com/mercurial/wiki/index.cgi/GroupExtensi...), which is possible because Mercurial, being written in Python, is trivial to extend. But that's not the same as being part of the Mercurial workflow, any more than the bookmark extension in Mercurial prior to hg 1.1 counted as a real answer to git topic branches, or Loom counts as bzr's answer to Mercurial's mq now: it's not part of the VCS. Yes, I can make these all work however I wish--but as much as people have now forgotten, it's also relatively easy to make CVS and Subversion work in similar ways through tools such as Quilt (http://savannah.nongnu.org/projects/quilt), which allow for rebasing, offline commits, and many other features that you think of as git/hg/bzr-specific. We've abandoned those tools for good reasons: they required extensions, shell scripts, and odd workflows to work in distributed environments. We will do the same to our existing VCSes unless they can approach a more ideal workflow.

arockwell · on Feb 23, 2009

I agree with you. Whenever I hear someone say the words "best practice" I rewrite that in my head as "average practice". If you have a team full of idiots you probably don't want to use git anyways.

jrockway · on Feb 24, 2009

If you have a team full of idiots, revision control is not your highest priority.

durin42 · on Feb 24, 2009

Actually, the less talented a team is, the more important I'd say VCS would be. Idiots is one thing - but mediocre programmers I'd love to have on VCS of any kind (VCS in my mind excludes things like VSS). It means that when someone blows something up, you can (at least in theory) undig the hole.

jrockway · on Feb 23, 2009

What VCS doesn't support mutability? With svn, you can just "svnadmin dump", edit the file in your text editor, and "svnadmin load". Same idea as "git reset", just much more difficult to do. People do it every day, though.

Anyway, the great news is that if you have a central repository, you can deny non-fast-forward commits. That then gives you the same "immutability" or "security" as Subversion or anything else.

The moral of the story is, if you don't want to do something, don't do it.

(As an aside, I think this article is a classic case of, "SEE MY PRODUCT IS STILL RELEVENT!!!11". Would he be saying this if he didn't have an inferior product to sell?)

gecko · on Feb 24, 2009

Ease-of-use does count for something. You're of course completely correct; I can remove history by using "svnadmin dump", or editing the RCS files backing CVS, or probably hacking Perforce's BDB file manually. But that's not part of the default workflow, whereas "git rebase -i" seems to be an important part of the daily git workflow. It's not a throwaway distinction, and has real implications.

lnguyen · on Feb 23, 2009

As has been mentioned in other comments, almost all VCS allow you to change the history in some way, shape or form that probably wouldn't satisfy strict "audit" requirements. There's usually a good reason for it, including one in post's author's own product. [The usual reason is that it allows you to correct something obscenely stupid.]

And usually it's only available as an "admin" feature (at least in the commercial tools I've used). So it's not as if it's an everyday thing developers would use or have access to.

The difference with Git is that you're the admin of your local repository and can pretty much do what you will with it.

But there's a difference between your work repository located on your laptop, workstation, etc. and the "reference" shared repository that's used by your team+. So go and set all kind of hooks to prevent history rewrites (in the form on non-fast-forward pushes and the like) on it. What happens in that case is you can mess around all you like and clean up locally but once you've decided to push up and share, your commit history isn't going to be changing.

+If there isn't a difference between the two repositories then either you're prematurely optimizing your development and shouldn't worry about this issue OR you've got larger separation of role issues to deal with beyond being able to change your repository's history.

tptacek · on Feb 23, 2009

Totally off-topic gripe: "And then I encrypted it with Schneier's latest cipher" --- kill this meme. Schneier is many things, but the top cryptanalyst/cryptographer in the world he is not.

yan · on Feb 23, 2009

Schneier is already known to be a cryptographer and many people know this. If you want to start a new meme with Wang Xiaoyun or someone else, go for it, but I really doubt it'll catch on.

Plus, the reason people know him is because he's a pop security author first and consciously strives to make a name for himself. Actual cryptographers are too busy working with ciphers to build a reputation.

tptacek · on Feb 23, 2009

That you are right about both of these points is not a refutation of my point. If we can't eradicate the meme, let's go for ring vaccination.

gojomo · on Feb 23, 2009

Because in the context of the article, git fans can read the criticism despite "Schneier's latest cipher", it could be read as a wink-wink dig against Schneier.

biohacker42 · on Feb 23, 2009

You can't be everything to all people.

Git is aimed at people who know what they are dong.

Others will benefit from a more restricted feature set.

The bottom line for me is, that criticizing Git for not being immutable is like criticizing Ferrari for being too damn fast.

gecko · on Feb 23, 2009

It's more like criticizing a file system for lacking journaling, and then having its users respond that it does so because it gives you extra power to do things that a journaling file system would stop you from doing--or to implement your own if you see fit.

jrockway · on Feb 23, 2009

It's not really like that.

It's like the guy sells a commercial non-journaling filesystem, and he is trying to argue that journaling is a horrible idea... while everyone switches to his competitor's journaling filesystems.

It's sad when your product is no longer relevant, but that's no reason to make up reasons to attack its competitors. That just makes you (and your product) look stupid.

biohacker42 · on Feb 23, 2009

I wouldn't be so harsh as to say his product is not relevant, it's just relevant to a different market.

What I've seen in corporate America, makes me think his product is far too permissive for a lot of places, places which won't dare touch Git.

jrockway · on Feb 23, 2009

What I've seen in corporate America, makes me think his product is far too permissive for a lot of places, places which won't dare touch Git.

These people are silly, though. I don't know of a single version control system that doesn't let you change history in a straightforward manner.

At least in Git, the cryptographic commit IDs will change or be invalid when a commit is changed out from under you. Most other VCSes are blissfully unaware of history rewriting, and thus someone nefarious with access to the central repository can easily forge history.

I bet that never makes it into the marketing literature, though.

biohacker42 · on Feb 23, 2009

I don't know of a single version control system that doesn't let you change history in a straightforward manner.

http://www.accurev.com/

It has an append only db.

jrockway · on Feb 23, 2009

Are the revisions cryptographically validated, and are there other copies of the repository to compare against? If not, it doesn't matter, you can just edit the disk blocks.

(Yes, maybe you don't care about history that much. But if history is a tool for developers rather than the legal team, you shouldn't care if the developers mutate it.)

moe · on Feb 23, 2009

Sounds like a case of "if you don't like it, don't use it" to me. It's not like "--patch" was a mandatory flag.

LogicHoleFlaw · on Feb 23, 2009

I think a large part of git's history rewriting stems from the desire to be able to mail features around as atomic commits. You don't want to be cluttering up your patch set with spurious interim changes.

lanaer · on Feb 23, 2009

As a git user, I appreciate being able to squash commits together to keep things tidy for those who pull down my changes later — rather than seeing scattered changes that is merged with their own work, possibly interleaved with their own local un-pushed commits, they get one atomic commit for a feature.

That being said, I’m pretty sure Hg gives the same ability through the use of patch queues, which probably makes more sense.

gecko · on Feb 23, 2009

Mercurial's patch queues ultimately provide very similar benefits and drawbacks to git rebasing, except that:

1. Patch queues may be version indefinitely, since they are their own Mercurial repository, rather than eventually expiring, like rebased commits in the git reflog; but

2. The above requires you maintain discipline in committing the state of your patch repository

Although the second can be trivially automated, the default behavior--rewriting your patch whenever you issue `hg qrefresh`--is far from optimal.

jrockway · on Feb 23, 2009

Your post is making the site difficult to read (horizontal scrolling). Can you reformat the bullet points to not be in a code block?

Edit: thank you :)

gecko · on Feb 23, 2009

Done.