Rebase Is Safe (extracheese.org)
24 points by garybernhardt on Dec 12, 2010 | 27 comments



I simply don't see how this effort to use rebase in place of every merge helps anything. While there are valid uses for rebase, I simply don't buy this "cleaner" (and incorrect) history argument.


Git has changed how I work with and even think about a codebase, and rebase is a big part of that. Because of git's speed, the warm fuzzy security provided by its hashing mechanism, and a few other things, a new dimension (time) has opened up for organizing my ideas and experiments. Before git, committing to source control was like laying down layers in concrete. That meant going back and fixing anything was a concrete-smashing construction project, and that led to a mental model in which version history was just a documentary record: if things were originally done in order X,Y,Z, to turn that into X',Y',Z' would be to falsify history. With git I have a different mental model: the temporal dimension is no longer just a factual history (though it still is that at a large scale). It's now a medium for organizing and reorganizing things logically. It's like having a new scratch buffer in your brain.

Such a new medium is a rarity in programming and can take a long time before it finds its proper place. It seems flawed because it deviates from the "normal" way. But this new tool has led to breakthroughs in how my designs evolve, and I wouldn't give it up willingly. That's despite the fact that some of the criticisms of rebase are real: e.g. you can break previous commits without knowing it. The value of the feature far outweighs these costs, at least for the projects I've used it on. You know how Lisp programmers go on about how malleable Lisp programs are, like you're molding in clay rather than pouring concrete? It's analogous to that.

Edit: another analogy is interactivity. Rebase gives you a feedback loop into the evolution of your code the way that REPLs give you a feedback loop into its execution. Qualitatively new feedback loops are extraordinarily valuable.


Interactive rebase [git rebase -i] is useful in open source projects where you want to present your change as a logical sequence of patches. In theory at least, this helps reviewers since you can present it as "first we make this code transformation which doesn't change the semantics, second we add this new function, third ...", and each step can be checked more easily than a single large patch.

When you start out making a change, you are feeling your way through what needs to be done, and it's not until you've done some experiments that you have an idea of the logical sequence you want to present. At that point you can start to use interactive rebase to split and combine individual changes, and change the order to make up the logical story you want to tell. Then you post this as a sequence of patches for reviewers, even though it really has no relationship to how you wrote the change.
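A rough sketch of that reshaping, run non-interactively in a throwaway repository. Everything here (file names, commit messages, the small script standing in for your editor) is made up for illustration; in practice you would just edit the todo list by hand. It reorders three commits and folds the late cleanup into the earlier refactoring:

```shell
set -e
repo=$(mktemp -d)            # throwaway repository, illustration only
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

echo base > README; git add README; git commit -q -m "base"
base=$(git rev-parse HEAD)

# History as it was actually written: refactor, feature, more refactoring.
echo "helper v1" > helper.txt;  git add helper.txt;  git commit -q -m "refactor helper"
echo "feature"   > feature.txt; git add feature.txt; git commit -q -m "add feature"
echo "helper v2" > helper.txt;  git add helper.txt;  git commit -q -m "wip: more helper cleanup"

# Hypothetical stand-in for the editor: keep todo line 1, fold line 3
# into it as a fixup, then replay line 2 afterwards.
editor=$(mktemp)
cat > "$editor" <<'EOF'
todo=$1
grep '^pick' "$todo" > "$todo.picks"
{
  sed -n '1p' "$todo.picks"
  sed -n '3s/^pick/fixup/p' "$todo.picks"
  sed -n '2p' "$todo.picks"
} > "$todo"
EOF
GIT_SEQUENCE_EDITOR="sh $editor" git rebase -q -i "$base"

# Two commits remain: the combined refactoring, then the feature.
git log --format=%s "$base"..HEAD
```

The reorder applies cleanly here only because the refactoring and the feature touch different files; with overlapping changes, expect to resolve conflicts during the rebase.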


> I simply don't buy this "cleaner" (and incorrect) history argument.

Incorrect how? Incorrect in that it doesn't reflect the exact way that history was constructed locally? Spoiler: it rarely does. Every --amend, every usage of a queue (MQ, Quilt, …), every patchbomb sent to a mailing list means the history recorded won't exactly reflect the way it was created. And that's for the better: history should be crafted for sense, not for useless historical correctness.


It all depends on what you use git for. git is a great tool for revision control, but it is also a great tool for managing your own work, where it becomes closer to being an editor function than a revision control tool.

When I wrote an essay at school, I was taught to write a draft first, and sometimes a second draft. I typically wasn't expected to hand the drafts in.

When I write code in an editor, I don't save a history of every keystroke, yet I do commit often. In a way, I formalise the undo function of my editor slightly, but it becomes logical-change centric rather than keystroke-centric.

If you treat git as a tool to develop your own work as well as to manage the project revision history, then it makes perfect sense to draw the line somewhere. For work before that line (eg. experimental commits or reworking a patch for upstream submission), use rebase. For things on the other side of the line, use merge. Ruling out one side of this line entirely is just counter-productive.

Another example: git was written for open source projects. It is typically expected in these environments that patch submissions are coherent and complete. Nobody wants to see the mistakes that you made and then fixed; they want to see patches that make sense and are easy to review. Rebasing is great for this.


I don't rebase everything. Merges still have their place.

An early draft of this post talked about the relationship between rebase and bisect, but I cut it out to focus on one topic. The tl;dr is that heavy merging makes it harder to reason about bisect (and, of course, the history in general).


Safe is not a good word for rebase.

It's a tool, and a fairly sharp double-edged one at that.

Use with care.


tell me again why we NEED rebase?


I commit a lot. A lot a lot. Sometimes those commits don't actually work for some reason. For example, when I leave for the day and I'm in the middle of a task, I like to leave a failing test so I have something to pick up on immediately when I get in for work in the morning. I fairly often commit this state.

When it comes time for me to publish my changes, I very much do not want those unpublishable intermediary states going out into the world. Rebase lets me break apart and combine them into rational, test-passing changes that have cogent, readable commit messages instead of "blah", "blah again", "what the hell i forgot to frob the flubulizer!?", etc.

In sum, yes it's a heavy-handed tool. Yes you're "destroying" history, but often that history is extremely ephemeral and mundane and not germane to the actual meaning or effect of the change.


When you publish something to the world, don't you want to take a minute to make sure it's good? You do this when pushing to production, right?

Well, then why not take the same care and simply create a new branch with only the "clean" commits, and then push that branch?

I don't see a need for rebase. Don't rewrite your existing history, make a new branch to push to others! Like "tags" in svn.


This is the difference between distributed VCSes and centralized ones like svn.

When I create a change/feature it usually ends up being separated into multiple commits--usually some generic feature change somewhere followed by the specific changes for whatever I'm doing. In the centralized world I'd just work in my directory for a week and then go through and parse out the changes and check them in (generic first, then specific).

With git I can check them in early but not push them. They exist as living breathing patches while I work on it. I'll amend and rebase them like crazy. After everything is tested and working I push the set up to the main repo so others can get at them.

The important thing is that you don't amend/rebase the main repo. That is messing with history and is annoying to anyone who already grabbed the original changes. But messing with your local repo isn't rewriting history--it's not history yet. It's a set of changes that are still in progress.
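To see why this only belongs on unpushed work, here is a small sketch in a throwaway repository (file name and messages are invented): amending does not edit the commit in place, it replaces it with a new commit that has a different hash, which is exactly what would confuse anyone who had already fetched the original.

```shell
set -e
repo=$(mktemp -d)            # throwaway repository, illustration only
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

echo one > notes.txt; git add notes.txt; git commit -q -m "add notes"
before=$(git rev-parse HEAD)

# Still local, so reshaping the commit is harmless.
echo two >> notes.txt
git commit -q -a --amend -m "add notes (complete)"
after=$(git rev-parse HEAD)

echo "before amend: $before"
echo "after amend:  $after"
```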


It's not like there are clean independent commits just floating around in there with the junk. It's the amalgamation of the junk that produces the clean commits, thus the need for rebase. As someone else in this thread talked about, being able to tweak my commit history has changed the way I've approached coding. I commit often because I know I'll be able to go back and rewrite history before other people have a chance to look at it.


I'm just asking, why can't you make a new branch and merge-commit every 3rd commit to it, with a better message? It would preserve the philosophical "involatility" of history.


Ok, so the philosophical "involatility" is preserved, sure, but that's rather a large amount of work to do to preserve unworkable deadends and silly commit messages and broken builds and the like.

Also, it introduces a huge non-linear mess into the commit history that can be hard to untangle. Just looking at topic branches in a normal tree with gitk can be hard. Can you imagine looking at topic branches and these junk branches at the same time?


You can just create a new branch before you start the rebase, which achieves exactly the effect you ask for. Rebase is how you "create a new branch with only the clean commits".

But, most of us don't bother copying before rebasing. The reflog makes that pointless from a safety perspective and, without the safety need, there's not much reason to copy before rebasing.
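A minimal sketch of the reflog point, assuming a throwaway repository with invented branch and file names: after a rebase, the branch's reflog still records where the branch pointed before, so the "copy" can be made after the fact if it turns out to be needed.

```shell
set -e
repo=$(mktemp -d)            # throwaway repository, illustration only
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

echo base > README; git add README; git commit -q -m "base"
trunk=$(git symbolic-ref --short HEAD)   # "master" or "main", depending on git's defaults

git checkout -q -b feature
echo work > work.txt; git add work.txt; git commit -q -m "feature work"
old_tip=$(git rev-parse HEAD)

git checkout -q "$trunk"
echo upstream > upstream.txt; git add upstream.txt; git commit -q -m "upstream change"

git checkout -q feature
git rebase -q "$trunk"
new_tip=$(git rev-parse HEAD)

# The pre-rebase tip is the previous entry in the branch's reflog.
recovered=$(git rev-parse "feature@{1}")
git branch before-rebase "$recovered"
```

Reflog entries are local and expire eventually, so this is a safety net for recent mistakes rather than a permanent archive.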


Also, I almost always work on topic branches and do my rebasing there. This may be a crucial point that others have missed. Personally I don't rebase on master.


Why not? The end result is exactly the same. You have a little branch of your work and then when you pull next time it gets merged back in. You can even create a named branch for it after the fact by sticking the hash into .git/refs/heads.
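For what it's worth, writing into .git/refs/heads by hand isn't required; `git branch <name> <hash>` gives a commit a name after the fact. A small sketch in a throwaway repository (the branch name `rescued` and the file names are made up):

```shell
set -e
repo=$(mktemp -d)            # throwaway repository, illustration only
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

echo base > README; git add README; git commit -q -m "base"
trunk=$(git symbolic-ref --short HEAD)

# Make a commit that no branch points at.
git checkout -q --detach
echo idea > idea.txt; git add idea.txt; git commit -q -m "experiment"
orphan=$(git rev-parse HEAD)
git checkout -q "$trunk"

# Name it after the fact, using only the hash.
git branch rescued "$orphan"
git log -1 --format=%s rescued
```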


For the same reason you refactor code, it is often useful to refactor your commit history _before_ you publish it.

And just because you use rebase, doesn't mean you don't also merge. I often use git like this:

  git checkout -b feature origin/master
  # repeat until the feature is done: edit, save, then
  git commit -am "added blah"
My commits are so small that I can write the entire commit message on the command-line. When feature is done from a coding perspective, I then:

  git rebase -i origin/master
Now I can squash commits together, possibly drop some, reorder them, and so on. The goal is to present my change as smallish, self-contained commits that are easy to review and have no obvious mistakes.

Now that feature can be published for review. I happen to have set up Gerrit for this purpose, but for git.git, you'd use format-patch and you'd email the patches, and that works well too. Each patch needs to be small enough that your reviewers aren't overwhelmed by it. (Smaller commits also help later on if someone has to git blame your code, or if they have to deal with conflicts when merging with your code.)

Rebase is also how you'd incorporate feedback into your patch series. Typically I'll put the correction on top of my patch series, then squash it into place with rebase -i.
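One way to do that squash-into-place step, sketched in a throwaway repository with invented file names and messages. It uses `git commit --fixup` together with `git rebase -i --autosquash`, which may or may not be how the commenter does it; `GIT_SEQUENCE_EDITOR=true` is only there to accept the generated todo list without opening an editor.

```shell
set -e
repo=$(mktemp -d)            # throwaway repository, illustration only
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

echo base > README; git add README; git commit -q -m "base"
base=$(git rev-parse HEAD)

echo "parse()" > parser.txt; git add parser.txt; git commit -q -m "add parser"
target=$(git rev-parse HEAD)
echo "docs" > docs.txt; git add docs.txt; git commit -q -m "add docs"

# Review feedback arrives for the first patch: record the correction on top...
echo "parse() with error handling" > parser.txt
git commit -q -a --fixup "$target"

# ...then let rebase move it next to its target and fold it in.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash "$base"

git log --format=%s "$base"..HEAD
```

The series is back to two patches, with the correction absorbed into "add parser".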

Finally, once your feature is done/done:

  git fetch
  git checkout master
  git reset --hard origin/master # I use master only for integration
  git merge --no-ff feature
  git push origin master
Rarely, I will rebase feature onto a later version of master, but only if I need some functionality that was added to master since I began the feature. I could merge master into feature, but that then makes feature harder to review.

$0.02.


IMHO, a version control system shouldn't provide facilities for people to edit the history, but instead, to edit meta-information about the history. What git needs is probably a meta-history that gets aggregated during pushes. But if you wanted to see all of your own warts, you should be able to, for the philosophical reason that a VCS isn't supposed to let you "cheat".

"But," you say, "I want to prepare my patch to look super-nice when I publish it to the world!" Fair enough. Then edit your meta-history since that is what the VCS will send. Right now, you can already do this in git by creating a specific branch and pushing only that branch. Can't you? So why rebase?


I think what people with your opinion are missing is that in a VCS without rebase/amend capabilities all that happens is people like me commit way less often. That is, I get everything working and fully tested and only then start checking in my code. With git I can commit almost immediately, before everything is tested and working. It allows me to start organizing the patches early and keep them up to date as I finish doing the task at hand. In that sense it is not rewriting history. I'm creating history. I don't push my code to the central repo until it's done and after that it doesn't get rebased.

Should I start committing every single version of the code I've ever saved so I can see all my false starts and typos? Maybe, but there's not a lot of good information to be had there.


There's More Than One Way To Do It. Some people find rebase more convenient, some people cherry-pick commits onto another branch, or squash merge them into another branch and merge that into master. Or whatever. Git is fast and powerful in part because it's really stupid. Building another layer of "meta-history" into git would complicate things immensely.
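As a sketch of the squash-merge alternative mentioned above (throwaway repository, invented names): the work-in-progress commits stay on their own branch, and the trunk receives a single ordinary commit with the same content.

```shell
set -e
repo=$(mktemp -d)            # throwaway repository, illustration only
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

echo base > README; git add README; git commit -q -m "base"
trunk=$(git symbolic-ref --short HEAD)   # "master" or "main", depending on git's defaults

git checkout -q -b feature
echo a > a.txt; git add a.txt; git commit -q -m "wip 1"
echo b > b.txt; git add b.txt; git commit -q -m "wip 2"

# Stage the combined changes on the trunk, then record them as one commit.
git checkout -q "$trunk"
git merge -q --squash feature
git commit -q -m "add feature"

git log --format=%s
```

Unlike a rebase, this leaves the original commits untouched on `feature`; the trunk just never sees them individually.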


Because it's awesome to alter history and push your changes when working with other people! Yay!


This is why people don't like rebase, but if you just tell people "never start a rebase farther back than master" the problem magically goes away. Tada!


It's even simpler and more general than this: don't rewrite public history. There, you're done.


I'd add that it's only a problem as team size grows. If the only other people who might be affected are sitting nearby, I can ask "has anyone pulled since <commit>?" and happily push -f my rebase in most cases.


To track a remote SVN repository. To edit commits earlier in history when you need to correct minor bugs/CR comments/commit descriptions. Edited to add: squashing lots of tiny safety commits into a commit with an entire feature in it.


The svn issue seems under-appreciated. In my experience, far and away, the best way to merge subversion branches is to pull them in with git-svn, rebase, and push back to subversion. I've done this more than once for projects that I'm only tangentially involved with.



