Fix conflicts only once with Git rerere (medium.com/porteneuve)
95 points by heitortsergent on Feb 18, 2015 | 53 comments



I think this is a great tool.

I don't understand why other commenters are complaining about rebase and rewriting history; this article isn't about that, it's about not having to fix the same conflicts multiple times. This can be handy in a variety of situations. You might not need it, but every person and team has its own way of working with Git. Rerere is just another tool that might help you in your workflow.
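For anyone who hasn't tried it, here is a minimal sketch of rerere in action: resolve a conflict once, throw the merge away, and watch rerere replay the resolution. Everything here (the throwaway repo, file names, branch names) is invented for the demo, and `git init -b` needs Git 2.28+:

```shell
#!/bin/sh
# Sketch: rerere records a conflict resolution the first time and
# replays it when the identical conflict shows up again.
set -e
dir=$(mktemp -d); cd "$dir"
git init -q -b master .
git config user.email demo@example.com
git config user.name demo
git config rerere.enabled true              # the one switch rerere needs

echo base > f; git add f; git commit -qm "base"
git checkout -qb topic
echo topic > f; git commit -qam "topic change"
git checkout -q master
echo master > f; git commit -qam "master change"

git merge topic >/dev/null 2>&1 || true     # conflict: resolve it by hand once
echo merged > f; git add f
git commit -qm "merged"                     # rerere records the resolution here

git reset -q --hard HEAD~1                  # throw the merge away...
git merge topic >/dev/null 2>&1 || true     # ...same conflict, rerere resolves f
cat f                                       # contains the recorded resolution
```

`rerere.autoupdate true` additionally stages files that rerere resolved, so you only have to review and commit.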

And to the people who are claiming that rebasing is hard, dangerous, or that a clean history is not necessary: once you get it, it's not hard or dangerous, and a clean history really IS useful. Why are we using Git again?

- Being able to work together on a codebase (the history shows the changes)

- Code reviews for new features (the history of the pull request shows the changes the feature introduces)

- Being able to roll back to a previous state in case there is something wrong with the latest version (the history can help you decide where to roll back to)

And there are a lot more reasons we use Git that have to do with the history. It's true that it takes a little more effort to keep a clean history, but in big teams/projects it's really worth it.


I agree with you for the most part, but most of you would be better off if you keep this in mind:

Don't rebase commits you have already pushed.


A good use case for pushing rebased work is at the end of a feature branch. For example, you've been working on a feature branch with a team for a while and now the feature is done and the branch can be merged back into master. However, the history of the feature branch might have been polluted with merge commits. To prevent merging these merge commits back into master, you can rebase the entire feature branch onto master. It should be clear to the team that at this point nobody is to work on that branch anymore; everyone goes back to master, or creates a new branch (from master).
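The flow described above, as a runnable sketch (branch names are placeholders and the repo is invented for the demo; `git init -b` needs Git 2.28+):

```shell
#!/bin/sh
# Sketch: rebase a finished feature branch onto master, then
# fast-forward master so its history stays a straight line.
set -e
dir=$(mktemp -d); cd "$dir"
git init -q -b master .
git config user.email demo@example.com
git config user.name demo

echo a > a; git add a; git commit -qm "initial"
git checkout -qb feature
echo f > f; git add f; git commit -qm "feature work"
git checkout -q master
echo b > b; git add b; git commit -qm "master moved on"

git checkout -q feature
git rebase -q master               # replay the branch onto the current master
git checkout -q master
git merge -q --ff-only feature     # fast-forward only: no merge commit possible
git log --oneline --graph          # a straight line, no merge bubbles
```

The `--ff-only` guard is the point: if the rebase didn't happen (or master moved again), the merge refuses instead of quietly creating a merge commit.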


GitLab CEO here. I personally don't like this workflow, since you lose the history of what you actually tested, and if someone cherry-picked, referenced, or merged the code it will cause problems. But some organizations prefer this workflow, and if everyone is aware that they should not reference code in feature branches it can work. The linear history is certainly nice. The rebase just before merge is a bit of work. In GitLab Enterprise Edition there is an option to do this automatically when pressing the merge button.


Exactly: if you have pushed it, other developers might have referenced, merged, or cherry-picked your code, even if you marked it as a work in progress. And to roll back to a previous state, you don't need to rebase and/or have a linear history. Also see my article about this: https://about.gitlab.com/2014/09/29/gitlab-flow/


Git won't let you push a modified history without a force anyway, and you can set up the remote to refuse these pushes if you want.
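A sketch of that server-side refusal (the repo layout and names are invented for the demo; `git init -b` needs Git 2.28+):

```shell
#!/bin/sh
# Sketch: a remote with receive.denyNonFastForwards set rejects
# rewritten history even when the client pushes with --force.
set -e
dir=$(mktemp -d); cd "$dir"
git init -q --bare -b master remote.git
git -C remote.git config receive.denyNonFastForwards true

git clone -q remote.git work 2>/dev/null; cd work
git symbolic-ref HEAD refs/heads/master    # pin the branch name for the demo
git config user.email demo@example.com
git config user.name demo

echo a > a; git add a; git commit -qm "one"
git push -q origin master                  # first push: fast-forward, accepted
git commit -q --amend -m "rewritten"       # rewrite the pushed commit
git push -q --force origin master 2>/dev/null \
  && result=accepted || result=rejected
echo "$result"                             # the server refuses the rewrite
```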


This is ugly and pollutes your history graph across branches. After all, a merge should only occur to merge a finalized branch in.

This sounds like a completely arbitrary rule to me. I just don't get why people try so hard to avoid merges, and instead go down the rebase route.

Rebasing isn't just a pain, it's rewriting history. It's potentially dangerous in a team environment. You can't distribute your rebased branch without a force push, and that risks two versions of the branch's commits getting into the repository, or losing someone's commits altogether. It's usually not worth the effort.

It also presumes a simplistic development environment. Typically, I see branches that fit into categories; let's name them master, develop, small-feature, big-feature, next-gen-architecture.

Small-feature branches can be rebased fairly easily. There's not many commits, they're most likely worked on by a single person so the history isn't distributed, and replaying the commits isn't too onerous.

Big-feature branches are a little more tedious. This is where rebasing starts to become a pain; the big feature may have touched code that has since changed from commits of a few small-feature branches.

Next-gen-architecture is where it all falls apart. Trying to do a major refactor while simultaneously keeping customers happy with smaller features is a fact of startup business, if not most businesses. Rebasing becomes a multi-hour exercise because there are hundreds, if not thousands of commits to replay. And everyone on the team has this branch, and there may be sub-branches of it that are destined to be merged into it. All the sub-branches need to be rebased too.


Rebasing isn't just a pain, it's rewriting history.

I'm so tired of this!! Of course you never rewrite history on a public branch!! Of course rebase is fantastically helpful on a private/feature branch where you are doing development, before you do your final push to the public branch!!

Can we stop with these tired, old tropes please??


What do you mean by "public branch"? Public to the team, or public to the whole world?

I do commercial closed-source development; there is no public branch to the whole world. OTOH, all feature branches are public to the team; only trivial features involve a single developer. So your assertion doesn't seem well-formed to me.

Further: The distribution of code and feature responsibilities in a team of full-stack engineers is fluid. Depending on the day, any person might be working on a feature, and the person previously working on it might not even be in the office. It's a general policy where I work to push your branches before close of business.


    I do commercial closed-source development; there is no public branch to the
    whole world. OTOH, all feature branches are public to the team; only trivial
    features involve a single developer. So your assertion doesn't seem well-formed
    to me.
Actually you can still rebase, but you would benefit from having a separate shared repository for developing that feature, so that you can share rebases without rewriting history on the release remote.

This makes it easier to manage public rebases, because you only affect those devs working on the feature, but more importantly it means that you still get the security because your release repository always maintains the integrity of its history.


By public branch he means something like "master", a branch that we (developers) are using, public to the team. This is not about open-sourcing your code.

Feature branches are throw-away, this is a big part of the zen of Git.


> I do commercial closed-source development; there is no public branch to the whole world. OTOH, all feature branches are public to the team

If you don't work with private branches, it's not surprising that you don't see any value in rebasing which is mostly useful for private branches.

In fact, if you don't have private vs. public scenarios, it's not really clear whether you need a distributed version control system. But that makes sense; DVCSes were developed for OSS, after all.


> it's not really clear whether you need a distributed version control system

The benefits of a DVCS are far broader than that. Development on code that is stored on individual developer's machines is fundamentally distributed. Removing the requirement to be connected for any interaction with the system is major. Deferring conflicts until merge time rather than using file locking is major. There are loads of benefits from DVCS that have no requirement for an important distinction between public and private.


What does "private branch" mean? All branches are "private" until I publish them.

If I create a branch which tracks a remote branch and make a commit to it, my local branch is "private" and the commit is "private". However, I can push that "private branch" to other developers, and then it becomes public.


>What does "private branch" mean? All branches are "private" until I publish them.

Exactly that. An unpublished branch.


What do you mean by "public branch"? Public to the team, or public to the whole world?

To anyone other than you. If other people are involved, then think very seriously whether or not rebase should be used. Almost in every case it should not be.


Exactly my point. How do you reconcile the idea that rebase "should not be" used vs. "a merge should only occur to merge a finalized branch in"?

I'm on your side. My objection was to the latter statement, not the former. The basis of my objection to the former was (in part) because of the problems with rebasing when a branch is distributed. The idea that the only merges that occur are merges of finalized branches seemed very naive to me.


> Of course you never rewrite history on a public branch!!

The problem is that git doesn't track what "public" means. There was a proposal to work on this for GSoC 2012[1], but nobody picked it up. This concept is called "phases"[2] in Mercurial, which are a building block for Evolve[3].

In Mercurial, it's possible to safely rewrite commits, and it's also possible to safely propagate your edits to other repos. Git would have to grow something like changeset evolution to make this completely safe and to be able to deprecate `git push --force`. There could be a safer alternative that does not require any forcing.

[1] https://github.com/peff/git/wiki/SoC-2012-Ideas#published-an...

[2] http://mercurial.selenic.com/wiki/Phases

[3] https://www.youtube.com/watch?v=4OlDm3akbqg
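For what it's worth, Git does already ship a partial safety net: `git push --force-with-lease` refuses the push when the remote ref has moved since your last fetch, so you can't silently clobber commits you never saw. A sketch (all names invented; `git init -b` needs Git 2.28+):

```shell
#!/bin/sh
# Sketch: --force-with-lease fails when the remote moved since your
# last fetch, unlike plain --force which clobbers it unconditionally.
set -e
dir=$(mktemp -d); cd "$dir"
git init -q --bare -b master remote.git
git clone -q remote.git alice 2>/dev/null
git clone -q remote.git bob 2>/dev/null
for d in alice bob; do
  git -C "$d" symbolic-ref HEAD refs/heads/master
  git -C "$d" config user.email demo@example.com
  git -C "$d" config user.name demo
done

cd alice
echo a > a; git add a; git commit -qm "base"
git push -q origin master                   # alice's view of origin/master
cd ../bob
git fetch -q origin master
git reset -q --hard FETCH_HEAD              # bob picks up alice's base commit
git commit -q --allow-empty -m "bob's commit"
git push -q origin master                   # remote moves past alice's view

cd ../alice
git commit -q --amend -m "alice rewrote base"
git push --force-with-lease origin master 2>/dev/null \
  && result=clobbered || result=protected
echo "$result"                              # bob's commit survives
```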


Sorry for going OT, but even if you have a point, it's hard to take your comment seriously when you end every sentence with double exclamation or question marks.


Well, it was for emphasis. I can't count how many times I've seen the "rebase is evil" trope. It's so tired and abused and annoying and misguided. So, yeah, the sentence terminators were to convey the very real emotion on this subject.


The point of the previous commenter is that even if it's to convey very real emotion, it makes your comment harder to take seriously and actually lessens the impact of it.


I always do my work on a branch, switch to master when I want to pull, and rebase my branch on the new master. If I'm working on a feature branch that other developers are also working on I do the same thing with the feature branch; my local work is on a local branch that I rebase onto the tip of the shared feature branch.

When I want to share my commits, I do my rebase first, do a fast-forward merge in the shared branch (with or without a merge commit, depending on whether or not I need to group my commits), and then I push the shared branch. After that my local branch is rooted at the tip of the shared branch, and prior history never gets rebased again.

Merging master into a shared feature branch requires different techniques; we don't use rebase for that. Instead we coordinate: everyone gets their work into the feature branch (or takes responsibility for anything they're not ready to commit yet), and one person takes on merging master into the feature branch, fixing conflicts, and pushing the result back to the shared branch. Everyone else pulls, rebases any local work they still have, and we move on. When it's time to merge the feature branch into master we do the same thing, but no local work is allowed to be held back, and the person doing the merging will also merge the final feature branch back into master.

Luckily I'm in a small company with not too many developers. I can see the coordination required for what I just described getting more and more difficult as the branches get bigger, last longer, and have more people working on them. When you get up to that scale the Linux Kernel approach of breaking the project into largely independent sub-projects with different owners responsible for merging their own parts is probably the best approach. That makes the coordination manageable.


We've had good success encouraging daily merges of master into feature branches, and then using `git merge --squash` to avoid hopelessly polluting your history graph.

Git merge squash essentially takes a diff of your feature branch and master and applies the diff as a single commit. See http://stackoverflow.com/a/25387972/1935918

You lose the history of the feature branch, but we don't want that history anyway, since once tests are passing and the branch has been reviewed, we treat it as an atomic unit. That is, we might revert the whole commit, but we're not interested in reverting part of the commit.
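A minimal sketch of the behavior described (names invented for the demo; `git init -b` needs Git 2.28+):

```shell
#!/bin/sh
# Sketch: `git merge --squash` stages the branch's combined diff as
# one change; the resulting commit has a single parent, not two.
set -e
dir=$(mktemp -d); cd "$dir"
git init -q -b master .
git config user.email demo@example.com
git config user.name demo

echo base > f; git add f; git commit -qm "base"
git checkout -qb feature
echo one >> f; git commit -qam "feature: step 1"
echo two >> f; git commit -qam "feature: step 2"
git checkout -q master
git merge -q --squash feature        # stages the combined diff; commits nothing
git commit -qm "feature (squashed)"  # one atomic commit for the whole feature
git log --oneline                    # no trace of the two intermediate steps
```

Reverting the feature later is then a single `git revert` of that one commit.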


Doesn't this make commits too big? On the team I'm working with right now, most people are making really small commits, and that makes git's history mechanism really useful.

A single line change can be linked to a relatively small commit, which is extremely useful for figuring out the "why" of something. Having one commit for an entire feature would make that utility completely disappear.


Yes, the downside of aggregating history is that you've aggregated your history. So far, it has worked well for us, partly because we try to keep feature branches small.


> This is ugly and pollutes your history graph across branches. After all, a merge should only occur to merge a finalized branch in.

>> This sounds like a completely arbitrary rule to me. I just don't get why people try so hard to avoid merges, and instead go down the rebase route.

I agree, I think that graph looks fine. I often end up with this, not just to avoid a big merge later, but because functionality I'm working on can benefit or depend on other work done on the main branch.


Totally agree. It almost looks like you are my colleague ;-)

At any point, we have at least 3 different feature branches, which can be relatively long-running (one week at minimum), because we have a range of products sharing the same base platform.

So when merge time comes, well ... we merge, and do not rebase onto master before doing so (well, technically we merge onto an RC branch, but that still applies). If we rebased, two teams would need to go through a potentially painful rebase process.

Besides, on each branch there are usually at least 2 developers, so there would be an even greater chance of a problem for the developer who is not doing the rebase.

Well, all of that to say that we just do a merge (with no squashing ... I really dislike a big commit, even more so if it mixes frontend and backend changes in the same commit).


Rebased fixes are easier to audit, so I can see an open source project preferring them. Otherwise, just freely merge to/from trunk. Prevent criss-cross merges though, those are a pain.


First off, thanks for building and releasing something.

I think I might be missing something though, I have never understood why having a clean history (when the real history isn't clean) is important.

If you've got huge branches with scary merges, isn't the problem that you have huge long lived branches? Why does it matter that my graph shows when I actually did merge things?

I usually hear a git bisect argument around this, but are there good examples where it becomes a problem? Bisect lets you specify when to skip a commit, so you can be as selective as you want with it.

Lots of people complain about problems rebasing, and I've repeatedly seen issues where people mess up and that causes more pain when just merging seems to work perfectly fine. It seems like a lot of work, effort and time to avoid potential issues later (and even then I'm not sure what those issues are).

This isn't rhetorical, given so many people have strong opinions on this I assume I'm missing something important.


It's much easier to review a feature branch that has a clean history. Especially if it's remotely complex. If every commit is self-contained and each adds only one logical feature dependency, then every commit can be reviewed individually for correctness and unintended side-effects, rather than the entire complex diff as a whole.


But how often is that ideal true? It assumes that developers don't make mistakes in any of the commits, and never add "temporary" code which is removed in later commits of the feature branch. This is why I tend to review a single diff, so that I don't have to browse through numerous commits and check whether some line of code in a commit ends up in the final merge diff.


No, it assumes that developers clean up their branch history as necessary before submitting it for code review.


That seems like a lot of work just to make code review easier. Of course, you can clean up the branch history by squashing commits, but that's the same as reviewing the complete merge diff.


Not just code review. It also significantly helps reviewing history later, which is done for any number of reasons, including tracking down where bugs were introduced, understanding the reason for a given change, etc.

I've worked in projects that always keep a clean history. And I've worked in projects where developers don't bother cleaning up history before pushing. And in every non-trivial project that follows the latter I've always hit cases where I try to track down something through the history only to find that the history doesn't capture the meaning of the code. This is not just bad merge behavior, but also things like lumping unrelated changes together into a single commit.

Even something just as simple as reading recent history to keep up with the changes being made to the codebase is significantly simpler if developers are diligent about keeping a clean history. For example, the Rust project has a lot of activity, but also has a strict code review policy that means that nearly all commits are done with a clean history (very occasionally, long-lived branches are allowed through that have control merges, but those are rare). And as a result, even though I no longer have the time to actively contribute, I've still managed to keep up with every non-trivial change made to Rust merely by periodically reading through the history since my last pull. I can't imagine trying to do that if the history weren't clean.


Arguably the only reason it could be a lot of work is when the submitter doesn't really understand the code that multiple mixed up and muddled commits have resulted in. By giving your reviewer a mess, you're only asking for the reviewer to make sense of it instead of doing it yourself.

Instead, breaking up the feature into logical pieces allows you to document every step properly in a separate commit message.

If you properly understand what you're submitting for review, then it isn't really much work at all. It's even easier if, during original development, you commit often. But you can always split commits up during rebasing, too.
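Splitting during a rebase can even be scripted end to end; here is a sketch (it assumes GNU sed for the in-place edit, and everything else is invented for the demo; `git init -b` needs Git 2.28+):

```shell
#!/bin/sh
# Sketch: split one commit into two during an interactive rebase,
# using GIT_SEQUENCE_EDITOR so the demo runs unattended.
set -e
dir=$(mktemp -d); cd "$dir"
git init -q -b master .
git config user.email demo@example.com
git config user.name demo

echo base > base; git add base; git commit -qm "base"
echo a > a; echo b > b; git add a b
git commit -qm "two unrelated changes in one commit"

# Mark the last commit "edit" instead of "pick", then redo it as two
GIT_SEQUENCE_EDITOR='sed -i "s/^pick/edit/"' git rebase -q -i HEAD~1
git reset -q HEAD~1                  # undo the commit, keep the files
git add a; git commit -qm "add a"
git add b; git commit -qm "add b"
git rebase --continue >/dev/null 2>&1
git log --oneline                    # base, add a, add b
```

Interactively you would just change `pick` to `edit` in the editor; the `GIT_SEQUENCE_EDITOR` trick only exists to make the sketch non-interactive.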


If you really care about this, rebasing is a shitty solution. Patch queues like mercurial's make it much easier to logically decompose a feature into a series of patches and shuffle code between the commits until they all settle nicely.

I say this having gone from a mercurial user that produced pristine commit histories to a git user that makes dirty, nasty histories.


Why are you making dirty, nasty histories? The git users on kernel.org sure don't, and I bet they have more complex branches in flight than you do...?


Suppose there is an undesirable behavior caused by some subtle interaction of features coming from two branches.

With a normal merge, git bisect will likely point to the merge commit as the first commit that introduced the behavior, which was the first time these 2 branches ever interacted. This means resolving inconsistencies after the fact, somewhat similar to "eventual consistency". It is more coarse-grained, can be harder to reason about, but may scale better (no serialization).

With a rebase, git bisect will point to one of the rebased commits, each of which already interacted with the branch coming before. Rebasing is sort of similar to the situation where DB client retries a transaction because the DB doesn't know how to serialize 2 transactions. It is more fine-grained, can be easier to reason about, but may have problems scaling, and may sometimes be tedious.


For me, git bisect really does become much more interesting to use when feature branches are rebased to get a clean history. Just reading the history becomes easier, since each feature branch has a self-contained part of the ordered history.

In addition, it is also useful when merging a feature branch turns out to be a problem later on. Reverting a rebased, merged feature branch is very easy, without having to think about whether there are any issues with mid-branch merges and so on.


Neat, I love how git has these tools that are so clearly targeted at 'I did this before' developer problems. Hey, I keep having to repeat this exact same merge, let's automate that.

A similar problem is "When did this break?", where you do the whole bit of "well, it was working before here, but not now, so it must have changed somewhere in the middle", and then you do a binary search of the changesets in between.

The solution is called "bisecting a problem" as in cutting in half repeatedly. The git command/tool for this is:

http://git-scm.com/docs/git-bisect

Using these commands, the system checks out versions in a binary search pattern, asks you whether each one is broken, and after some iterations tells you which commit broke things. Really handy. I suspect that with some scripting, if you can create an automated test, you could wire it up to be completely automated as well.


`git log --follow $NAME_OF_FILE` will show you each commit that contains a change for the given file. It's very handy when doing the "when did this break?" dance if you've narrowed down the potential files!


> This is ugly and pollutes your history graph across branches. After all, a merge should only occur to merge a finalized branch in.

I thought it was best practice to merge master into your branch now and then to keep it up to date and avoid one big messy merge at the end. What is the disadvantage exactly?


He addresses this in the article: you then have multiple content-free merge commits in your history, making it more difficult to see where the branch actually begins (see "control merges polluting your history graph" in the article). Personally I don't think it's a big deal, but I understand if others take it seriously.


Thanks. I read it but didn't see the big deal, but put like that I can see why it could be. I still prefer doing it this way, though.


I'm now suffering a lot of rebase pain while working on a new .NET project. I'll give it a go.

It'd be great if there were a strategy for fixing conflicts in VS files.


I tend to find it best to .gitignore any IDE files. IDE project import can (or should be able to) regenerate them as and when necessary.
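A sketch of typical ignore entries (the patterns are illustrative, not exhaustive):

```shell
#!/bin/sh
# Hypothetical .gitignore entries for common IDE artifacts.
cd "$(mktemp -d)"
cat > .gitignore <<'EOF'
# Visual Studio working directory and per-user settings
.vs/
*.suo
*.user
# JetBrains IDE metadata
.idea/
*.iml
EOF
cat .gitignore
```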


The inability of VS to play nicely with merging is a real pain. (DBML files in particular are a huge headache, but .csproj, .sln, and .config files are also problematic.)

I'm not sure I follow the author's issue with control merges though. Apart from a dogmatic statement that merges should be final (says who?), what problems do control merges create?


I'll second this. Half of my conflicts when working on .NET projects are related to changes in .csproj, DBML, or EDMX files. I understand the function these files perform, but it certainly doesn't make it any easier. Just because I and a colleague happened to both add completely independent files to a project doesn't mean I should have to deal with a conflict.


What problems did you have exactly with sln/project/config files? They are plain text after all so if there are conflicts they should be of the same 'level' as conflicts in code files (i.e. can be a breeze to merge or a f*ing pain depending on the situation).


The problems are twofold. First, these files and some others that VS uses are XML, and XML has this property where semantically equivalent files can be structurally different. Empty elements can be rendered with or without end tags, empty attributes can be present or absent, whitespace between tags doesn't matter, etc.

(sln files aren't really XML, but they're XML-like in their structure. I've had fewer problems with them than the other files.)

The second issue is the way Visual Studio modifies these files: it reads and parses them into a DOM tree, makes changes to the DOM, and then serializes the DOM back out to the file. So a small change in one part of the project causes the entire file to be re-written, and it often gets restructured in the process. I've found this to be a bigger problem when different developers are using different versions of Visual Studio; at my company we've had 2010, 2012, and 2013 all in use at the same time on the same projects until everyone upgraded. (And we had good reasons for not upgrading everyone at once.)

I guess there's a third reason: most .NET developers never look at these files directly; for the most part they are only modified through Visual Studio. So, although they're just text files and conflicts are resolved the same way they're resolved in code, that's made more difficult by not knowing what the files are supposed to look like or what the correct resolution should be. The sln files are particularly annoying in this regard because they're about 80% GUID strings and 20% readable text, so it can be really hard to understand the diff.

When I started working with .NET, I had to learn all about these file formats when I started running into conflicts in them, and now I'm comfortable fixing conflicts and often editing them directly. I've helped most of my team gain the same comfort level, so we don't have problems anymore. The conflicts, when they happen, are just annoying now.


Finally, an article that explains it comprehensively. My prior knowledge of it was hazy at best, and it seems like I did not take full advantage of this.


Conflict re-resolution was literally the only thing that hurt me while using git, so this tool is absolutely perfect for me.


This post clicked for me. I tried and failed to use it once before, read this today and it worked immediately.

Thanks!




