I still don't understand why everyone has this misguided quest for a clean history. An accurate history is much more important.
Rebasing destroys historical information. I can't really see any advantages of rebasing when a merge does the same thing but leaves two things rebasing does not: 1) a point to roll back to if things don't work out, and 2) an explicit entry of when your branch was brought up to date with master.
When I am working on a feature branch, I often find myself committing experiments, that I possibly undo later in the branch. I don't always focus on making "incremental, atomic" commits, because I am usually focused on the code. When the feature is done (but before I merge it into master), I will usually go through and clean up my meanderings, and turn them into "incremental, atomic" commits, so that my teammates don't need to deal with my meanderings if they ever need to run a bisect over my code.
You may say I'm doing it wrong, and that's fine, but it works for me.
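For what it's worth, that cleanup pass is usually a single interactive rebase against the point where the branch forked. Here's a runnable sketch in a scratch repo (all file, branch, and commit names are invented; the sed trick just stands in for editing the todo list by hand):

```shell
set -e
cd "$(mktemp -d)"
git init -q -b master
git config user.email you@example.com && git config user.name you
echo base > app.txt && git add app.txt && git commit -qm "initial commit"

git checkout -qb feature
echo try1 >> app.txt && git commit -qam "WIP: first experiment"
echo try2 >> app.txt && git commit -qam "WIP: scrap that, second attempt"

# rewrite everything since the branch forked from master;
# marking the second commit as a "fixup" folds it into the first
GIT_SEQUENCE_EDITOR='sed -i "2s/^pick/fixup/"' git rebase -i master

git rev-list --count master..feature   # the two meanderings are now one commit
```

In the editor you'd normally change "pick" to "squash"/"fixup" to fold commits together, "reword" to fix a message, or delete a line to drop a commit entirely.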
If you rebase, aren't you destroying that history of experimentation? I feel like this destroys the whole idea of a VCS as a safety net and makes developers self-conscious about something that's supposed to be tolerant of mistakes.
Why do you care about the history of dead-end experiments polluting the history of the specific final features being implemented and incorporated into the mainline?
I see both sides of this, but on my own team, where we are all meant to be experts on the project, I really like to be able to see the experiments, because there's a reasonable chance I'll be trying something similar to or perhaps inspired by those throwaway experiments at some point. I think there is a different trade-off in open source projects, where it's more helpful to have a history that isn't confusing to newcomers.
I end up with a lot of "Interim commit on foobarbaz" because I'm leaving work and want to push my code just-in-case. Refolding those into "foobarbaz: feature 1" is a lot nicer.
The central repo is not meant as a place to back up your working tree. That's what git stash and local branches are for. If you're worried about losing your working tree, back it up!
I think I wasn't clear. I like to be able to see completely experimental, entirely thrown-away approaches in branches like you suggest, but I also like to see the little hints of partial experiments that a "dirty" history shows. My point is that on projects with focused teams where everybody is or should be an expert, the more history I can access, the better. Things like "did somebody already try and fail to refactor this class? What was their approach? Can I actually do better, or am I just headed down the same rabbit hole?" are invaluable to me, and the best way to answer those questions is to see "dirty" remnants of things people have tried and un-tried in the history. History can be forensics, and in forensics you don't want things to be "clean".
Exactly this. I don't mind leaving the history of experiments in if they lead fairly logically and cleanly to the final product, but sometimes I will have 2 or 3 "WIP" commits in a row, that turn out to be completely irrelevant to the final product.
As long as your final commits are logical you don't lose anything. You need clean commits on the history to be able to understand the code later on.
During code review at a later time, the history of the experimentation is useless once you find several commits that touch the same code before settling on a final version.
> You need clean commits on the history to be able to understand the code later on.
I think this whole debate hinges on peoples' view of that sentence. Sometimes clean history helps comprehensibility and sometimes it obscures things. I think the amount to which each is true varies author to author, reader to reader, and project to project.
> I think this whole debate hinges on peoples' view of that sentence.
Prescient observation. How does clean history obscure things though? You mean as failed experiments get removed? Important things ought to be mentioned in commit messages. Relevant things to document can be showcased like "Tried X but it turns out Y is better because Z." I often find that code alone is not enough to describe why something did or didn't work. One ends up having to explain in commit messages anyway.
> I often find that code alone is not enough to describe why something did or didn't work.
Me too. And I also often find that commit messages alone are not enough to describe why something did or didn't work. Code and commit messages both help.
"Clean history" can obscure things when it leaves out information about the often messy process of creating the software. It's impossible to know ahead of time what information will and will not be useful when attempting to grok a piece of code in the future, so sometimes it makes sense to err on the side of more information, instead of less.
> You need clean commits on the history to be able to understand the code later on.
Really? That seems like an extraordinarily obtuse way to understand code. I would think comments directly in the source files would be more useful. Commit history shows how they arrived at that result, and that's what I would rather see there.
What do you think is easier for the next guy?
1) three commits that do:
- a = 1;
- a = 7;
- a = 3;
or
2) one commit that says:
a = 3;
My point is that experimentation is slightly different from changing your mind about the whole implementation.
It is the same as writing your homework. You have a separate piece of paper where you make your experiments.
Not to mention that in practice, if this sort of history cleaning is forbidden then people will just attempt to not commit those first two lines, meaning that they are not getting the full benefit of version control.
When does the "next guy" ever look at revision history to see what's going on? I only ever look at the current state. If I want to see how it diverged from my last commit or the last commit before whatever milestone or release, I will diff the current version against that version ignoring every commit in between.
> When does the "next guy" ever look at revision history to see what's going on?
_All the time_. I read all the commits in the codebase I'm responsible for. I need to keep track of what people are doing and how the system is changing. This is also on top of the need for code review and ensuring each patch is correct.
I do that all the time with code that I don't understand. The commit message is the last resort of getting a human to talk to you when the comments are missing.
Not on most projects, no. We do on my current project, and the review tool is based on tasks, not commits, and can review multiple commits in a single session if they are all related to the same task. Regardless, I am only ever reviewing the end state.
Commit messages I write are often several paragraphs long (and sometimes that's for a one-line change). Including all of the information in comments is not always viable.
You have NEVER finished work on a feature and looked at the history to find tens of commits with crappy commit messages and extremely minor changes? If so, good for you. Not so for me. I don't always squash ALL commits on my feature branch, but I often remove a good number, so the history looks helpful for my future self.
You can always keep the un-rebased branch around if you want to preserve the history somewhere, and then squash the commits down to a smaller number on top of master. That solves the safety net problem, and master keeps its sanitized history.
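Concretely, the safety-net branch is just one extra pointer before you rewrite anything. A runnable sketch in a scratch repo (branch names invented; this uses a soft reset to squash, which is one of several ways to do it):

```shell
set -e
cd "$(mktemp -d)"
git init -q -b master
git config user.email you@example.com && git config user.name you
echo base > f && git add f && git commit -qm "initial commit"

git checkout -qb feature
echo a >> f && git commit -qam "WIP 1"
echo b >> f && git commit -qam "WIP 2"

git branch feature-messy          # safety net: points at the dirty history
git reset -q --soft master        # squash everything since master...
git commit -qm "feature: one clean commit"   # ...into a single commit

git rev-list --count master..feature         # 1 clean commit on top of master
git rev-list --count master..feature-messy   # the 2 originals, still reachable
```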
This is what I do and it's kind of required when you're using e.g. gerrit code reviews.
Cleaning up your history before merging is important. For one, before you merge you should usually have someone do code review. No one wants to do a code review on a branch that has a bunch of false starts, typo fixups, debug print statements being added and removed, and so on. Code reviewing a branch that breaks something and then fixes it three commits later is a real pain; you sit there puzzling over the first commit, wondering how it could possibly work, and writing up a big explanation for why they need to fix something, then you go on to a later commit and realize they already fixed it.
Furthermore, dirty branches lose you a lot of the power that having a good, clean history gives you. When you do a blame on a line of code to figure out when the last change was, do you want to see the "fix whitespace to match style guide" commit that someone inserted into the branch at the end, or the actual meaningful change that occurred earlier? If you don't squash your commits to deal with these kinds of issues, you lose a lot of the power and convenience that good history gives you.
There's more. One of Git's most powerful tools is bisect, but even in a VCS without an automated bisect, doing it manually can be useful too (I've done this in SVN before). If you have a regression but have no idea what caused it, it can be very useful to bisect your commits: find a known good version and a known bad version, then go to the commit halfway in between, test that, and depending on whether that commit is good or bad, test the one halfway between that and the known good or known bad commit. Keep doing this until you find the commit that broke your code. But this process is seriously impeded if you have a bunch of half-done commits that implement part of a feature but break something else that's fixed up three commits later.
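The manual halving described above is exactly what `git bisect run` automates: you hand it a pass/fail command and it does the binary search for you. A runnable sketch in a scratch repo (the grep stands in for a real test suite, and all names are invented):

```shell
set -e
cd "$(mktemp -d)"
git init -q -b master
git config user.email you@example.com && git config user.name you
for i in 1 2 3 4 5; do
  echo "change $i" >> app.txt
  git add app.txt
  git commit -qm "commit $i"
done

good=$(git log --format=%H --grep='commit 1')
git bisect start HEAD "$good"    # HEAD is known bad, commit 1 is known good
# the regression "exists" exactly when "change 4" is in the file; bisect
# runs this check at each halfway point until it corners the culprit
git bisect run sh -c '! grep -q "change 4" app.txt'
```

The run finishes by printing which commit is the first bad one (here, "commit 4").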
The "history of experimentation" nature of VCS history is just not all that interesting. Think of your VCS history more as an extended form of comments, that document why everything is the way it is. If you actually wrote comments on every line describing why you had changed it in a particular way every time you changed it, your code would wind up being more than 90% comments in not too long. Most of the time, you don't need to see this; but when you are left wondering "hmm, why is this the way it is?", good history is invaluable. The experimental changes in between aren't all that useful; if you got any information from them, then feel free to summarize that in the cleaned up commit message after you've squashed them out.
Now, that's not to say that you should always produce perfect history while working on a branch. Feel free, when you're in exploratory coding mode, to make lots of checkpoint commits, experiments, and so on. Just clean it up before you present it for review and merge. The nice thing about Git is that you have your own local branches that no one else ever has to see; you can clean things up quickly and easily with "git rebase -i" and present a much nicer history when it's ready for merge.
man, do I agree with you...
I'm new to git, our whole company is... Every time I have to go through history, it is one big mess with a lot of intermediate stuff. It is a pain to work with.
So stop committing non-workable intermediate stuff and finish what you're doing before committing. I fail to see how it's "a pain" to have a history of everything done.
If you want to mark new features or releases, use tags for that.
No, you should never be afraid of committing anything you have at any point in time. Git works as a development tool as well as a central VCS. As long as you have committed something, it will be restorable in case you overwrite it or delete it. Telling someone to wait before committing is a bad idea. They may get a lot of work done and then inadvertently lose it somehow, permanently. Instead, you should commit often and then use interactive rebase later to clean things up. You want to be able to have the freedom to switch branches, navigate history, and work on multiple features/bugfixes at the same time. You're restricting your ability to do these things if you wait too long to commit, and you're increasing the danger of losing your work.
There's no need to squash commits with rebase. Ever.
Whether you commit often or not does not change the fact that rebase is unnecessary to keep a clean history of features/releases and obscures real commit history.
You can have a clean history of features and/or releases with tags, without destroying commit history.
Explain to me how tags get me to an understandable view of my DAG so that I can see clearly what has been happening to the code, by whom, and why. Tags are just labels put on commits. How can I get a clean view of the history by feature? Do you put a release tag on every single bug fix and logical change that someone makes? Why would I go through the hassle of putting a tag at the tip of every single code reviewed chunk of changes? Why would I want all of these tag names cluttering my git log alias that shows me the history? How are tags going to compensate for the endless bubbles of "merged master into master" that inevitably clutter up the graph when people don't bother to rebase? How do you tell git bisect to skip all the intermediate bullshit meandering commits between the countless tags?
> Explain to me how tags get me to an understandable view of my DAG so that I can see clearly what has been happening to the code, by whom, and why.
That's what the commit history is for. If you don't like seeing merges use git log --no-merges. You can use rebase to avoid seeing merge commits, but it's awfully unnecessary with the nasty side-effect of destroying history.
I was suggesting tags as a way to keep an alternate history of features or releases. Features can be developed in separate branches, and you could tag features when you merge them in if you want an easy history of feature merges. You can list tags by date, use prefixes for sorting, etc.
The history of tags happens at the release level. That is not granular enough. The history of every last little typo fix is too granular; it's just worthless to preserve. Using tags for every merge isn't all that useful; you already have the merge commits for it.
What you want is a logical sequence of correct changes (or, as correct as anyone could tell at the time; of course no one's perfect).
If you have to do code review, track down a bug by bisecting a commit history, or figure out what patches from one branch need to be ported to another, you want to have good history. False starts and fixes to typos from previous patches have no value; in fact, they have negative value, as they obscure the interesting information that a good history provides.
Cleaning up history really doesn't take that long. When something is about ready to merge, take a quick look through the history to figure out which patches are redundant or logically belong as part of previous patches, do a "git rebase -i", and squash them into the appropriate patches. In the process, make sure your commit messages are actually good enough that someone doing a code review can actually follow what you're doing (no "fixed a bug in this function; fixed a bug in that function"; actually explain what you fixed and why your fix is the right one).
What do you mean that's what the commit history is for? That's what a DAG is; that's what I'm talking about. You know what the DAG is, right? I don't want to exclude all merge commits when looking at the DAG. I merely don't want to see all of your "merged master into master" bubbles because you can't be bothered to clean up a bit and rebase before pushing your changes.
I don't know what you're going on about with this "destroying history" as if the sequence of your little typo mistakes are some kind of precious documentary that needs to be preserved in case some forensic expert wants to trace every step you made along the process of adding a widget. You might as well go find a system that records and tracks every key you type, because after all, every time you hit the backspace key, you are destroying history.
Tags do not keep alternate histories. They are simply labels on commits. You use them to mark certain commits as releases, you do not use them to track every logical change to the codebase. They are used sparingly to track the occasional version number bump as a result of a sufficiently large number of changes. These version tags do not provide the granularity I need when I look to see what is happening on a single branch at any point in time. To add them to every non-trivial commit as a way of distinguishing them from the just-dicking-around commits would be ludicrous.
edit: One more thing. I think it is absolutely silly to say in one comment "stop committing non-workable intermediate stuff and finish what you're doing before committing" and then turn around in another comment and talk about how rebase has a "nasty side-effect of destroying history". You do realize that all the editing and polishing you're doing before you make your commit is the same type of destroying history that would happen if you made small, incremental commits and then cleaned them up with rebase, right? The only difference is that your way is way more dangerous as far as losing history is concerned, and you're not taking advantage of any of the benefits of Git in the process.
Squashing is not the purpose of rebase. Rebase allows you to clean up history. Sometimes, that means _separating_ large commits into smaller, atomic ones. Sometimes that means re-ordering things to make more sense for the reader. And yes, sometimes, an atomic unit requires squashing two or more commits together.
Commits should be logical units of the codebase, not units of developer productivity over time.
You can easily squash those experimental commits and have "incremental, atomic" commits in history. What it gives you is the freedom of a clean slate after each commit. And no, stashing doesn't work, because the next experiment might depend on the last one. Not happy? Interactively rebase HEAD~n and get rid of all the experimental stuff. Changed your mind? Git reflog is your friend.
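The reflog safety net is worth seeing once in a scratch repo: even after you throw commits away, the old tip stays reachable for a while (all names invented):

```shell
set -e
cd "$(mktemp -d)"
git init -q -b master
git config user.email you@example.com && git config user.name you
echo base > f && git add f && git commit -qm "initial commit"
echo x >> f && git commit -qam "experiment 1"
echo y >> f && git commit -qam "experiment 2"

git reset -q --hard HEAD~2       # "get rid of all the experimental stuff"
git reset -q --hard 'HEAD@{1}'   # changed your mind: the reflog entry from
                                 # just before the reset restores both commits

git log --format=%s -n 1         # back at "experiment 2"
```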
> When you do a blame on a line of code, to figure out when the last change was, do you want to see the "fix whitespace to match style guide" commit that someone insert in the branch at the end, or the actual meaningful change that occurred earlier?
git blame -w # works with git diff and git show too
(You might also be interested in --word-diff=color for git diff and git show)
Well, why not a concept of 'soft' and 'hard' commits (or sub commits, or major and minor)? Let people do what they must, let the logic behind it stand, and give a nice clean history by ignoring the soft commits unless you explicitly access them?
>When I am working on a feature branch, I often find myself committing experiments,
That's exactly what `git branch` is for. For experimentation.
>that I possibly undo later in the branch.
That's exactly what `git branch -D` is for. Or even just `git checkout` and leave the branch there. That way if you ever change your mind you can revisit it.
Imagine we are both working on a project. I don't care to know that you merged 3 times from master yesterday before pushing your feature. I also don't care to know details like that you forgot to put a config file in your first commit and had to do a second one, or that it took you 3 commits to get the spelling right in the UI.
Mainly that information is useful to you. It is also mostly only usable efficiently by you (I will read your whole feature; most likely I will not be able to roll back to the middle of your change).
For example, I commit several times an hour - my coworkers would be pissed if I made 10 commits for each minor feature I develop.
I'm incredibly new to Git, but actually destroying the history seems like a crude solution for a sophisticated tool like Git.
Couldn't there be some way to just tag the "main" commits and mark the dead ends as "extraneous" rather than destroying them? And then have your history-viewing tool hide/squash the unmarked "invisible" commits by default and only expose them when specifically requested?
I mean, it seems to make more sense to just look at a blind alley of commits you made and just flag them all as a mistake rather than actually rearranging the DAG.
This! I have been thinking for a while that this whole weird business about rebasing and 'clean' history is really a response to a shortcoming in our tools, which don't give us a way to distinguish between 'small' incremental, work-in-progress, historical-record-but-not-that-interesting commits and 'big' significant, feature-completing commits.
In Mercurial, you can kindasorta have this if you do all your development on named branches, and only merge the named branches back into the default branch at these significant moments. Then, you can merge willy-nilly, without rebasing or otherwise destroying or lying about history, and distinguish ignorable work-in-progress merges from significant feature-complete merges by which branch they were on. Most query commands let you filter by branch, so you can easily do that.
For those not familiar with Mercurial, the difference that allows this is that Mercurial permanently records the name of the branch a commit was added to. That means there is an observable difference between merging A into B and merging B into A. This is not universally agreed to be a good feature, but it does allow this particular approach.
Then you just have to choose between having a single shared development branch, a branch per developer, a branch per story, a branch per task, etc, and come up with a coping strategy for any resulting proliferation of branches.
You can do this with git tags. Simply tag each release or feature when you merge it in. The entire point of source control is a history of changes, not as a changelog of features added/removed.
> Couldn't there be some way to just tag the "main" commits and mark the dead ends as "extraneous" rather than destroying them? And then have your history-viewing tool hide/squash the unmarked "invisible" commits by default and only expose them when specifically requested?
You could do exactly that; look into git-update-ref for how you could implement that so that garbage collection doesn't wipe out those dead ends (git-notes basically does what you would need to do).
Note that you would still be rewriting history, still rearranging the DAG, but you would have references to the old states. Basically like a permanent reflog, though perhaps with an interface tailored to this usage.
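A minimal sketch of that idea: park a ref on the dead-end tip under a custom namespace before rewriting, and garbage collection will never reclaim it (the refs/attic name is invented; note that such refs also aren't pushed or fetched by default):

```shell
set -e
cd "$(mktemp -d)"
git init -q -b master
git config user.email you@example.com && git config user.name you
echo base > f && git add f && git commit -qm "initial commit"
echo dead >> f && git commit -qam "dead-end experiment"

git update-ref refs/attic/experiment HEAD   # keep the dead end reachable
git reset -q --hard HEAD~1                  # then rewrite the branch freely

git log --format=%s -n 1 refs/attic/experiment   # still inspectable later
```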
Just wanted to point out that rebasing is not just to remove or squash commits. I use it to:
- _separate_ large commits into atomic, logical units
- fixup changes missed the first time around
- rewrite commit messages to ensure they're clear
- reorder commits
It could combine all consecutive non-merge commits; or, if you're rigidly adhering to the "never directly commit" model for master, collapsing everything down to just the merge commits would be enough.
I silence the noise before merging on a public branch by squashing the "thinking commits" on my private branch. What remains is a clean history of commits. You will not find the oscillating commits on the public branch but you will find them on my private (local) branch.
So you do commit early and often, just that nobody else sees your commits until the feature is complete and working. And then only after you rearranged and squashed your commits. From the outside it looks like you always commit top quality code from the first try.
Besides the official repository, we have per-user backed-up repositories for this reason. People develop on their machines and publish/save finished or unfinished work on the git server. You can rebase at will and push --force as much as you want.
Rebases on the official repo are not allowed and the per-user repos are public and can be used also for collaboration.
Do they also force you to commit unfinished work frequently? Because in my experience devs would get around that "all branches are public" rule by just not putting things into a commit until they were confident something was complete. The end result is the same, except that devs don't get to enjoy using version control to its full extent.
Unfinished work is committed somewhat often, especially if it's complex. Commits are sometimes squashed once a pull request is approved (PRs being the code review gateway), more often if there are a lot of uninteresting commits.
Some people use git add -i to be very selective about what they commit, and deliberately increase the number of independent commits, rather than have a single commit that has unrelated changes in it.
Other people have a lot of WIP commits.
We started out doing more rebasing, but do a lot less now, after having run into issues with having e.g. branched off of a branch that has since been rebased and merged (e.g. front end / back end feature split). You try to rebase, but the replay of commits continuously hits merge conflicts, and you spend about 30 minutes repeatedly fixing the same conflicts. And then there's the risk of your force push accidentally chopping off someone else's commit (though that's never happened).
We had one PITA case where a profanity was checked into the codebase, and branches with the offending commit in the history lurked in various places surprisingly long after the commit had been excised from the main trunk lines. Since we're a startup in the financial sector, our code will end up in escrow situations, potentially examined by humorless auditors, and we don't want profanities in it.
How do you enforce that with Git? Any time you create a branch, it's local until you push it.
But anyhow, just give people private repos on the server. What I do is push my private WIP branches to my home directory on the server, and once it's ready for code review and merge, push it to the central repository.
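That setup needs nothing fancy on the server side; any bare repository you can push to works as a private WIP area. A runnable sketch using a local path as a stand-in for the server (all paths and names invented):

```shell
set -e
top=$(mktemp -d)
git init -q --bare "$top/home-wip.git"   # stand-in for a repo in your home dir

cd "$(mktemp -d)"
git init -q -b master
git config user.email you@example.com && git config user.name you
echo wip > f && git add f && git commit -qm "WIP: half-done feature"

git remote add backup "$top/home-wip.git"
git push -q backup master                  # back up in-progress work

git commit -q --amend -m "WIP: reworded"   # rewrite freely...
git push -qf backup master                 # ...and force-push: it's private
```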
Because clean history is easier to bisect. It's easier to visually track problems, (aha so you changed thing A in branch br12 and thing B in branch br13 as opposed to wait so person A branched into br12, then person C branched into br23, then it got merged with br74, which was merged with person D on branch br84..).
A less entangled workflow is easier to untangle and consequently understand, even if it hides some stuff.
Also it's visual clutter. If your history looks like train map, your project is probably a train wreck.
This is an artificial problem. Git has almost all the data needed to present a squashed view - all it's missing is an idea of what commit a branch pointed to throughout history, so that it can merge together commits in the "bubbles" for presentational purposes.
It would be better to fix this, so you get nice diffs, blame etc., than deal with all the other issues rebase causes.
I'm not convinced on the bisect issue either. If your app is trivial, sure, but if the features are more complex, the squashed commits will be too chunky to narrow down as usefully as a fuller history can.
Again, rebase is there to clean up your work, to simplify your workflow so that someone (not necessarily something) looking at the code can make sense of it.
git bisect might work ok (I haven't used it extensively), but more important, if you made a mistake and call someone in to help, is to ease that person's work by presenting a trimmed though more understandable tree. You should strive to keep the branching as simple as possible, but not simpler than that (to paraphrase an infinitely smarter man).
I'm not convinced on the bisect issue either. I've personally done a bisect spanning 10000 changesets with 50 concurrent branches at the widest point. Found the problem right away.
We follow a similar branching model. Master is kept clean. We do all development on feature branches and merge when complete. History is fully preserved. History is a mess but it works well overall.
I would argue that you want to resolve conflicts in a rebase instead of a merge. This allows you to see where the branches diverge and fix it right away, closer to the offending commits, preventing more work from piling up around the conflict and making it harder to piece together what's going on.
It can mean that you fix conflicts closer to the point where they first happened, or it can mean that you have to fix the conflict in a bunch of different ways as it propagates through each successive commit. I find myself trying to outsmart the conflicts: "Ok, I know I changed such-and-such in such-and-such a way here in a few commits, so if I fix this conflict like this, then hopefully I won't have to fix that commit later..."
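As an aside, git has a partial mitigation for re-fixing the same conflict over and over during a replay: the `rerere` ("reuse recorded resolution") cache records how you resolved a conflict hunk and reapplies that resolution automatically the next time the identical conflict appears. Enabling it is just configuration:

```shell
# opt in per repository (or add --global); shown here in a scratch repo
cd "$(mktemp -d)" && git init -q
git config rerere.enabled true      # record and reuse conflict resolutions
git config rerere.autoupdate true   # also stage the reused resolutions
```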
While this is often true, `git rebase` changes your commits. Your history is now a lie. I've been in the situation where a `git rebase` ended up breaking commits. It's possible to merge broken history with `git rebase`. Only a `git merge --no-commit` will give you the opportunity to tweak the merge commit so that the resulting merge isn't broken.
Time is relative and history is only what has been observed. You only rebase code that has not been seen by others, so the code that was rebased was never history at all, as far as anybody else is concerned.
I don't think anybody ever recommends rebasing public code. There is a reason for that --force flag on git-push advises the user to use with care.
I mean, I could configure my development setup to automatically commit my persistent undo files so that every single keystroke I make is preserved for posterity... but I don't do that of course. I could also configure my setup so that every time I write out a file it commits, preserving that history forever... but I don't of course. Why should a bunch of temporary local commits be preserved forever?
Using the terminology "rewriting history" to describe rebasing local commits is misleading. "deciding what history will be" is more accurate.
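On that note, `git push --force-with-lease` is the careful variant of the --force flag: it refuses the push if the remote branch has moved since you last fetched, so you can't silently clobber someone else's commits. A runnable sketch with a local bare repo standing in for origin (names invented):

```shell
set -e
top=$(mktemp -d)
git init -q --bare "$top/origin.git"

cd "$(mktemp -d)"
git init -q -b master
git config user.email you@example.com && git config user.name you
echo a > f && git add f && git commit -qm "one"
git remote add origin "$top/origin.git"
git push -q origin master

git commit -q --amend -m "one, reworded"       # rewrite the pushed commit
git push -q --force-with-lease origin master   # ok: remote is where we left it
# had a teammate pushed in between, this push would be rejected instead
```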
> This very article says it's acceptable to rebase a public feature branch.
# optional: feel free to rebase within your feature branch at will.
# ok to rebase after pushing if your team can handle it!
It says that rebasing a feature branch is fine if it hasn't been pushed (ie, if it is not public). If it has been, then it is only okay to rebase it if everybody else on your team okays it, which is just common sense. If nobody cares then... nobody cares.
The problem with the article is that its wording is imprecise.
Yes, you could say that if you wanted to set up a straw man that is shockingly close to "we don't need to use version control because our code never has problems".
Much in the same way visiting a museum is much more useful for understanding the past than visiting your grandma's attic, I think a curated history is much more useful than an accurate history. When someone says a clean history, I think they're saying a well-curated history.
You don't submit your first draft almost anywhere else, so why do you think you nailed it the first time when writing your commits? Sometimes you don't get things right, and rebasing is one of the tools that helps you make sure that the written record of your work is helpful.
Wow! I love this analogy! Allow me to disagree with you inside the premise of your own analogy. Visiting a museum is a much more useful way to understand past world history, but visiting your grandma's attic is a much more useful way to understand the little twists and quirks of your own family. In much the same way, I think a curated history is very nice for open source projects with lots of committers and constant newcomers, while a history that preserves foibles can be more useful for smaller and tighter teams. It is often very enlightening to see the missteps that were taken on the road to the eventual solution to a problem.
Well, that analogy is OK but there is another analogy which comes to mind: keeping track of everything you do may have very useful future side-effects. As any lab researcher will tell you, keeping a strict log of anything you do is the way to both knowledge and reproducibility.
So git history is not necessarily "human history" but "engineering history" and as such, may be much more important than you think and "curating" it may be a mistake.
Because no one cares about an individual's doodles and false starts on a feature branch, or an exploratory branch off a feature branch, they only care about the final difference between before and after merge. Explorations are an unnecessary distraction.
I think phrases like "no one" should be understood as no one modulo a small number of exceptions that do not noticeably affect the majority trend. Always.
Rebasing on a public branch is a big no-no. But rebasing on your private branch helps you catch merge problems early on.
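A minimal sketch of that private-branch rebase, assuming a branch named `feature/foo` (the name is illustrative):

```shell
# Bring a private, not-yet-pushed branch up to date with master
git fetch origin                # pick up the newest upstream commits
git checkout feature/foo        # your private feature branch
git rebase origin/master        # replay your commits on top of master;
                                # any conflicts surface now, commit by commit
```

Because each of your commits is replayed individually, a conflict points at the exact commit that introduced it, which is what makes problems show up early.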
Except for the "merge --no-ff" I'm using exactly this model and I think it is great. It can't get any more simple than that and still have a working master.
Regarding the "straight line, clean" history, I've found that most people think that a straight line is "the" history to have. I have no idea what to tell them.
I don't think a feature branch implies collaboration.
Even if more than one developer is involved, the model doesn't change: "master and feature" branches become "feature and developer-private-feature". The developer still relies on a (partial) working feature branch, and his contribution still has to be self-contained.
But since you have more developers, you will have out-of-band sync communication between them.
It's pretty simple... readability triumphs. It's a hell of a lot easier examining the history of a project if a feature is condensed into a single or a few easy to read commits. Your personal struggle to implement a feature is far less important than having a clear, readable history. This becomes super important when bisecting, reverting, generating release notes, and a myriad of other things.
It's pretty simple... You can maintain an accurate history of commits while using tags to have a clean list of features and/or releases. A clean history of features or releases does not require rebasing.
I don't understand why people don't realize there is a place for rebasing and a time for merging.
untothebreach explains it well [1]. When you are working on changes which are not public yet, sometimes it's helpful to rebase them into a single commit or a smaller number of commits. Period.
No one is arguing to always use rebase and never use merge.
If you want a completely accurate history I hope you have your editor set to commit after every character you type, in case you need to delete a typo.
Ok, that's slightly tongue in cheek, but I think it's a subtle balance between clean and accurate history; some people are happy to use rebase, some are not. In my view, rebasing is no different to using undo in your editor, but I appreciate some people feel differently.
I don't understand why people think rebasing destroys history. It doesn't "destroy" anything.
Have you ever had an argument that was just mediocre? But later you think of a witty retort that would have been just perfect. That's what rebasing is. It's re-structuring the conversation the way you would have liked it to go.
Except that conversation was actually a monologue which you haven't actually performed publicly yet. Rebasing private branches is like editing a speech before you give it: just common sense.
If you restructured the "I Have a Dream" speech, you'd be hacking up history. If Martin Luther King Jr restructured the speech prior to August 28, 1963, as I am sure he did many times, history remains untarnished.
Rebasing feature branches isn't that bad -- it's similar to the workflow you might use when submitting patches to a mailing list or other patch queue: the patch floats on top of master and eventually gets applied.
It's the rebase-and-fast-forward merge strategy that causes problems. :)
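The patch-queue comparison can be made concrete with `git format-patch`, which exports exactly the commits your branch adds on top of master as a mailable patch series (the output filename is illustrative):

```shell
# Export every commit on the current branch that isn't in master
# as a mailbox-format patch series, ready for `git send-email` or `git am`
git format-patch master --stdout > feature.mbox
```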
Public history and private history. Or maybe changes to production vs changes as you developed.
I care more about the merge than the commits. (In GitHub parlance, the Pull Request.) This workflow turns commits spread over time into a single change against master (while still preserving the commit history). Conceptually this makes it a lot easier for me to see what's changing and to deal with issues. So I guess I disagree with point #1.
In a continuous delivery environment this workflow makes sense, because the log accurately shows the history of changes to the app in production instead of the development environments. (Merge, Merge, Merge)
#2 doesn't make sense to me, if you rebase you're up to date with master as of the branching commit.
You can have both. Use tags for new features and releases, and that way you can have a clean history of featured added while maintaining an accurate commit history. This is what tags are for: Marking an important commit such as a feature or release.
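For what it's worth, marking a release point with an annotated tag looks roughly like this (the version string is illustrative):

```shell
# Create an annotated tag at the current commit and share it
git tag -a v1.2.0 -m "Release 1.2.0"
git push origin v1.2.0      # tags are not pushed by default
```

Annotated tags (as opposed to lightweight ones) record the tagger, date, and a message, which is why they're the usual choice for releases.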
Interestingly enough, most folks working on GitHub.com don't use this model. We actually use a simpler model, and usually merge to our feature branches rather than rebase. I'm not sure if Zach's latest talk(s) goes into this level of detail.
I think a big part of the reasoning is because we tend to push up branches really early to open PRs and get discussion going. And of course rebasing public branches generally leads to hell.
I know some other .com devs will rebase privately before pushing a large branch, but I would say 80% of work is just done with merging.
I think it depends on how public the branch is and how many contributors you have on it. I think the author assumes there is typically one dev per feature branch.
Once a feature branch is being worked on by multiple devs (and hence multiple feature branches forked off), it is a public branch and should use a merge based workflow.[0]
I personally use a rebase workflow on private branches before merging since it makes for a cleaner history. I've seen devs merge a branch with 100+ merge commits and it absolutely destroys git history.
This is the exact same workflow I use. If it's a feature branch that I'm working on locally, then rebase -i is my friend as I can squash commits. But, I rarely stay in a local branch for longer than a day or two for fear of losing work, and no developer is an island. The second it is shared, it's merge only.
Rebase conflicts always cause more grief than they're worth.
We used a very similar model to this at my last job, and I'm struggling to get my current team on board with this type of process. I think the main problem is that people don't trust continuously deploying master because there aren't enough tests. In my ideal world, every commit is tested (with Jenkins, Travis, Buildbot, etc), and then if the PR includes tests for the code and the build passes, the reviewer says LGTM and the committer presses the merge button on GitHub. Once the button is pushed, a build of master is triggered. If the build passes, the code is automatically deployed.
My world really changed once I started working with code bases that had excellent test coverage from the get-go.
At my last shop we combined that with pair programming, feature switches, and a few other tricks, and we basically never branched. You'd pull, work for a few hours, push, and 10 minutes later your code would be live. It was in one sense freeing: the release overhead of other shops was gone. And in another, it inspired more discipline. Knowing that everything you were writing would shortly be live kept you on your toes. You could never leave something for later; there was no later. I loved it.
If I understand your question rightly, yes. Looking at the Github history, we did actually have 6 branches over the life of the project. All were extended technical experiments of one sort or another like trying out a new templating approach. 2/6 were merged. But all normal coding was pushed to master with no branching (beyond a local checkout and local commits, which are a sort of branching, but none of those lasted longer than a day). There, any checkin triggered tests, and any build that passed the test was pushed live.
If you want others to see the benefit, I'd encourage you to pick a specific area of the code, test the hell out of it, and make sure that a) tests are easy and quick to run on dev boxes, and b) every checkin is automatically tested. I'd start small, and one good place is a chunk of important business logic. It's even better if you use the tests to support refactoring and general cleanup of an area people know is messy.
If you do this right, then people will have two experiences coding on the project. In the tested code, it's pleasant and safe. In the messy code, it feels dangerous and scary. Over time they may get it.
Note that this is really hard to get off the ground in an established company and in an existing code base. So if they don't catch on, don't feel like it's you. (I generally cheat by being the first person on greenfield projects, so the first line of code written is a line of test code.) Also, if you get stuck while trying to clean up legacy code to make it testable, Michael Feathers' book "Working Effectively with Legacy Code" is very helpful.
Good luck, and feel free to drop me an email if you end up with more questions.
> If you want others to see the benefit, I'd encourage you to pick a specific area of the code, test the hell out of it, and make sure that a) tests are easy and quick to run on dev boxes, and b) every checkin is automatically tested. I'd start small, and one good place is a chunk of important business logic. It's even better if you use the tests to support refactoring and general cleanup of an area people know is messy.
Ah, this is really good advice, thanks. I'll give it a shot.
> In my ideal world, every commit is tested (with Jenkins, Travis, Buildbot, etc), and then if the PR includes tests for the code and the build passes, the reviewer says LGTM and the committer presses the merge button on GitHub. Once the button is pushed, a build of master is triggered. If the build passes, the code is automatically deployed.
This isn't an unattainable utopia. It's what lots of teams are doing now. Try out continuous integration services (Travis or Koality).
And, given you have good instincts, ping me when you're looking for a new team ;)
I would also add "commit often, squash later". I find frequent local commits useful for quickly rolling back mistakes but they'd just clutter the main history if they got there. Usually if a commit is important enough to end up in master, it is also important enough to do the merge so most of my changes are actually 1 commit (2 if you count the --no-ff merge.)
Learning git here. Could you please explain how rebase squashes commits? Looking at man git-rebase, it seems that it detaches a branch and attaches it to the current branch. From the documentation, the chain of commits is preserved and simply moved in the graph. The sequence of commit nodes of the branch is not merged into one commit node.
With `git rebase -i HEAD~n`, where n is the number of commits back from HEAD, you can do ANYTHING to your branch. You can stop mid-rebase to add some files that didn't exist before, squash, fix, execute commands, etc.
You can even change order of commits and delete commits from history. BE VERY, VERY CAREFUL!
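For reference, the squash workflow looks roughly like this (the hashes and messages below are made up):

```shell
# Open the todo list covering the last 3 commits
git rebase -i HEAD~3

# In the editor, change "pick" to "squash" (or "fixup" to also discard
# the message) on the commits to fold into the one above them:
#   pick   a1b2c3d  foobarbaz: feature 1
#   squash e4f5a6b  Interim commit on foobarbaz
#   squash 7c8d9e0  fix typo
```

After saving, git replays the commits and combines the squashed ones into a single commit, prompting you once for the merged commit message.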
In addition to the other responses (i.e. 'rebase -i'), there's also the shortcut of passing the '--squash' option to the final 'merge' command. This, unlike the standard 'merge' does not create a commit but instead applies all the commits from the merged branch into your working tree. You can then review the changes and create the commit yourself.
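A sketch of that `--squash` variant, assuming a branch named `feature/foo` (the name and commit message are illustrative):

```shell
git checkout master
git merge --squash feature/foo        # stage the branch's combined diff; no commit yet
git status                            # review what would land
git commit -m "foobarbaz: feature 1"  # one hand-written commit on master
```

Note that, unlike a regular merge, this records no parent link back to the feature branch, so master's history stays a straight line.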
If you work with other people, this model doesn't work so well, IME.
In particular, rebasing is very hazard-prone if someone else may have checked out your branch.
If you're working on a feature that needs changes in multiple components and is broken without coordination, you may be working off the same branch, or have separate branches with inter-merges. Either way, rebasing will cause trouble.
> In particular, rebasing is very hazard-prone if someone else may have checked out your branch.
That's why you never rebase a published branch. This is easier to manage if your team uses a central repository as the 'official' repository and everybody pushes and pulls to that one. Then you know that your branches in your local repository are not public, not published, and safe for history-altering workflows like rebase. It's only the branches that you push that you're not allowed to rebase, at least not prior to the last push.
If your team doesn't use a central repository, and you're pushing and pulling between all of your local repositories, then I'd suggest using a naming convention for branches to distinguish public ones from private ones. Eg: start your private branch names with "DEV_". If someone else pulls a DEV_* branch, then they should expect that its history might change and they'll have to deal with that when it occurs.
My problem with that is my coworkers and I usually work on our own feature branch. Ideally I wouldn't push mine to the central repository until I'm done, so I can rebase on master without changing the public history, but at the end of the day I don't like to leave code only on my computer (what if my hard drive blows up??) so I push everything I don't want to lose.
Git is designed so that branching is cheap. Why not create your own development branch off of the feature branch? Then use the feature branch as an integration-only branch?
My team of 6 handles this just fine; we try not to have many people on the same branch, nor do we like to have branches that live very long. Merges of branches that are old tend to cause more problems, problems that would have fallen out on the developer side if they had rebased.
I'm using this model in a team of 12, it works fine, except for: 1. needing to have reliable testing environments for QA/product people to try features, as many feature branches are worked on at once; and 2. new people having trouble with rebasing, or have to understand git just a little bit better than pull/push/commit.
> 1. needing to have reliable testing environments for QA/product people to try features, as many feature branches are worked on at once
Makes sense. Also, try using continuous integration (which should test every commit), and try teaching people to check out particular commits (by hash). I find that once people understand what a detached HEAD means, and how to get out of it, `git checkout <commit hash>` is a very useful way to test _specific_ things.
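A minimal sketch of that round trip (the hash below is hypothetical):

```shell
git checkout 4f2a9c1    # hypothetical hash: HEAD now points at that commit, detached
# ...build and test exactly that state...
git checkout -          # jump back to the branch you were on before
```

`git checkout -` reuses the "previous checkout" shorthand, so nothing is lost by detaching for a quick test.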
> 2. new people having trouble with rebasing, or have to understand git just a little bit better than pull/push/commit.
I think the lesson here is that spending a bit of extra time teaching your developers how git works (and not to panic), and how the more obscure commands work, goes a long way.
> nor do we like to have branches that live very long. Merges of branches that are old tend to cause more problems. Problems that would have fallen out on the developer side if they had rebased.
This is a great point. Keeping branches small avoids serious integration problems.
It works well if you keep changes small. Basically if you're making a change that integrates with some other place that someone else needs to update, that looks like two separate changes to me. Of course you can do big feature branches that change a lot of things at the same time, but you're right - that won't work here.
Regarding big projects -> gerrit works pretty much the same way (but also forces you to squash changes into a single commit). This is used successfully by cyanogen and openstack people at least. Those are fairly big teams.
> gerrit works pretty much the same way (but also forces you to squash changes into a single commit)
That may explain why one of the committers I work with on an open source project heavily suggests squashing changes into a single commit when merging topic branches into master. He works with gerrit at his day job!
Personally, I think it's better to only squash experimental commits or minor stuff like removing a newline, and to keep development of a feature spread out over several commits if it makes sense. Makes it easier to revert back to a specific commit.
If people pull from a random github branch I think it is their problem.
They should pull only from what you declare as public branches with stable history.
I keep only the master branch on github, and publish the feature branch only when it is ready. After the maintainer merges the feature branch, I delete it from github.
I am, of course, talking about private GitHub repos used within a single company, and not about random people pulling down branches, but rather team development.
My point is that people should be branching from stable branches.
From your explanations, I understand your scenario is this: a private (shared by a team) repository on github. You work on your PC and then push on the shared repository on github.
If this is the case, you should push only when the branch is stable. If people really need those branches and you rebase them, you just make it hard for them.
Either you stop rebasing what they consider public branches or you reconsider what are your public branches.
It depends on the complexity of the app as well, and how much vertical integration there is, and what kind of development pressures there are (in a startup, the deadlines can be very short).
Rebase is good for when you need to rewrite or clean-up history. For example: all those "WIP" commits you'll frequently see aren't exactly helpful. If you want to rewrite history of a branch that others are actively working on, well then you're going to have A Bad Time™.
In my team we do something superficially similar, but instead of rebasing we just merge changes from master into our feature branches whenever master is updated. This seems to result in fewer conflicts for us, despite what you might expect.
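A sketch of that merge-based sync, with illustrative branch names:

```shell
# Keep a feature branch current by merging master into it
git checkout feature/foo
git fetch origin
git merge origin/master   # one merge commit records the sync point;
                          # conflicts are resolved here, all at once
```

The trade-off versus rebasing is that conflicts are handled once per sync rather than replayed per commit, which may be why this produces fewer of them in practice.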
Also, when the feature branch is to be merged into master we do a squashed commit so that all changes from that branch show up as one commit in the main project history. The feature branch's commit history is preserved in the repository (though not in the master branch), so it's not really any more difficult to roll back partial changes.
Our situation is likely different from many projects though, as we only ever have one developer working in a given feature branch.
> The feature branch's commit history is preserved in the repository (thought not in the master branch)
This would require you not to get rid of the branches (remotely and locally), right? GitHub does allow you to undo the deletion of a branch, but is that only for a certain time period?
I like to delete my branches as soon as they've been merged in.
We leave the feature branch on the remote repo indefinitely, but we really don't need to do so. The diff between the feature branch's squashed commit and the previous commit of the project usually tells us everything we need to know when a problem crops up. We keep the feature branches "just in case", but in practice they're never used once the branch has been merged into production.
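If a branch does get deleted before it's needed again, its commits usually survive until git's garbage collection runs and can be recovered via the reflog (the branch name and hash below are made up):

```shell
git reflog                       # find the old branch tip in the recent history
git branch feature/foo 7c8d9e0   # recreate the branch pointing at that commit
```

This only works while the commits are still unexpired in the reflog, so it's a safety net rather than a long-term archive.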
FWIW, this branching model works for me in a small-ish team, two years now and counting. The part about "good merge bubbles" is especially important for clean history.
If you're looking for an automated tool to do this, take a look at gerrit. I used it successfully at my last day job. It basically enforces this model. The quality of our commit history changed quite dramatically.
In a similar model, I just commit hotfixes on the master branch. What is the negative implication of this? If I had a branch for them, those would just be merged back immediately with one commit anyway.
No rule is sacred, and your mileage will vary. For me, having a pull-request for the hotfix helps us run continuous integration tests, code-review, and ensure relevant people get notified automatically of the fix.
I agree with most of what is suggested here, except forking. I think that even for small teams forks are the way to go: you get a cleaner upstream, plus you get a backup of your local repo.
What if you pushed your feature branch and then need to fix something in that branch later? How do you handle this case as you shouldn't rebase a pushed branch?
It's fine to force push to a remote branch if it's a personal branch, that is, no one else is working on it but you.
(Or it's fine if you've previously discussed with your team the implications of force pushing to shared branches, and they're okay with that because you all know what you're doing.)
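One hedge worth knowing here: `--force-with-lease` is a safer variant of `--force` that refuses the push if the remote branch has moved since you last fetched, so you can't silently clobber a teammate's commits (the branch name is illustrative):

```shell
# After rebasing or amending a personal branch that was already pushed
git push --force-with-lease origin feature/foo
```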
Maybe it's just one point of view, rather than a clear-cut, 'this is how you do git.'
For example, my point of view is that master should never contain development work. master should always be stable. In this case, the post here does not jibe with that point of view.
You can use tags to present a clean history of features added or releases, while still preserving the actual history of commits... Using rebase to clean-up (destroy) your history is an amateur solution to having a proper branching/tagging workflow.