I really recommend that this be a companion piece to Linus Torvalds's own advice, here:
People can (and probably should) rebase their _private_ trees (their own
work). That's a _cleanup_. But never other peoples code.
That's a "destroy history"
Exactly. The kernel community has lots of wisdom built up around rebase vs. merge, and the way I think about it is, imagine a tree of developers with Linus at the top, maintainers in the middle, and leaf nodes at the bottom.
- leaf nodes can and should rebase to ensure whatever they submit is "bisect clean"
- everyone else should never rebase because you'll destroy history
As a leaf node, you should always be committing and submitting the cleanest patches possible. The prime directive is "do not break bisect". There's much more wisdom in:
stgit is a nice tool to manage your patches as a leaf node. It lets you go back and clean up individual patches in your history so that everything you submit is clean.
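For anyone who wants to see the "clean up your own patches before submitting" idea with plain git rather than stgit, here is a minimal sketch. It assumes a throwaway repository; the file names, commit messages, and identity are made up for illustration. It records a fix as a fixup commit and lets an autosquash rebase fold it into the patch it belongs to, so the series you submit has no separate "oops" commit.

```shell
# Sketch only: throwaway repo, hypothetical file names and messages.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Leaf Dev"
git config user.email leaf@example.com

echo base > README && git add README && git commit -q -m "Initial commit"
base=$(git rev-parse HEAD)

echo "feature v1" > feature.txt && git add feature.txt
git commit -q -m "Add feature"
echo docs > docs.txt && git add docs.txt
git commit -q -m "Add docs"

# The "Add feature" patch turned out to be wrong. Record the correction
# as a fixup of that commit (HEAD~1) instead of a standalone fix commit.
echo "feature v2" > feature.txt
git commit -q -a --fixup=HEAD~1

# Fold the fixup into its target. GIT_SEQUENCE_EDITOR=true accepts the
# generated todo list as-is, so no editor opens.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash "$base"

commit_count=$(git rev-list --count "$base"..HEAD)
feature_at_first=$(git show HEAD~1:feature.txt)
echo "commits after cleanup: $commit_count"
echo "feature.txt in the first patch: $feature_at_first"
```

This only rewrites commits that have not been shared, which is the "private tree" case from the quote above.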
Maintainers don't rebase, period.
And recently (past 18 months or so), Linus has been yelling at them to not merge from mainline back into their proposed branch before pull request. The rationale is that when you do so, your pull request is now based on untested code.
Most dev communities will never scale as large as lkml, so not all the lkml conventions make perfect sense to adopt. However, like I said, there is still a lot to be learned from the community that's been using git the longest and pushing it the hardest.
Evolve for Mercurial (starting to show up in core Mercurial) is trying to cleanly solve this.
It provides (safe) mutable history for local changes (the VCS tracks what is safe and what isn't), and a way to still be able to collaborate on changes without "surprising" the people who pull from you.
The vision was to have phase as the foundation for evolve.
We had some discussions as far back as mid-2010, at the time the feature was called LiquidHg. The first step was phase (differentiate between changeset states), and now it's evolve (make it flow).
Meanwhile I have had to use git for years to get around this gap in Mercurial. I can't see what incentive I will have to switch to Mercurial's tool when it is fully cooked.
This feature looks nice. I imagine it's a bear to do safely :)
I always wished that darcs was usable "in the real world", as the concept of composability is very appealing. Perhaps there is a compromise to be reached here (local patches with key sequence points?)
I think it looks like a trap, frankly. Give me a good definition for "local". If I mail a patch to someone and they apply it to their tree, what breaks and for whom if one of us then "publishes" it?
I'm sure there's an answer to that question, but I'm equally certain it's not the only question. It's not that git solves all this stuff perfectly, or that it does it in the best way possible, it's that lots of problems in SCM are human communication issues that simply cannot be solved by software. One of git's virtues is that it knows its own limitations.
I was referring to Python doctests (documentation narrative with interspersed Python statements and embedded reference output).
The text looked to me like one could extract all the commands, run them and check their output against the embedded reference output automatically. I don't know, it had such a rigid structure.
On a second look, the output contains timestamps, so what looks like shell output is probably only shell output the author has copied once into their text.
Please keep your comments in place, especially when you receive feedback. If you couldn't contact the person you were trying to give advice to, I suggest you write a post about it rather than inject a piece of text only to delete it an hour later. Your original aim of not cluttering the thread is, well, not that apparent now.
Honestly, this type of pedantry is pointless. We all understood what he meant. English is an evolving language, and many of the things that are "correct" today were similarly "incorrect" yesterday.
Quite. This kind of attachment to the peculiarities of Standard English rarely matters outside highly formal settings - it is as much an enemy of effective communication as its servant.
Example of advise used as a noun, a snowclone most of us have probably encountered: An X's advise to his/her Y (based on the eponymously authored 1811 "Lord Chesterfield's Advise to His Son"), which loses its impact in print if you don't use the not-very-archaic spelling.
Isaac has some controversial views at times (semicolon discussion anyone? ;) but here I fully agree with him.
When working on a very long and complicated feature on an unknown terrain, I commit "wip"s very frequently and rebase from time to time. I also frequently fork branches (from "foo-wip-1" to "foo-wip-2") before rebasing just in case I ever wanted to look at my messy history. When everything's done and thoroughly tested I delete a batch of branches.
This might sound like overkill, but it costs you literally nothing and might be useful sometimes.
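A rough sketch of the "fork a branch before rebasing" habit described above, assuming a throwaway repository. The branch names follow the foo-wip-1 / foo-wip-2 pattern from the comment; the file names and identity are made up.

```shell
# Sketch only: throwaway repo, hypothetical file names.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the initial branch name
git config user.name "Example Dev"
git config user.email dev@example.com

echo base > base.txt && git add base.txt && git commit -q -m "Initial commit"

# Messy work-in-progress commits on a topic branch
git checkout -q -b foo-wip-1
echo 1 > work.txt && git add work.txt && git commit -q -m "wip"
echo 2 > work.txt && git commit -q -a -m "wip"

# Meanwhile master moves on
git checkout -q master
echo upstream > upstream.txt && git add upstream.txt
git commit -q -m "Upstream change"

# Fork before rebasing: foo-wip-1 keeps the old history untouched
git checkout -q foo-wip-1
old_tip=$(git rev-parse foo-wip-1)
git checkout -q -b foo-wip-2
git rebase -q master

kept_tip=$(git rev-parse foo-wip-1)
new_tip=$(git rev-parse foo-wip-2)
echo "foo-wip-1 still at $kept_tip, rebased work at $new_tip"
```

Once everything is done and tested, git branch -D foo-wip-1 removes the leftover branch.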
To me, the most valuable feature of rebase is that whoever commits to a feature branch also does the work of integrating it with the current master. Right after you finish your work on a branch, you can rebase, even before another person has reviewed it. When it's good to go, the actual merge is a fast-forward.
That's exactly it. In one sense, it's just a better, less annoying form of merging. You get to sequence your commits on top of the current upstream (literally re-basing) without a merge commit that makes little sense in context. Then the ff-merge to master is also clean. I don't understand why anyone would be against this.
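To make the rebase-then-fast-forward flow concrete, here is a small sketch in a throwaway repository (branch and file names are assumptions for illustration). The feature branch is replayed on top of the current master, and the final merge then creates no merge commit.

```shell
# Sketch only: throwaway repo, hypothetical branch and file names.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the initial branch name
git config user.name "Example Dev"
git config user.email dev@example.com

echo base > base.txt && git add base.txt && git commit -q -m "Initial commit"

git checkout -q -b feature
echo feature > feature.txt && git add feature.txt
git commit -q -m "Add feature"

# master moves on while the feature is in progress
git checkout -q master
echo upstream > upstream.txt && git add upstream.txt
git commit -q -m "Upstream change"

# The feature author re-bases the work onto the current master...
git checkout -q feature
git rebase -q master

# ...so the eventual merge is a plain fast-forward.
git checkout -q master
git merge -q --ff-only feature

merge_commits=$(git rev-list --count --merges master)
echo "merge commits on master: $merge_commits"
```

The --ff-only flag makes git refuse the merge if it would not be a fast-forward, which is a cheap way to check that the branch really was rebased first.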
It's not that anyone is against it when it's possible to do it cleanly. But there are cases when that isn't true. Sometimes the branch is not ephemeral, so rewriting it becomes messy and risky.
I frequently find myself fixing or extending some open source project that my own systems depend on. Usually upstream will eventually take the patch, sometimes they might not. But I can't wait around to find out -- my own "master" branch gets deployed and lives a life of its own.
When upstream does get around to merging the changes, and I pull from upstream, the merge is 100% safe and clean because git sees my own commits coming back again. But if instead the changes were rebased, the chain of history is broken and git needs to go into conflict resolution mode. If I've changed other things since, I get a mess.
All of this gets even worse if you have things like continuous integration servers and git-based deploys. Those systems will happily deal with merges and break when you go and rebase a branch they've already seen.
Actually, I think the article's point was that big feature branches should be merged --no-ff so that they would not be a fast forward commit. That way you can whack them out of master easily if needed. You can still rebase right before the merge -- I would.
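Here is one way the --no-ff idea could look in practice, as a sketch in a throwaway repository. I am assuming "whack them out" means reverting the merge commit with git revert -m 1; the branch and file names are made up.

```shell
# Sketch only: throwaway repo, hypothetical branch and file names.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the initial branch name
git config user.name "Example Dev"
git config user.email dev@example.com

echo base > base.txt && git add base.txt && git commit -q -m "Initial commit"

git checkout -q -b big-feature
echo a > part_a.txt && git add part_a.txt && git commit -q -m "Feature part A"
echo b > part_b.txt && git add part_b.txt && git commit -q -m "Feature part B"

# master has not moved, so this would normally fast-forward.
# --no-ff forces a merge commit that marks the feature as one unit.
git checkout -q master
git merge -q --no-ff -m "Merge branch 'big-feature'" big-feature
merges_on_master=$(git rev-list --count --merges master)

# Undo the whole feature with a single revert of the merge commit.
# -m 1 keeps the first parent (master before the merge) as the mainline.
git revert --no-edit -m 1 HEAD >/dev/null

files_after=$(git ls-files)
echo "merge commits: $merges_on_master, files after revert: $files_after"
```

One known wrinkle: after reverting a merge, merging the same branch again will not bring the changes back until the revert itself is reverted.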
I'd like to take this post out of context and comment on the HN-submitted title alone. For me, it perfectly describes what I love about git and what I hate about git.
Git is indeed an editor, but it's like edlin.
It's not Vim or TextMate or Excel. I want a Git that works like an editor, not requiring me to keep in my head (or in some shell bells and whistles) 'where I am', and what I can or should do now. Not a UI frontend with buttons for single-shot editing commands, but a real editor, that makes use of 2-dimensional space, to somehow really edit everything that matters to me about versions and sharing and changes and all that.
It's a really powerful metaphor. Microsoft did something very right when they released Windows Explorer with Win95, which treats your hard drive like it's a document that you're editing (I bet it's a ripoff, but that's not the point here). Copy, paste, undo, it's all there.
Can't we design something similar for distributed version control? How well would document-editing metaphors map to git? My gut feeling says that it might be a surprisingly good match, given the right visualization and editing primitives.
I really don't understand the advocacy of git bisect. Using git bisect boils down to figuring out at which point in the history a change happened that broke a test or introduced a bug. Having tried git-bisect five or six times, I've found it's way slower for me than the tried and true method of:
1. Something's broken. What part of the code is broken? (perhaps 5 minutes tops?)
2. git blame <file> (10 seconds)
3. which lines in the trouble area were changed recently (10 seconds)
4. git show <commit> to reveal what got changed in that commit and caused the problem
In 99% of these cases the git blame is just to see what the other programmer (or often myself) was trying to do at the time they broke the code -- in these cases it's obvious what's broken, just not why it was changed.
When it's nontrivial to figure out where the code is broken and I have no clue at which point in the history the code broke, it's still easier just to diff the broken code to a known good branch or commit and look for significant differences.
I guess where git bisect slows way down for me is that you have to devise code that will indicate definitively that the bug exists. It's really never faster for me than just eyeballing the troublesome code at that revision.
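For reference, the blame-based routine described above, plus the diff against a known good commit, might look like this. It is a sketch in a throwaway repository with a made-up file (calc.py) and made-up commit messages; the line range passed to -L stands in for "the trouble area".

```shell
# Sketch only: throwaway repo, hypothetical file and messages.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email dev@example.com

printf 'def total(items):\n    return sum(items)\n' > calc.py
git add calc.py && git commit -q -m "Add total"
good=$(git rev-parse HEAD)

printf 'def total(items):\n    return sum(items) + 1\n' > calc.py
git commit -q -a -m "Tweak total"

# Steps 2 and 3: which commit last touched the suspicious line (line 2)?
# In --porcelain output the first field of the first line is the commit id.
culprit=$(git blame --porcelain -L 2,2 calc.py | head -n 1 | cut -d ' ' -f 1)

# Step 4: look at what that commit changed
git show "$culprit" >/dev/null
subject=$(git show -s --format=%s "$culprit")

# Alternative: list what differs from a known good commit
changed=$(git diff --name-only "$good" HEAD)
echo "culprit: $subject; files changed since good commit: $changed"
```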
If nothing else, git bisect allows any user[0] to find when a bug they are experiencing was introduced to the codebase.
They do not need to be familiar with the code, the language, that module, the library, the developer or have anything more than the knowledge of how to check out/bisect the code base, and how to reproduce the bug.
For this it is an invaluable tool, and I can't count the number of times on the git mailing list that a simple bug report has allowed a developer to quickly locate the exact commit an issue was introduced and investigate further. A recent example is at [1].
[0] Here a user refers to any of 'technical user', 'developer', 'maintainer', 'original author', or really anyone who has the ability to check out the repository.
This idea can be extended even further, to users who do not know how to (or cannot) build the software in question.
One example of this is mozregressionfinder [1] that was created by Heather Arthur [2]. The tool automatically downloads Firefox nightly builds (in the same binary search pattern of bisect) and lets the user check for the bug they observed in each version. Once the nightly build where the bug was introduced is found, a Mercurial pushlog URL is displayed which can then be pasted into a bug report to aid developers in chasing down the bug.
If you work on a small code base, or if you know the code base perfectly, you might get away with your "tried and true method" most of the time.
But if you are working on a code base with hundreds of commits per month, I wish you a lot of luck. Your 10s of seconds might easily become days.
On a big codebase you might even want to automate bisecting as much as possible. See http://lwn.net/Articles/317154/ where Ingo Molnar says:
>>> for example git-bisect was godsent. I remember that years ago bisection of a bug was a very [laborious] task so that it was only used as a final, last-ditch approach for really nasty bugs. Today we can [autonomously] bisect build bugs via a simple shell command around "git-bisect run", without any human interaction!
That doesn't always fly for a large active project, and for regressions that aren't detected until weeks or perhaps months after they were introduced. I use exactly your technique and it works more than half the time, but for the rest it's really nice to have git bisect.
> I guess where git bisect slows way down for me is that you have to devise code that will indicate definitively that the bug exists.
Normally whoever found the problem should provide you with a test case. Once you fix the problem you'd add that to your unit tests to prevent future regressions.
Your codebase must be very simple, because you can pinpoint the problem to a single file. Imagine you have something as complex as the kernel, a non-trivial breakage, that you can not reproduce but somebody else can, somebody who has no clue about the files where the problem could possibly be. Or some user reports that kernel X works but X+3 doesn't, and there are thousands of commits in between. Good luck pointing to a single file. Those are the situations that git bisect was made for.
When you are doing maintenance programming, where you only ever understand small pieces of the code at a time (the ones relevant to the project you are currently working on), bisecting the history of your commits is a very powerful way of doing business. In one of my feature branches, I added some /* ... */ style comments to a MySQL script and used apostrophes in the text of my comment. This can confuse MySQL when it executes the script. I was not deploying the new scripts into my test environment after every commit (I was mostly working on the Python back-end, and only adding comments to the MySQL scripts because they explained something that was happening in the Python code). Several commits later, I did a full deploy and saw new symptoms. It was only by backing up through my commit history that I was able to find the commit that first introduced the symptom, and it was only the fact that I write small commits that allowed me to discover the problem with the apostrophes.
Generalizing, whenever you are making incremental changes to a complex system where you don't understand all of the effects of your changes without lots of testing, it makes sense to write small commits so that you can backtrack to see when an unintended behavior first surfaced.
Maintenance programming is one field in which this approach is crucial.
"What part of the code is broken? (perhaps 5 minutes tops?)"
I expect this is where the git bisect advocacy comes from. Bisecting down to the commit that caused the test to fail takes seconds. Then you can use the rest of the process, much like the one you described, to actually fix the bug.
It helps remove that initial hunt for the problem. When you narrow the problem to a specific commit it becomes quite clear what is wrong immediately, not having to filter through the myriad of changes that may have come after. That is, at least, where I have found it to be most useful.
git bisect allowed me, as a user, to track down a rendering bug in WINE which I would have no idea where to start on otherwise. There were approx 1000 commits between the two points and the reproduction was not terribly quick (start up game, get to certain view, verify). It even handled the fact that there were two apparently unrelated commits (by filenames and commit time) which contributed to the bug, and that in some revisions the game wouldn't even start.
This is only one example, and I rarely have a need for git bisect in the code that I generally work on (since I can mostly keep it in my head, and regressions are rare). But in large projects it's extremely useful.
I just updated program X that I'm not a developer for, and it crashes a bunch in really weird ways. I ask on the mailing list and nobody's seen the problem and it doesn't reproduce for anybody else. I can get it to crash on my machine in a few seconds of using it by doing certain things. The backtraces from the segfault are in different places each time.
I clone the git repo, and bisect from the previous version that worked. It eventually finds the commit that broke things, with which it becomes obvious what was wrong.
I wish I could have both. Rebase to stick to single clean commit messages, but still preserve the history/order of modifications within that, even without the messages.
I might work on a project for a week, write some code, do an intermediate commit, realize it's not needed, delete it. Then when I finally merge into master (rebase), that code I wrote is forever gone. Which I think is fine, but two months later I realize I could actually use it. But there's no way to get at it, is there? Or am I missing something?
That you can create arbitrary branches and keep them forever, remotely or locally. If you think you'll need a commit later, branch off at that point with git checkout -b branch_i_may_need, then check out the previous branch again. After that, git log branch_i_may_need will always show the commit you needed at the top, and git cherry-pick `git rev-parse branch_i_may_need` will cherry-pick it into whatever branch you're on. That way you keep the commit and can still keep the remote repo clean.
You can also avoid doing the two git checkouts by just doing "git branch branch_i_may_need" and you'll stay in your current one. I do that all the time, though I had no clue about git rev-parse. That's a great time saver.
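Putting the two replies above together, a sketch of the bookmark-and-cherry-pick trick in a throwaway repository. The branch name branch_i_may_need comes from the comments; the files are made up, and git reset --hard is just one assumed way of throwing the commit away.

```shell
# Sketch only: throwaway repo, hypothetical file names.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the initial branch name
git config user.name "Example Dev"
git config user.email dev@example.com

echo base > base.txt && git add base.txt && git commit -q -m "Initial commit"

git checkout -q -b feature
echo helper > helper.txt && git add helper.txt
git commit -q -m "Add helper I may need later"

# Bookmark the commit without leaving the current branch
git branch branch_i_may_need

# Decide the helper is not needed after all and drop it from the feature
git reset -q --hard HEAD~1
echo real > real.txt && git add real.txt
git commit -q -m "Add the real change"

# Two months later: bring the old commit back onto the current branch
git cherry-pick "$(git rev-parse branch_i_may_need)" >/dev/null

restored=$(git log -1 --format=%s)
echo "restored commit: $restored"
```

Passing the branch name directly (git cherry-pick branch_i_may_need) does the same thing, since a branch name resolves to the commit at its tip.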
He actually mentions this exact thing, but briefly. Keep your refs around. Branch to blah-wip, make whatever commits, then rebase and cleanup to blah. Maybe blah gets merged in, but blah-wip will still refer to whatever original history.
He then gives specific advice.