Idiot Proof Git (softwaredoug.com)
387 points by softwaredoug on Nov 9, 2022 | hide | past | favorite | 418 comments



Big fan of Git style guides in teams. We had one at Thread. It was common for engineers to come in, discover we didn't do rebasing, and find it weird, but we took the view that history should be exactly what you actually did, not some clean and idealised version of what you wish you had done. There are advantages and disadvantages to this, but having a defined approach was the most important aspect.

Also the fact that we didn't rebase and used merges everywhere was a major contributor to no one ever breaking their git repos, something that git seems notorious for elsewhere.


Obviously what works for you works for you, but I respectfully disagree with everything you said.

The "history should be exactly what you did" argument - which many people make - is really funny to me, because a pull/merge-only strategy only preserves the _wrong_ history. As a tech lead, for example, I absolutely do not care one bit about the date of a commit, or when the developer started working on it, or which commit they started working on top of. That may be "what really happened", but it's worth nothing in the grand scheme of things. When a commit _has made it into the product_ is the only "what really happened" there is, and that is what I care about. And a linear history makes this much easier to analyze and understand, reducing cognitive load considerably.

Also, it's strange that you see merges as a contributor to keeping repos from breaking, as my experience is the opposite.

I advocate for a rebase-based strategy wherever I go as it helps developers push better code, it actually curbs hysteria-driven "merge it as fast as possible no matter how shit it is" cases, and I see how it turns the Git log into an actually useful source of information for developers and other personas. People start reading the logs!

The log should track the product's evolution, not the developers' activities.


> The log should track the product's evolution, not the developers' activities.

Git is a development tool, not a product release tool. If you want to see the product evolution you could filter to just merge commits, or just merge commits in a specific format.

If you want to keep track of releases specifically, then use tags, that's what they're for. I suppose you could make a separate branch/repo where every commit = a release, but that opens you up to merge conflicts without any benefit over tags.


At the end of the day everything depends on the organization. In a hectic startup where requirements change on an hourly basis and releases are made several times a day, I would absolutely insist on keeping the log linear and as clear as possible. Tags are important, of course, but they're not that useful for analyzing a repository.

When I say "the evolution of the product" I really mean "the evolution of the code". When a small feature branch with 5 commits - four of which say "wip" and the last one says "added color support" - gets merged as is, and all relevant information is held hostage by whatever Git platform the company is using this week and not inside the repository itself, the log is not useful to me regardless of any strategy.

But in a different setting I would not necessarily insist in the same way.


> When I say "the evolution of the product" I really mean "the evolution of the code". When a small feature branch with 5 commits - four of which say "wip" and the last one says "added color support" - gets merged as is, and all relevant information is held hostage by whatever Git platform the company is using this week and not inside the repository itself, the log is not useful to me regardless of any strategy.

Yes, it can be annoying if your developers are committing nonsense, but then just tell them to not do that, or to rebase locally before pushing.

If you find yourself troubleshooting a bunch of nonsense commits, you can just do a diff to the merge commit, and it will show you all the changes. But you also have the option of figuring out exactly which commit caused the problem, and seeing it in context. If I see an error in the middle of a bunch of commits that look like "trying x with y", then I know that this is a tricky problem, and the developer was lucky to get it to work at all. If it is in the middle of a standard-looking commit, then the developer didn't struggle with this. So maybe they didn't put enough effort into it, or maybe it is a rare corner case.

When I'm troubleshooting other people's problems, every bit of information helps. Especially when the developer who introduced the problems is no longer with the company. Squashing commits removes some of that information, without providing anything that I can't approximate by using merge commits in logging/diffs.


> Yes, it can be annoying if your developers are committing nonsense, but then just tell them to not do that, or to rebase locally before pushing.

I'm pretty sure rebasing locally is exactly what the person you're arguing with is arguing for. The original comment in this thread was saying you should never rebase, always just merge.


To be clear, what I'm advocating for is that feature branches get rebased regularly by the developer until PR-time and a clean merge into the mainline. I usually recommend squashing to one commit but do not insist.

I can definitely see how those intermediate commits can provide more information, but there's a tradeoff. More often than not, they do not provide me much value, and instead give me bloat, so I prefer to keep things simple.

Telling developers not to do something is like telling a kid not to push that red button. The average developer chooses what's easiest _right now_ and thinks it's someone else's job to fix the mess at PR time. And they're afraid, because that one time five years ago they ran a rebase without knowing what it does, and lost some code without knowing it's actually right there in the reflog, and since then they are deathly afraid of Git. I know how to use log and diff and all the others quite well, but most don't. So I'm trying to make things easier on everyone in the long term, not the short term.


"Why doesn't git bisect work?"

"well, it landed on this rebased commit that's huge. I guess it was kind of useful, just not as useful as we'd like".


Haha, true! On the other hand, is that better or worse than running into a string of "wip" commits that had the code in a broken state?


But if the alternative is that it's ten commits and most of them don't work anyway, the bisect takes longer to give you the same lousy information.


That's not the alternative, who develops like that?

It's like the first time I saw the essay calling ORMs the "Vietnam of the software industry". I remember reading it and wondering who the hell would use ORMs in that manner?

Apparently a lot of people, but if you're using rebase because you don't know how to create commits that build and are functional, then I submit the issue is with you.


> That's not the alternative, who develops like that?

As an independent contractor working with many different companies, unfortunately, virtually everybody.


I've been recommending making a git tag before rebasing. eg:

  git tag -f pre-rebase
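A minimal sketch of that safety net in action (repo setup and names are illustrative):

```shell
# Illustrative throwaway repo.
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.email you@example.com; git config user.name you
echo base > file; git add file; git commit -qm "base"
git checkout -qb feature
echo change >> file; git commit -qam "feature work"

# Bookmark the current tip before touching history.
git tag -f pre-rebase

# ... run the rebase; if it goes sideways, jump straight back:
git reset -q --hard pre-rebase
```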


> If I see an error in the middle of a bunch of commits that look like "trying x with y." Then I know that this is a tricky problem, and the developer was lucky to get it to work at all.

Just an aside, but if the problem being solved is harder than it looks, then it deserves a comment explaining why, in either the commit message or in the code itself.


    git add .; and git commit --amend --no-edit; git push origin --force
Rarely, I have to do pipeline work on repositories. You'd normally see twenty "Fix Jenkins Issue" commits on the main branch because of some nonsense that only happens when you deploy to UAT or whatever. Once I learned this little gem, this is also how I manage my feature branches, mostly. But also, my employer's fleet of laptops has been aging and I've had to do three swap-outs this year, so I like to keep my in-progress work pushed up just in case.


git add .

We have had multiple security incidents because some developer left a credential file inside the local git clone (no, not all tooling supports out-of-tree stored credentials). Blind 'git add .' is the first thing I teach my developers not to do.


git add . is very useful though. Surely, the first thing to teach here is to always git status before committing?

My typical workflow is to git add . to see the mess I’ve made then decide how to clean it up. If I’ve mistakenly added a credentials file, the fix is to add it to the gitignore and unstage it, not JUST unstage it.

Not saying that you shouldn’t do both, but maintaining a gitignore and completely removing the potential problem for other people seems better than pretending your tool is more limited than it is.
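A minimal sketch of that fix, assuming an accidentally staged credentials file (file names are illustrative):

```shell
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.email you@example.com; git config user.name you

# A blind `git add .` stages a credentials file along with the code.
echo secret > credentials.env
echo code > app.py
git add .

# The fix: ignore the pattern for everyone, then drop the file from
# the index only (--cached keeps the working copy on disk).
echo "*.env" >> .gitignore
git rm -q --cached credentials.env
git add .gitignore
git commit -qm "add app, ignore env files"
```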


If you can tolerate a GUI, Git Cola might be a solution. I've been using it exclusively for some 5 years now - it's lightweight enough, but still makes you think about what you're about to commit. You can add things to .gitignore directly from there, too.

https://git-cola.github.io/

Default layout is a bit weird IMO, here's what I'm doing instead: https://u.ale.sh/my-git-cola-screenshot.png


You don't use .gitignore? All our projects use dotenv for local credentials with a .gitignore that covers build, log, and any *.env file.

Sounds like you need better tooling, tbh.


I did this for a while but have moved on to

    git commit --fixup HEAD
and you can tack an -e on the end to add more notes in the commit message body. You can fixup prior commits by supplying their short hash, which is how I originally discovered this: I essentially wanted to amend a commit farther back in my branch.

This makes a commit with a “fixup!” prefix that works with

    git rebase --interactive --autosquash
you can also form that kind of commit directly, and there is also a “squash!” directive. Now you don’t have to force push amended commits. And it helps sometimes when you accidentally amend something you didn’t mean to; now you can just soft reset to HEAD~1 and try again.

I don’t even bother locally rebasing to autosquash it all anymore since we use squash-to-merge/rebase in github PRs now.
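For reference, the fixup/autosquash round trip looks roughly like this (setup and messages are illustrative; GIT_SEQUENCE_EDITOR=true accepts the generated todo list non-interactively):

```shell
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.email you@example.com; git config user.name you
echo a > file1; git add file1; git commit -qm "feat: add file1"
echo b > file2; git add file2; git commit -qm "feat: add file2"

# Fix something in the earlier commit without rewriting history yet.
echo fix >> file1
git commit -qa --fixup=HEAD~1   # creates "fixup! feat: add file1"

# Later, fold all fixup! commits into their targets in one pass.
GIT_SEQUENCE_EDITOR=true git rebase -i --autosquash --root
```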


I did something very similar with a bash function:

  function git-commit-fixup() {
    git commit --fixup ":/$*"
  }
  alias gcf="git-commit-fixup"
  # Looks for the most recent commit that matches its arg
  # eg: we have three commits with messages: 
  #   "fix: the thing" 
  #   "feat: 5 percent cooler"
  #   "test: test coolness"
  # then we do some work and git add, then do:
  gcf cooler
  # now we have 4 commits:
  #   "fix: the thing" 
  #   "feat: 5 percent cooler"
  #   "test: test coolness"
  #   "fixup! feat: 5 percent cooler"
  # And autosquash will combine the fixup commit with the appropriate semantic commit as you say.
Sadly I haven't been using it much as someone introduced a bunch of commit lint git hooks that choke horribly on the "fixup!" part. And you can't pass --no-verify while rebasing.


I'd probably suggest --force-with-lease just to be sure ;)


Me too. I made an alias that was _shorter_ than --force to make it easier to type (and hence more likely for me to use by default).


You can combine

  git add .; git commit --amend --no-edit;
with the `-a` option

  git commit -a --amend --no-edit;


> feature branch with 5 commits - four of which say "wip"

Merge+squash eliminates this and it works every time. One or two clicks on gitlab for example.
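On the command line, that button corresponds roughly to `git merge --squash` (branch names are illustrative):

```shell
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.email you@example.com; git config user.name you
main=$(git symbolic-ref --short HEAD)   # master or main, depending on config
echo base > app; git add app; git commit -qm "base"

git checkout -qb feature
echo one >> app;  git commit -qam "wip"
echo two >> app;  git commit -qam "wip"
echo done >> app; git commit -qam "added color support"

# Stage the branch's net effect as one change, then commit once.
git checkout -q "$main"
git merge -q --squash feature
git commit -qm "Add color support"
```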


Actually you're both wrong: it's a change tool, not a development or a product release tool.

"then use tags, that's what they're for"

No they're not - that's definitely a useful way of using them, but they are just labels.

Why is it useful to know this? Well, when you know your tool better (how it operates, not the porcelain or CLI), you have better insights and are able to use it better.

You can manage Agile-style "features" with nothing but hashes and tags, no branches necessary. Branches are actually somewhat antithetical to distributed development, they're a useful concept, but that's all they are.

I do tend to agree though, rebase is superior to merge, in a product setting. If you want to track a feature set, having the set of commits which represents that feature set is better than a litany of nonsense tangled up in twelve feature "roots" (branches which have been merged together).

In a "do what I want" setting, rebase and merge are about equal, though. If I want to work on 3 features independently, I'd like to be able to easily see both features in parallel. I also would like to squash my features to single commits, and rebase them into feature branches where I can then merge/rebase/whatever those into my final "product" branch.


> Actually you're both wrong, its a change tool, not a development or a product release tool.

It's funny that HN can't agree on what Git is.


If you ask me: a low-level SCM framework.


> And a linear history makes this much easier to analyze and understand, reducing cognitive load considerably.

This, so much this! And the price you pay for it is a slightly more difficult "insert". We started to enforce linear history in one of our bigger repositories (about 100 devs) about two years back; the first months were quite the ride (I had to do plenty of support sessions to recover 'lost' changes). But the devs really started to see the benefits, and once they got the hang of it (which was actually faster than I anticipated for most), it was smooth sailing. Many actually started to embrace it and advocate it for other repositories as well.

For me, it also became evident that filtering for people capable of learning git (rebase, cherry-pick, reset, etc.) was very good at finding out who I'd want to work with and who not. It's really not that big of a deal: the UX of the CLI might be lackluster, but the underlying data model is rather straightforward. It's such a quintessential tool in our everyday workflow that it's really worth putting a bit of time into understanding it, and if someone can't or doesn't want to, well, it might just be better if they work somewhere else than I do.


You can, 99.9% of the time, emulate linear history with first parent history, which is a post-hoc tooling choice that doesn't remove context.

Developers shouldn't try to merge branches with wip/wip/wip/wip histories either, that's just garbage. Commit messages are documentation, fix your documentation before you publish.
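A sketch of that first-parent view (setup is illustrative): the full log shows every wip commit, while --first-parent walks only the mainline side of each merge:

```shell
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.email you@example.com; git config user.name you
main=$(git symbolic-ref --short HEAD)
echo base > f; git add f; git commit -qm "base"

git checkout -qb feature
echo w1 > g; git add g; git commit -qm "wip"
echo w2 >> g; git commit -qam "wip"
git checkout -q "$main"
git merge -q --no-ff -m "Merge feature: add g" feature

# Full history vs. the mainline-only view:
git log --oneline                 # base, both wips, and the merge
git log --oneline --first-parent  # only base and the merge
```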


> You can, 99.9% of the time, emulate linear history with first parent history

Fully agreed, but that requires first of all to understand how this works, and second requires you to run commands locally. If you're unfortunate enough to have to use e.g. bitbucket-server at work like I do, you'll always see the full graph there, which is A LOT easier to grok if it's linear. And since that's what most devs look at (instead of git-log using some extra options) and also where CI-state happens to be reported (green/red build), that's worth a ton :)


Usually my strategy, but it breaks down if you have someone who is bad enough at merge conflict resolution. We had one guy Steve who was upset that he was not as in charge as he wanted to be, but he was doing a few things that broke my trust, so we were keeping him on a shorter leash than he liked. His code and ideas were okay but not great. He was picking on this guy Mark, who sat next to me, to make himself look more valuable. Mark was not the brightest senior dev I ever worked with, but he had his uses, and I hate bullies. Last but not least, he was terrible at merges, and his solution to this problem was to delay as long as possible. We are a full CI environment and he was making work for others by doing this. Namely me. None of these were conducive to me giving out a lot of responsibility, so he had some but not what he was after.

One day he’s blaming Mark for a regression in the code. Being pretty loud about it in fact. When I look at the bug, it sure looks like the sort of mistake Mark would make, and the annotation says Mark. Only the thing is that I’m the one who reviewed this code and I know Mark so I was looking for exactly this sort of bug and was pleasantly surprised to find that he was learning and had dodged that pitfall. So I go excavating the history and sure enough, that bug wasn’t in the code I reviewed. It was in Steve’s merge resolution. Fuckin’ Steves, man. And the fact that git lets you do things like that is not a great feature either.


I don't know either Steve, Mark or you, of course - but this sounds like a broken dynamic where boss's preference of one person (sitting next to Mark, probably mentoring him) pushes another, Steve, to show his fighting side.

Yes, Steve should have kept things honest. But being the boss, you need to be careful not to pick favorites and to treat everybody in a similar fashion. People are very touchy about how they are treated within the "tribe".

Also, if you identified the issues both of them are bad at, are you all solving them? Being "terrible at merges"... how does that even work? Isn't that kind of an important skill? As their boss, their know-how is your responsibility too; are you solving it?

Sorry if I have misjudged the situation, I obviously don't know it first-hand, so you will need to see for yourself if the above is true. There were just too many red flags (for me) in your comment to let it pass... And the reason I see them is that I have misjudged colleagues in the past, and wish I had known better then. Ah well.


The one who does the merge is always the one responsible... even if your code is amazing, if it doesn't work with the (working) body of code, that's on you, not the people who came before.

Of course Steve can just commit terrible code to main-line, and Mark (or yourself) are always stuck fixing their code, but that's what review/testing are supposed to be for - maybe blame the reviewers and testers in that case instead.


I advocate for rebasing, but I discourage using a linear history. The two may seem to contradict one another but they are distinct (enough) that I felt it worth mentioning. When a developer pushes, it makes sense for them to rebase first because they are shipping those commits at the time of the push.

But, depending on your git workflow, when merging to main, I prefer a merge commit so I can see the tree of activities that lead to any particular release.


That's fine of course. Personally, I prefer to make things as easy as possible to understand at that unspecified but probable future date when a customer opens a SEV1 and I have to consult with the log, among other things. Make it idiot proof later, when time is _really_ of the essence, rather than now, when you're being artificially pressured to deliver that story for the sprint review in two hours.


I'm advocating that merge commits in main make it easier precisely for the requirements you specify: to track what feature(s) was introduced at a given release. With merge commits you not only have groupings of commits for features developed, you have the ability to revert a whole feature with just one revert. If you rebase onto main you are flattening those groupings and the entire commit stack into one serial history. For super quick "fix forward" products, that's fine and I would be happy with that. In products that are not so quick, or where you have tighter controls/SLAs/etc., being able to immediately identify and revert an entire feature from main is very valuable, above and beyond a feature toggle imho.
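A sketch of that one-step revert, assuming the feature landed as a merge commit (-m 1 tells git which parent is the mainline; setup is illustrative):

```shell
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.email you@example.com; git config user.name you
main=$(git symbolic-ref --short HEAD)
echo base > f; git add f; git commit -qm "base"

git checkout -qb feature
echo feat > feature.txt; git add feature.txt; git commit -qm "add feature"
git checkout -q "$main"
git merge -q --no-ff -m "Merge feature" feature

# Back out the entire feature in one step; -m 1 keeps the mainline parent.
git revert -m 1 --no-edit HEAD
```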


Squash commit will squash everything done into a single commit, so reverting it is easier.

Also when you have to cherry-pick fixes into an older release branch, you get to fully appreciate squashed merges. If you did not squash, you need to cherry-pick all commits from the merge. If the feature branch was not rebased and the dev merged main into the feature branch multiple times, then the branch commits are inter-mingled with main commits and it is so easy to mess up the cherry-picking. Just imagine if the dev had to fix conflicts...

All that makes squash commit a time-saver.
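A sketch of why the squashed case is painless: the whole feature is a single mainline commit, so back-porting is one cherry-pick (branch names are illustrative):

```shell
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.email you@example.com; git config user.name you
main=$(git symbolic-ref --short HEAD)
echo v1 > app; git add app; git commit -qm "release 1.0"
git branch release-1.0

# Mainline moves on; later, a fix lands squashed into a single commit.
echo v2 >> app; git commit -qam "new feature"
echo fix > hotfix.txt; git add hotfix.txt; git commit -qm "fix: squashed bugfix"

# Back-porting is a single cherry-pick (-x records the source hash).
git checkout -q release-1.0
git cherry-pick -x "$(git rev-parse "$main")"
```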


> Squash commit will squash everything done into a single commit, so reverting it is easier.

No, it's just as easy, while needlessly throwing away other useful information.

> If you did not squash, you need to cherry-pick all commits from the merge.

...or use a single `git rebase` invocation.

> If the feature branch was not rebased

Linear workflows with merge commits usually require feature branches to be rebased on merge (just like squashes, just without the actual squashing), so that's not a problem at all.
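A sketch of that single invocation, using `git rebase --onto` to transplant a whole range instead of cherry-picking commit by commit (names are illustrative):

```shell
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.email you@example.com; git config user.name you
main=$(git symbolic-ref --short HEAD)
echo v1 > app; git add app; git commit -qm "release 1.0"
git branch release-1.0
echo v2 >> app; git commit -qam "mainline work"

git checkout -qb fix "$main"
echo f1 > fix1.txt; git add fix1.txt; git commit -qm "fix part 1"
echo f2 > fix2.txt; git add fix2.txt; git commit -qm "fix part 2"

# One invocation replays every commit that is on fix but not on the
# mainline onto the release branch, instead of picking them one by one.
git rebase -q --onto release-1.0 "$main" fix
```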


> so reverting it is easier.

Only reverting the entire feature is easier. Reverting a single change (for example, because it introduced a bug but is not critical to the feature itself) becomes much harder after squash.


Also, keep in mind that Git is the engine with which Continuous Integration is made. CI is developers integrating with the work of one another on a regular basis. If the product changed since you started working on your new branch, then your branch is stale and you need to integrate with the recent changes. Why wait until you're done to find out that your branch can't be merged anymore and you have to make a ton of changes, when you can keep on top of things at regular intervals and make the final merge as easy as possible, not just for you, but for the code reviewers, the QA guys, the DevOps guys, everyone?


I disagree with both of you :). Personally I prefer to squash to one commit per ticket but on a team level I don't care about a consistent way.

I've found that the history rarely matters at all to me. Finding out who modified a specific code section (git blame) is usually good enough.


In the hopes someone will see this: why isn't this the standard? I've never been in the position of coordinating multiple engineers, but when I look at my colleagues' code I never ever once cared about their individual commits. What am I missing?


My point wasn't that this strategy was the right one, but that having a clear strategy is more important.

I personally prefer a more rebase-heavy approach, but what we had worked very well for us.


Oh definitely, a clear and enforced strategy and conventions are more important than anything.


Boundaries are equally important - a set of expectations in context is fine, letting people know there is a world outside your little ego-bubble is also important, them knowing how and when to live in both is valuable.


My current thought on this is that the git model (or at least the interface for it) is probably a touch too simple to accommodate all the things people want to use it for. As a result, you get this whole 'clean history' vs 'what really happened' split. And often you can find a few more splits if you dig in a bit deeper into the actual mechanics people prefer.

Generally, bigger picture stuff works best with cleaner histories as they mop up a bunch of unnecessary and distracting details, and neatly package things together. But doing so also means you're getting rid of, well, the details. If you need them later - and some poor bastard always will - you're just screwed.

Unfortunately all we've got are commits, so you're constantly fighting different groups and even different people who value the benefits of different approaches due to their positions, histories, or preferences.

This isn't even a half-baked idea at this point, but at first glance something like a meta-commit which just contains more commits and a message seems like it might be better. The top-level commits could just be the 'clean history' while deeper levels could record more of the as-happened details.


Most of what you said there isn't actually true. I don't doubt you believe what you're saying, I'm just pointing out it's not true.

My favorite is how apparently rebasing causes developers to write better code. If you say so.


Thanks for pointing out my errors, I now see where I was wrong.


Telling people not to rebase and having code that was never rebased are two very different things. Have you asked former employees if they rebased when nobody was looking? If you haven’t then you have no reliable data on what happens.

When people set ridiculous absolute rules, what develops is an underground of people who don’t follow the rules and in some cases get a thrill from subverting the dystopia.


When I hear 'no-rebasing' I hear 'no-rebasing after you've done a PR'. That's when my code and commits are ready for others. My ongoing commits are often more notes to myself as I find those more useful during the work but make no sense as pushed commit messages where you need to block things off logically. Organizing your commits is not much different than organizing your code into modules. Something that should be done purposefully and with forethought.


To clarify, this wasn't an enforced hard-rule, but instead just a written down style guide. Style guides can always be broken for good reason, but it's the sort of thing that a code reviewer might comment on or discourage. Generally rebasing before opening a PR was considered to be fine.


> history should be exactly what you actually did, not some clean and idealised version of what you wish you had done.

This is a false premise. There is a product (and its various versions), and the team that develops it. Both have a claim to a meaningful definition of "history". What you are arguing is that the 'history of the developer' is more fundamental than a less noisy 'history of the product development'.

Does it really matter (and need we record it for posterity) if developer x used n commits to post n incremental changes to a well defined software unit of the product?

It seems a compromise position of (1) no history rewrites before code review, followed by (2) a post-code-review cleansing squash before merge, would satisfy all concerns, and the history records would have also served their purpose.

A developer's timeline is relevant to her team lead, not the product manager.


If the employer I worked for started micro-managing the way I use my tools (that affects nobody else) I would consider leaving, honestly. If I rebase on a branch that hadn't been shared with someone else, why does it matter what my boss or team thinks about that approach?

Code styles are one thing, what I type into my terminal is another.


When you commit you're sharing with your team and future team. I think it's fair to have a set of agreed guidelines around that. What if you wanted to put all your commit messages as "cnity did it"?


I agree with the post you're responding to, and make a lot more commits than I share with my team. Pushing commits to a shared branch is where work is shared.

At my current employer, we used to use Perforce, and in that world you're totally right that committing (submitting in Perforce terminology, IIRC) did share changes. In that context, we developed a lot of bad patterns, losing code or holding up other people's work while a developer got their work ready to share. Transitioning to git has been super painful, due mainly to people treating git as if it's the same sort of thing as Perforce...


It's simple: I wouldn't do that. Do we need guidelines to not delete repositories too?


Yes of course


Technically your git history isn't a record of what you did, only what you committed. If I make a bunch of changes locally, commit, then realise I need to make a few extra changes before pushing, there's really no harm in amending the commit. But once pushed, I agree, it should stay as is.


What about squashing commits in a merge? I don't do that all the time, but it is useful for certain things. Like repeated code changes to test something in the CI/CD pipeline (that I can't replicate well locally), where only the last change that got it working is of any interest.


Squashing is rewriting history.

It sounds like the grandparent comment had a ban against rewriting history across the board, which would help make git idiot proof. I love rewriting history, not because it's what I wished I had done but because it's what I am going to want to review when I have to.

Rewriting history is a great way for gitiots to shoot themselves in the foot.


Couldn't you rewrite history locally on your own branch and nobody would know?


Hard agree. Do whatever the heck you want locally, just try not to screw everybody else up when you push.

(From the guy who force-pushed on a personal project yesterday to resolve a situation with multiple remote heads - I am ashamed)


Yeah... You should never rewrite history of shared branches. You are just asking for trouble in that case.


We informally allowed whatever you wanted until you opened a pull request.


Sure, unless you bork your local repo enough you need help from a teammate getting your work into a PR.

Not to say that doesn't make for a good learning moment.


At some point, every junior is going to mangle something, and need a senior to sit them down and give them the git reflog talk. It's an inevitability, and should be embraced as a natural part of the evolution of a developer.


Haha. what's the git reflog talk? To be clear I know what reflog is, but what's "the talk"?


Probably just making them aware of it. But I can give a slightly longer spiel:

Commits in git are immutable. They're identified by their hash, so they have to be. What's more, they contain the hash of their parent commits, so the whole chain back to the first commit can't be changed. You can only add new chains.

As a consequence, if your main branch points to commit abc123 and your feature branch points to commit def456 then it doesn't matter if you merge, cherry pick, rebase or dance the fandango, if you point those branches back to those commits, the branches must by necessity look identical to the way they looked before you did anything.

And you can find out where they used to point in the reflog.


I'm certainly aware of reflog and have (thankfully) only had occasion to use it once or twice that I can remember.

To my original comment - having to force-push in order to resolve heads - is there a "correct" way to do this that doesn't feel gross?


You're going to have to explain your problem in more detail than "resolve a situation with multiple remote heads".


To explain, I'll borrow the solution that I first saw in a Fog Creek presentation from ages ago ("DVCS University")...

Essentially, if you have more than one person making changes to the same piece of code, the method for resolving them is:

1) Pull - gets their changes

2) Merge - puts their changes together with yours and you can reorganize them at that point.

Note, this was done with Mercurial, which doesn't have the concept of a stage, so Pull feels like it has a slightly different meaning when you look at it that way. One [suggestion] for Git if you want to achieve the same effect - getting a Fast-Forward at the end - is to Rebase, then Merge last.

Part of me knew this, I simply forgot and wanted to get out of this particular hole.

[suggestion]: https://www.atlassian.com/git/tutorials/rewriting-history/gi...
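A sketch of that rebase-then-merge flow with plain local branches (names are illustrative); the final merge is a fast-forward, so the history stays linear:

```shell
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.email you@example.com; git config user.name you
main=$(git symbolic-ref --short HEAD)
echo base > f; git add f; git commit -qm "base"

git checkout -qb feature
echo mine > mine.txt; git add mine.txt; git commit -qm "my change"

# Meanwhile, someone else's change lands on the mainline.
git checkout -q "$main"
echo theirs > theirs.txt; git add theirs.txt; git commit -qm "their change"

# Rebase first, so the final merge is a clean fast-forward.
git checkout -q feature
git rebase -q "$main"
git checkout -q "$main"
git merge -q --ff-only feature
```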


That's the start of a setup, but that's not enough information for me to figure out why you needed a force push.

I'm not going to demand you explain more, but if you want to explain more then I'll try to answer the question you had about whether there's a better method.

Also Pull is generally a shortcut for "Fetch then Merge", and just getting changes is Fetch.


Just the explanation of "no matter how bad you've fucked up your repo, as long as you haven't run 'git gc' or waited a few years, all you need to fix it is to find the commit hash from git reflog, git checkout that hash, and then git branch to give you an easily accessible reference to the commit". Followed by some pointers on how to efficiently dig through reflog.

It's pretty freeing to realize that it's basically impossible to lose committed code.


Or made a "fresh clone". I've had colleagues nuke a repo when it gets into a "bad state", start over with `git clone`, and repeat the work that git "lost"...


What about squashing commits in a merge? I don't do that all the time, but it is useful for certain things. Like repeated code changes to test something in the CI/CD pipeline (that I can't replicate well locally), where only the last change that got it working is of any interest.

You can get the same effect by reverting-to/checking-out a merge commit, or doing diffs between merge commits. You can also get a fairly clean history by only showing merge commits.

My rule of thumb is that any commit that has been pushed to a shared remote should never be re-written[]. If you're going to rebase, do it on your computer before pushing, or on your own repo before opening a merge/pull request.

[] exceptions would be removing accidentally committed secrets or large files that are no longer needed.


For me, there should be a balance between the detail of history and the usefulness of the information.

For the changes I do, not only do I rebase them all into a single commit, I don't even merge branches. I cherry-pick my changeset. Clean commits, clean history.

Whatever I do before a push is my business and no one else's.

For anyone else's changes (that is, anyone who is not me): history is untouchable. No rebases, no squashing; whatever is already pushed must stay as it is.

When I do a pull: git stash, git pull --rebase, git stash pop

I encourage everyone to clean their commits before a push. A clean history is a good history.


gosh, wow, I want the opposite of that for any project I work on. In the last five years I have never used a merge commit and I love it. I'd honestly prefer a version of git that doesn't have em.


Without merge commits you can never have more than 1 long-lived branch because said long-lived branches will not have any common ancestors which makes pulling changes between them a nightmare (literally every file will be in conflict)...

I like linear histories too, but if you have a production branch and a development branch you need merge commits between them.


I don't think either of those is necessarily true. I don't ever need to merge long lived branches if I don't want to. And I definitely don't ever merge my production branches with... anything. They are always just a fast-forward from another branch.


In my case I have 2 branches - production and development which map to different cloud environments. Pushing to development branch will run CI/CD to auto-deploy to development cloud, and pushing to production will run CI/CD to auto-deploy to production cloud. This particular software is also classified as a medical device under FDA which means there are regulations about what can be deployed to production and when. Hence, I cannot just fast forward production from another branch without first getting FDA approval. I can, however, cherry pick certain kinds of bug fixes into production without FDA approval, and then later (after FDA approval) merge the other approved changes into production as well. In this case fast forwarding isn't possible because of the cherry picks so we are using just normal merge commits.

Do you have a better git workflow in mind than what is described above?


long lived branches are an anti pattern so this sounds like a plus to me


I’d rather wait until a project is finished than push incomplete changes that (I hope) aren’t reachable.

But a long-lived branch needs to be rebased before review, because conflict resolutions need to be reviewed. In fact reviewing the version without the conflict resolutions is a waste of time, because that version probably won’t ever be deployed.


> that history should be exactly what you actually did

That gives you the history of exactly what git commits you made, not exactly what you did and how you went about solving the problem though, right? git describes changes to the source.

It seems like you're getting the worst of both worlds, trying to understand how the source changed is much harder when you have peoples' experiments and draft changes and false starts littering the history. And understanding the approach to problem solving and why decisions were made is pretty unsatisfying by digging through that stuff too really. That seems better kept as documentation and/or in the commit logs.


Modern Git workflow is very simple on its own:

1. Your codebase has a `main` branch which is write-protected

2. Devs submit changes to `main` from their own branches using PRs

3. Devs can do whatever the fuck they want on their own branch

4. PRs are merged one at a time

5. When merge happens a dev's PR is squashed into one commit that gets appended to `main`

6. If next dev wants to merge their PR with a conflicting change they have to resolve the conflict first and then they merge

7. The end result is that `main` is a linear history of all PRs with the time they were committed to the `main` branch i.e., when they could've started to break prod.

This is a solved problem and it works great in industry. Why break it? If you want interactive rebase then make 2 PRs.
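Roughly what step 5 does under the hood, in a throwaway repo (branch names and the PR number are illustrative):

```shell
set -e
cd "$(mktemp -d)"
git init -q -b main
git config user.email dev@example.com && git config user.name dev
echo base > f && git add f && git commit -q -m "base"
git checkout -q -b pr-branch
echo a > a && git add a && git commit -q -m "wip"
echo b > b && git add b && git commit -q -m "fix typo lol"
git checkout -q main
git merge --squash -q pr-branch           # stage the combined diff, no commit yet
git commit -q -m "Add feature X (PR #1)"  # one clean commit lands on main
```

The messy intermediate commits never reach `main`; its history stays one commit per PR.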


It's flat out amazing how often people want to complicate git.

My current company, everyone is in love with git flow. I pointed out the person who originally created it literally wrote a blog post explaining why it was designed for very specific needs that most projects don't have, and yet ...

I constantly see people doing presentations on how this is all supposed to work. And why? You can get away with very simple flows.


People love bikeshedding.

Git is also somewhat symbolic: you're a "real" programmer if you use it and not if you don't. I guess an advanced use of git makes you "realer". I think there's an element of "tidiness" to it as well.

It's definitely cultural more than technical. More tabs vs spaces rather than static vs. dynamic typing.


I agree with most of this, except:

> 5. When merge happens a dev's PR is squashed into one commit that gets appended to `main`

I don't know if GitHub supports this, but GitLab has a semi-linear history feature. When enabled, it won't let you merge unless a fast-forward merge is possible, but it never does a fast-forward merge.

This give you (IMHO) the best of both worlds: your history is pretty linear and easy to reason about. The "first parent" of each commit on the main branch is a linear chain of the main branch history. However, for more complex changes you don't have to squash. Those intermediate commits will be on the second parent of merges, so they're easy to filter out:

    M─┐ [main] merge
    │ o commit
    M─┤ merge
    │ o commit
    M─┤ merge
    │ o commit
    │ o commit
    M─┤ merge
    │ o commit
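That shape can be reproduced by hand: get the branch into a fast-forwardable state, then merge with `--no-ff` anyway so a merge commit is still recorded (throwaway repo, names made up):

```shell
set -e
cd "$(mktemp -d)"
git init -q -b main
git config user.email dev@example.com && git config user.name dev
echo base > f && git add f && git commit -q -m "base"
git checkout -q -b feature
echo a > a && git add a && git commit -q -m "commit 1"
echo b > b && git add b && git commit -q -m "commit 2"
git checkout -q main
# feature already sits on top of main (rebase first if it doesn't),
# so a fast-forward is possible -- but record a merge commit anyway:
git merge --no-ff -q -m "merge feature" feature
git log --first-parent --oneline   # main's first-parent chain: base, then one merge per MR
```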

> If you want interactive rebase then make 2 PRs.

What do you mean?


> What do you mean?

Dev 1: "I would like my PR to make 2 commits into `main` instead of the 1 that everyone else gets via squash."

Dev 2: "Ok then make 2 PRs."


It's reasonable to want a series of commits (or patches) to be pulled together, assuming they are well-organized commits.


I prefer to not squash, because it loses useful history, but craft the commits in the PR instead.

A small PR might be 1 commit, but splitting the commits for a bigger PR makes it much easier to write good messages for each change. This is useful in future when refactoring and interacts better with e.g. git blame, rather than having 1 big commit that changed files in numerous places.

This is also useful if a PR is broken. It’s easier to revert a simpler commit (if possible) and just fix that, than redo the entire PR with the fix.


>if a PR is broken, revert a simpler commit and just fix that

I guess it differs between "library development" and "service development".

When you're developing a service, what's in `main` is constantly being tested under the production/customer load. So reverting the known bad PR is a faster fix. And the bug is affecting your customers, so you fix fast and cleanup later.

Typically in industry if a PR is broken you revert the whole thing as fast as possible. But most projects in industry are services.

However when you're developing a library you'll probably release via tags. So if bad code makes it into `main`, it's not a huge deal, and it might be clearer to just revert the one (child) commit of the PR. Because there's no time crunch, no bug in prod, nothing affecting customers.

>but craft the commits in the PR instead.

The problem with not enforcing squashing is trust. Think about the worst dev on your team:

Given that trust, are they gonna split their PR into useful commits? Fuck no. They're gonna merge into `main` a bunch of commits saying "fix typo lol xd" and "lol I suck".

If you give them the power to not just make 1 commit per PR, but N commits per PR, are they going to fill up your team's history with crap? Absolutely they will.

Squashing is a useful gatekeeper in that way: it prevents one person from screwing over everybody else and making it harder to use `blame` to root-cause a critical outage.


I work on services too. Typically you don’t want to revert the entire PR, just the bit that turns it ‘on’. If you make this bit as small as possible, it’s often simpler and easier to inspect the correctness of reverting a flag flip or a config option change than it is to revert actual code/config. Reverting a ton of code is what you do when you don’t need it anymore. It seems like a bad idea to do it in a hurry (unless you have no choice, but that’s a separate point about designing features that have this property and making it a strict requirement during code review where possible).

Then again, reverting commits in your source tree seems like a poor fix to the problem in any form. What you really want is rolling back to the last known working version (which you hopefully can do without needing source tree changes), rather than revert one of n changes, that may depend on each other and rolling forward.

Especially if you are building a new feature that is riskier or bigger, you will put the main bit of code behind a flag/config option (or perhaps even multiple to incrementally test larger and larger bits).


Sure. I'm just talking about binary deploy. Of course you're gonna use flags to flight things in an industry context, depending on the company and its CI/CD practices.

Let's say the binary deploy (before flighting) is fucked somehow. Then what do you do?


Well, if rollbacks (which shouldn’t require affecting the source tree) are a thing then that. Otherwise you’re right and code needs to be reverted and hope it works.

But in the 2nd case, I’d make sure to increase the priority of having known good rollback versions available (and rollbacks performable) and also carefully consider what CI/CD could be added to catch more broken binaries (e.g canary or staging if it’s important enough) and what code review practices could have prevented it.


Ok I agree, you roll back to the known working version. The easiest way to do that is revert the whole PR (or data deploy in case of flags, ofc). My point is not "flags vs. no flags". My point is "each PR should generate one commit because that's easy to revert".

The commit dag of git is a cool feature but shouldn't be in `main`. It's so much easier to work with a linear history and one where each commit contains all the required context to figure out "could this have broken something".


I don’t think reverting the broken code should be on the critical mitigation path at all.

Imagine a scenario where a bug didn’t immediately cause an issue or where your release contains more than 1 new PR. If you suspect the latest version of the binary is broken, your first instinct should be to use a version that isn’t. Figuring out the change and rolling it back should come after the rollback, when you have more time to think.

Deciding whether to revert a change is a tactical question. Often the issue will be because you tickled an unknown bug in a different part of the code. In that case, it’s a lot easier to fix forward than revert the code that tickled the bug and go through the multiple steps of fixing the bug and redoing.


Linear history is good, and having multiple commits in a PR doesn’t prevent it. The only change is adding n (ideally well crafted) consecutive commits rather than 1.


The problem is who is doing the crafting. (And the approving.) At least the squash-based approach limits to 1 the number of commits an untrustworthy "crafter" can contribute.


I think most code review already requires good faith on behalf of the reviewer already.

I do see what you mean about untrustworthy crafter. If we want to preserve master history, then the damage of a bad commit chain is worse than of bad code (which can be fixed/undone).

However, I think that the truly adversarial case is rare (and an exception could be made and master history rewritten in that case). In most cases, hopefully, your coworkers are not deliberately trying to sabotage the codebase :). And I don’t think the commit chain needs to be a work of art or anything, just mainly avoiding typo commits and similar, so it shouldn’t be difficult to do when approached in good faith.


> the truly adversarial case is rare

The problem is not adversarial, or due to malice. The problem is ignorance and expediency driven by a desire to push code and little incentive to cleanup your git history.

The easy fix is to squash PRs.

The hard fix is to enforce that devs become "crafters" and to define what is and isn't "good faith".


Ratorx, do we work on the same team at google? I feel like I've heard you before?


Possibly? I’m an SRE. I don’t think my position is too different from the SREs I know :)


Replying to the second bit of your comment:

Commits before they are in master are fungible. If you have a PR with bad commits, you should treat it the same as bad code and force a refactor of the history. That’s enforceable during code review. I’d even do the same if their commits were too big and could reasonably be broken down into smaller changes.


Mainly agree, except the squashing bit. Squashing means lost history, and you can't tell why a specific method or piece of code was written.


I'd argue "why a method exists" should be addressed with naming and javadocs, not in the commit message. Why split the meaning of the code between the code itself and the commit messages?

And if it's not possible to document inline, the PR docs or code review comments should address this. Then future onlookers can use `blame` to see the context.


Commit messages should describe the change, not the code. This explains why the code was changed from whatever it was before, but not what the code does.

This can be important information e.g. when troubleshooting bugs, since it could explain the developer's thinking. Like in Chesterton's Fence; why on earth would you do something like this?


How would you address bug fixes and all other type of changes?


When picking a git workflow, you should start with your system's and your team's requirements in mind. Don't just copy git-flow or GitHub's simplified version.

Not every team or system uses pull/merge requests at all. Some do pull requests to the development branch, and to master when releasing a new version.

In some teams, code review is the bottleneck, and a developer may have to create a new branch from a PR to keep developing while waiting for code review, then create a second PR that builds on top of the first.

This last workflow is the main reason why I'm opposed to squashing PRs. It just doesn't work.

Also, squashing my carefully composed commits with individual references to DevOps backlog items is insulting.


>merging a sequence of PRs from same branch is bad if you squash

For me in this scenario, merging `main` w/ squash commit back into PR2 branch after PR1 gets merged works here.

As an aside — if code review is the bottleneck, it's probably not a huge investment to do some git finagling to get the above to merge cleanly.

But I don't see how this use case is a dealbreaker or even specific to squash...? Won't you have to incorporate/merge other devs' changes to `main` back into PR2 anyhow, regardless of whether PR1 was a squash, rebase, or full history?


Neat and Simple!


IMHO the single most important idiot proof of git should be a universal "undo" command.

- Committed wrong? Undo

- Switched to wrong branch? Undo

- Pushed wrong? Undo

- Merged wrong? Undo

- Wrong reset? Undo

There should be a "idempotent" undo for every action in git. If not, warn the user for possible outcomes.

In this way, we can safely learn git via trial & errors.


git is already an undo mechanism itself. Commits are immutable. `git reflog` provides an undo log for pretty much every operation on the repository. `git push` shows the previous state of the branch which you can use to easily undo what you just did on the remote.

The only part that isn't easily undo-able is the index and working dir, but that's kinda the point of having an index in the first place. If you overwrite or delete a file without staging it in git, or unstage and then change it, git isn't going to magically restore it with an undo command. And that's fine.


What do you mean by idempotent? Clicking undo two times and going back only one step sounds confusing.


Less confusing than undo/redo on Emacs. Wanna redo? Just undo your undo after breaking your undo sequence. You broke your undo sequence and want to resume undoing but not redo? Just `M-x undo-only`. What if you then want the reverse and just want to redo? Just `M-x undo-redo`. Also keep in mind that if you have a region selected, undo only applies to the region rather than the file. Unless you use C-/, which always operates on the buffer.

It's without doubt powerful. But hell it's confusing.


> emacs

I found undo-tree incredibly intuitive.


> What do you mean by idempotent?

For every git command, provide a way to undo the previous command, if possible.

For example, after `git checkout -b abc`, a `git undo` would execute `git branch -D abc`.

This is just an example; I know you can find many problems with this approach and edge cases where it doesn't work, but we can make life easier for beginners in most cases.

> Clicking undo two times and going back only one step

Huh? I never meant that. Click undo two times would go back two steps, naturally.

But of course, there are some actions that can't undo, so give user a warning or something.


https://www.merriam-webster.com/dictionary/idempotent

idempotent means that doing it twice gives the same result as doing it once. So it seems you meant a different word?


Yes, I mean undo would always restore the repository to its previous state no matter what kind of command you typed (best effort). A perfectly symmetrical forward-and-backward operation, guaranteed to restore the same repo state whether you misstep once or multiple times.

Is there a better word to describe this? I am an ESL speaker.


I suppose they meant that (most) git commands should be idempotent, and therefore easily reversible.


Being idempotent does not help much with invertibility, e.g. zeroing out a file is idempotent.

In any case, the above post makes sense if you just delete the word.


> In this way, we can safely learn git via trial & errors.

Thinking about this and I realize it could be beneficial in a lot of software. No one RTFMs anymore, and giving them the ability to trial and error makes sense. I know I appreciate it (pushing buttons to see what happens), but I’m not sure how common this approach is in general.


Being able to undo an action is actually a general usability principle that has been around at least since the 1980s :-).


Fair enough, I should have been clearer. I’m thinking about extending it to CRUD apps for example. I haven’t personally seen any do that, and it might get really involved due to dependencies, integrations, etc.


The Humane Interface, a book on UIs by J. Raskin, talks about the importance of undo/redo.


It may not be perfectly accurate, but for "idempotent", did you mean "invertible"?


Besides a wrong push, the reflog gives you all of that, albeit not very intuitively. I admit that I often put tags on the current state before trying anything advanced, in order to find the last known good state more easily.

Undoing a wrong push involves so many corner cases that it would be hard to implement. Where I work we use Gerrit, and the default project setup doesn't allow direct pushes. Then it's only abandoning a change.

When I started working with git I came from Perforce and CVS. I had the completely wrong mental model in my head because of this, which got me into trouble understanding why things were not working as expected, especially when pushing to a remote.


That's not a bad idea. Pushed wrong is pretty hard to recover from (since you often aren't allowed to rewrite history on the remote), so I don't think there could be an easy 'undo' action for that, but the other ones could potentially be done.


If the result of the undo is that the remote branch (say main) is back at the point where it was just before, then that is actually a kind of change one could allow? As it does not rewrite history, just reset to previous point in time.

But to really support this well, I think git would need a git commit object which means "reset to previous state".


The issue is if somebody else has pulled the remote branch and then you remove the commit, suddenly their branch and yours doesn't match. And then they can't push without massaging it.

I hate revert commits but often you have to do them if you make a mistake because usually force pushes will be disabled on master for that reason.


> since you often aren't allowed to rewrite history on the remote

For dev branches it's easy, `git reset --hard` on local then `git push -f` again. This command combination is not that intuitive for beginners.

I agree this action is sometimes hard to recover e.g. on protected branch, so a warning must be given to the user.


Assuming your remote allows you to force push. Also, you need to specify where to reset to, because `git reset --hard` will just reset you to HEAD; you actually need `git reset --hard HEAD~` (usually I don't use --hard for this either, because often there's some work I want to keep from the commit).
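The difference in a throwaway repo (`--soft` keeps the work, `--hard` discards it):

```shell
set -e
cd "$(mktemp -d)"
git init -q -b main
git config user.email dev@example.com && git config user.name dev
echo one > f && git add f && git commit -q -m "base"
echo two > f && git commit -q -am "commit to undo"
git reset -q --soft HEAD~   # drop the commit, but keep its changes staged
```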


You can also `revert` which, while staying in history, needs no warning.


git undo -f


Since we're talking about beginners (dare I say, newbies), I think adding this flag would just make them use it all the time. Perhaps just for this command we should only allow --force-remote, so that they have to (1) type it out explicitly and (2) think about what exactly they're doing.


Should also support the gnu long-option, git undo --fuck-up



Map that bad boy to ctrl + z in the terminal and now you're talking business!


Call git every time you background a process? How would that make sense?


I wrote an open source project that may be useful to people here:

https://github.com/dmuth/git-rebase-i-playground

It lets you create a Git repo with synthetic commits and has sample exercises for doing different things within that repo, such as removing commits or squashing commits. (along with hints and answers)

Building this project helped me understand the ins and outs of Git much better and I suspect there will be value for anyone else who works through the examples.


This seems like something that should be used in hiring filters, maybe right after fizzbuzz.


Aliases don't make git easier to understand, they make one specific git command require typing fewer characters to run.

I am not against making aliases, but just saying, if you don't have an understanding of the commands they run, you'll still one day be in a state you don't know how to fix, and THAT is when people "lose their code" etc.

You use git every day, it's worth learning and understanding. You don't need to be a pro, but you need to be good enough.


The only reason people lose code is because

1. they didn't read this one diagram: https://git-scm.com/book/en/v2/Git-Basics-Recording-Changes-...

2. And then they run `reset --hard`


I'm mostly with you, except, `git reset --hard` is relatively easy to recover from. `git reflog` generally can fix most `git reset --hard` commands.

On the other hand. `git checkout .` with unstaged changes tends to be the most common way I've seen people lose code via git.
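A safer reflex than `git checkout .` when you want a clean tree is stashing, which keeps the edits recoverable (throwaway repo sketch):

```shell
set -e
cd "$(mktemp -d)"
git init -q -b main
git config user.email dev@example.com && git config user.name dev
echo v1 > f && git add f && git commit -q -m "base"
echo v2 > f            # unstaged edit
git stash push -q      # recoverable: saved as a stash entry, tree is clean now
git stash pop -q       # ...and the edit comes right back
```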


That's fair. Both (edit: seemingly) overwrite (un)staged changes. But `checkout` requires fewer hyphens.

I feel like all of this would be resolved if my boy Linus had renamed `git add` to `git stage` or `git prepare` or something more clear.

Or added some warning, like "all this stuff you haven't committed yet; it's not managed by `git` so don't try to overwrite it all using `git` until you've committed some of it."


This is another argument why "not squashing is bad"; because it discourages people from committing.


The problem with git is hardly anyone reads the fucking manual. Git is not hard. The UI is inconsistent, but documented. When you just foist commands onto people, you can't be surprised when they fall off the happy path and don't have the mental model to understand how to fix it. There's no such thing as "idiot-proofing" for people who don't RTFM.

--force-with-lease as a default is a really bad idea. Copying random aliases is a bad idea.


If one user uses a tool wrong, the user may be at fault. If many users are using a tool wrong, the tool probably doesn't have great UX.

Saying "the tool is great, users just need to RTFM" sounds an awful lot like "you're holding it wrong".


UX can affect incidental complexity, but essential complexity cannot be removed from the problem; this is the same as what some of the sibling comments are getting at.

When I interrogate users who aren't grokking git, it's not really git's UX that's the problem, it's far more fundamental: like not understanding (and thus not being able to conceptualize and visualize) that the commit graph is a DAG. Not understanding what "rebase" is, because they have no concept of what the word "base" means: rebase follows trivially from that understanding, if they had it.

CLI git has these presentations; it can present the commit graph, though oftentimes companies do such a crap job of approaching git history that git is left to render spaghetti. UI tools exist, but face the same spaghetti in, spaghetti out. It's like asking your IDE to make bad code less bad. Worse, people actively promulgate broken "methodologies" that result in these spaghetti graphs, such as the ironically named "A Successful Git Branching Model".


Industry-specific tooling is different than a consumer product explicitly marketed as "intuitive". This comparison doesn't hold, for me.

While I agree that git's UI could be made less cumbersome in some cases, many people in this industry want to use sophisticated tooling without doing due diligence such as reading a manual. Handling code versioning in a distributed manner is not an easy task!

I don't expect much from junior developers, but if I have to fix trivial git problems from more "senior" developers, I will have some disdain for them.


I disagree... git has a fundamental complexity that you can't improve that much on the command line and still have the flexibility and power that it has. I use magit that makes those options a hell of a lot more palatable than they would be on the command line while still affording all the power that you want from pure git. The git-cli has to do everything and there is a fundamental nature of that. I do agree that there are some things that could have been made better with hindsight but that's any project.


Sure can, for instance a hg-like commit stages system for rebase would prevent a lot of rebase caused issues.


> hg-like commit stages system

not familiar with what you're referring to... can you elaborate?


I got the name wrong, it’s phases https://www.mercurial-scm.org/wiki/Phases

In short, every commit can be in one of three phases, secret, draft, and public. Commits transition to public when you push them somewhere someone else might depend on them, and rebase won’t allow you to rebase any public commits.

Commits can of course be manually transitioned back as a “I know what I’m doing step” but this provides a lot of safety for casual rebases.

(Secret just prevents the commit from being accidentally pushed)


It refers to Mercurial, another really popular distributed revision control system.


As a counterargument, git is quite hard, and I see smart engineers make seemingly simple errors frequently. Its documentation is sprawling and verbose, and its UI "porcelain" is frequently terrible.

Did you know you can use `git fetch origin master:master` to update an un-checked out local branch? Go find where that's documented: https://git-scm.com/docs/git-fetch

Spoiler, here's all you get:

> The format of a <refspec> parameter is an optional plus +, followed by the source <src>, followed by a colon :, followed by the destination ref <dst>. The colon can be omitted when <dst> is empty. <src> is typically a ref, but it can also be a fully spelled hex object name.

Just a truckload of jargon. There's thankfully one code sample to give context, but it's at the bottom of the page – nearly 7,000 pixels of scrolling away from the description of it.
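For the curious, here's that refspec trick in a runnable throwaway setup (repo layout made up): `master` gets updated while some other branch stays checked out.

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q --bare -b master origin.git
git clone -q origin.git alice
git -C alice config user.email a@example.com && git -C alice config user.name alice
cd alice && echo base > f && git add f && git commit -q -m "base" \
  && git push -q -u origin master && cd ..
git clone -q origin.git bob
# alice pushes a new commit upstream
cd alice && echo v2 > f && git commit -q -am "new upstream commit" && git push -q && cd ..
cd bob
git checkout -q -b other             # working somewhere else entirely
git fetch -q origin master:master    # update local master without checking it out
```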


> The UI is inconsistent, but documented.

The problem is that the inconsistency means that you need to constantly refer back to the manual until you have a huge amount of arcana memorized.

Not only that, but it's not always easy to find what you're looking for in the manual.

Yes, programmers should all at some point read the first few chapters of Pro Git, and at least skim through the "internals" chapter.

But idiot-proofing goes beyond babying people who don't want to read docs. Even people who do read docs and care a lot about learning their tools make mistakes and forget things occasionally.

> --force-with-lease as a default is a really bad idea.

Agreed.

> Copying random aliases is a bad idea.

Agreed, but looking at other people's aliases to get ideas and borrow useful snippets is a great idea.


I've read the manual several times, but it reads to me more like an in progress design document filled with implementation details, not things an end user should need to even be aware of. I get by fine with 5 or so memorized commands.

I made a list of other version control systems I've used in the past and out of the 7, I can only remember one that was complicated enough from an end user perspective to require reading a manual(ClearCase).


The problem is that it has badly named commands


Such as?


--force-with-lease can be a footgun.

It will overwrite the tree on the remote as long as the remote hasn't changed since you last fetched it. That protection doesn't always help, particularly if you have a tool which continuously fetches the remote, like an IDE configured to do so such as VSCode. In that case, you will have fetched the other person's changes, and --force-with-lease will happily blow away anything on the remote that might not be in your tree yet.


Yes, but it's still better than --force under most circumstances.


The implication with anything with '--force' in it, is you shouldn't be doing it without talking to someone first.

Absolutely does not belong in automatic anything, anywhere.


Oh, obviously you need to know what you are doing.

I mostly use --force-with-lease to push something to my own branches. No need to talk to anyone.

Force pushing to other people's branches without asking is just rude.


Why use force at all, on a default command?


my flow is (on my-branch with no one else's commits)

* push some commits up to my remote branch

* git fetch

* git rebase master/main to get the latest stuff

* add changes on my-branch that use new stuff from master/main

* git push --force-with-lease to my remote branch - this fails if you don't use some form of force, since my most recent commits are based on a commit (from master) that isn't on the remote branch


I don’t get why everybody wants to rebase their topic branches. Just use merge, come on. If you want a "clean" commit history on main/master do a squashed merge into main at the end.

This way we never had to force anything on the remote.


I rebase on my topic branches because then my edits are neatly stacked on top of the other branch, so I can re-arrange things more easily. Why would I want a weird commit with a bunch of work I didn't do on the topic just smooshed into the middle of my well-crafted series of commits?


Work you didn't do won't be in your branch. There is no rearranging, it is one commit.


> Work you didn't do won't be in your branch.

The merge commit will be in the branch.

> There is no rearranging, it is one commit.

You misread that. They want to be able to rearrange things easily. Having multiple merge commits in the middle gets in the way of that.


The work doesn't show up on a diff, which is what I was getting at. Merges are from master in this example and already reconciled.

Commits don't matter either, because they are being squashed. Was responding to the grandparent perhaps more than the parent.


Why would you have to force anything with rebase? You rebase your feature branch against main to replay it on top and clean up history so you can do a clean fast-forward merge. Squashing is bad for anything non-trivial; you want small, independent commits: easy to review, easy to revert, easy to blame if something goes wrong.


As soon as you pushed your branch to remote (which I tend to do for backup reasons especially after working hard on a solution) rebase only means trouble.


Not if it's the remote for your own dev branch. It only matters if the remote branch is being (actively) used by other people.

Unfortunately I've met far too many who have your "remote" superstition. I remember arguing this exact point in my last gig, when someone was mad at me for force pushing my own remote branch that nobody else was using nor should have been.


if you're the only one working on that branch I don't see where is the problem in rewriting history and force pushing


Until you make a mistake.


Why? That's what 'git reflog' is for.

Or you mean that a mistake where you accidentally push to someone else's branch?

The default model that public github uses is good for that: everyone works on their own fork of the repo, and makes pull requests to the shared repo. Nobody pushes directly to the shared repo.


Seconded. GitHub is not an option for a lot of (if not most) work.


From https://git-scm.com/docs/git-push

A general note on safety: supplying this option without an expected value, i.e. as --force-with-lease or --force-with-lease=<refname> interacts very badly with anything that implicitly runs git fetch on the remote to be pushed to in the background, e.g. git fetch origin on your repository in a cronjob.

The protection it offers over --force is ensuring that subsequent changes your work wasn’t based on aren’t clobbered, but this is trivially defeated if some background process is updating refs in the background. We don’t have anything except the remote tracking info to go by as a heuristic for refs you’re expected to have seen & are willing to clobber.

If your editor or some other system is running git fetch in the background for you a way to mitigate this is to simply set up another remote:

    git remote add origin-push $(git config remote.origin.url)
    git fetch origin-push
Now when the background process runs git fetch origin the references on origin-push won’t be updated, and thus commands like:

    git push --force-with-lease origin-push
Will fail unless you manually run git fetch origin-push. This method is of course entirely defeated by something that runs git fetch --all, in that case you’d need to either disable it or do something more tedious like:

    git fetch              # update 'master' from remote
    git tag base master    # mark our base point
    git rebase -i master   # rewrite some commits
    git push --force-with-lease=master:base master:master
I.e. create a base tag for versions of the upstream code that you’ve seen and are willing to overwrite, then rewrite history, and finally force push changes to master if the remote version is still at base, regardless of what your local remotes/origin/master has been updated to in the background.

Alternatively, specifying --force-if-includes as an ancillary option along with --force-with-lease[=<refname>] (i.e., without saying what exact commit the ref on the remote side must be pointing at, or which refs on the remote side are being protected) at the time of "push" will verify if updates from the remote-tracking refs that may have been implicitly updated in the background are integrated locally before allowing a forced update.


The existence of such a complicated workaround mainly serves as an official confirmation that you should just turn off auto fetch or not use force-with-lease. I use PyCharm's Update Project to fetch all branches at once on demand, so I will always get anyone's changes to my branch at the same time I fetch.


I guess this will come off kinda... douchey? But I just don't find Git to be that hard. I know there's a lot of complexity there, but I find that 95% of the time I'm just git add -p or git add . and then committing. Every once in a while I'll do a rebase, and that's the most complex part of Git that I use with any frequency.

I remember when I first was introduced to Git I found it confusing, so I'm sympathetic to newbies who have to cross over the learning curve, but I think the main day-to-day Git operations just aren't that much to learn. What is it about Git that people find so difficult, even after using it for a while?


I think (from working with git newbies) the core difficulty is not understanding Git's fundamental data structure. If you have computer science training / data structure understanding, once you realize that (1) commits are nodes in a graph and (2) branches are just pointers to a specific node, you're off to the races.

From there, I find it straightforward to conceptualize all the different commands. Commit/push/reset/merge/pull/rebase/etc are all just different graph manipulations. There's usually multiple ways to achieve the end graph you want.

If you don't understand the data structure, you have no good mental model and it's just a series of ritual. If something goes off the ritual path, you're in trouble.
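The graph model is easy to poke at directly. A throwaway sandbox (names invented; `git init -b` needs git >= 2.28):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q -b main .
git config user.email demo@example.com && git config user.name demo
git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"

# A branch is nothing more than a pointer to a node in the commit graph:
git rev-parse main     # a single commit hash
git rev-parse HEAD     # resolves to the same node

# And the graph itself, one node per line:
git log --graph --oneline --all
```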


I don't find it _hard_, but it is very _unopinionated_.

So if you mix up one kind of workflow with another, you can really shoot yourself in the foot. There's people that have a merge-based workflow, a rebase-based one, more advanced projects (Linux kernel) have something more bespoke. Getting used to one project, then using those practices in another, can really mess you up.


The UI is inconsistent and unintuitive.


I won't disagree with that, but idk. Once you understand the basic operations everything weird is a Google away.

I'll admit that Git is the only VCS I've spent any time with, so maybe I just don't know how much better it could be. But I've almost never had an issue with Git where I actually lost code. And anytime I'm doing something dangerous, I just make a backup copy of the directory in case I screw up irrevocably. But even if I'm doing something nasty, the reflog is there, and the cases where I've need to use my backup copy are very few and far between.


> almost never ... actually lost code

That seems like one of the absolute basics. "Almost" never...?

> I just make a backup copy of the directory in case I screw up irrevocably

If you really felt you could trust your source control system, that shouldn't be needed, and...

> the cases where I've need to use my backup copy

...should be nonexistent.


I guess what I'm trying to express is that even in the rare case I'm doing something fancy with git, making sure I don't lose anything is trivial.

I can think of exactly one time I actually lost code, and I was doing ill-advised reflog fuckery.

I take your point, but I stand by my opinion that in virtually all normal usage, git just isn't that hard. YMMV.


I agree. If losing code with Git is even a remote possibility, then you're "holding it wrong" indeed.


There are multiple ways to accidentally overwrite or delete a file that wasn't committed yet.

There are also situations where you need to use these commands and you're not ready to commit everything.

The safe way to handle this is generally making a stash commit and immediately applying it, but if you don't know that, or don't think to do that first, the result can be data loss.

Git is very careful about commits but if you never staged something then git can be ruthless.
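The stash-and-immediately-apply trick mentioned above, sandboxed (file names invented; `git init -b` needs git >= 2.28):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q -b main .
git config user.email demo@example.com && git config user.name demo
echo v1 > tracked.txt && git add tracked.txt && git commit -q -m "initial"

echo v2 > tracked.txt          # an uncommitted edit
echo scratch > untracked.txt   # a file git would otherwise happily clobber

# Checkpoint everything (untracked included), then restore the working
# tree immediately; the stash entry stays around as a recoverable copy.
git stash push -q --include-untracked -m "checkpoint before risky command"
git stash apply -q

# If a later command destroys the files, they can be recovered via
# git stash list / git stash show -p stash@{0}.
```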


> Once you understand the basic operations everything weird is a Google away.

Yes that's true, but what you've probably forgotten is that understanding the basic operations is way way harder than it should be thanks to Git's terrible CLI, nomenclature and general UX.

I would also argue that the lack of an obviously good GUI makes it quite a lot harder for beginners, and the attitude that a lot of HN commenters have that you shouldn't use a GUI, despite Git being an obviously visual thing.

Btw any beginners reading this, I would recommend Git Extensions. Terrible name which led to me not trying it for ages, but it's actually one of the best. In particular it lets you browse the repo at every commit which is something that makes it way clearer what Git is doing.


That's true. But they are working on this. Slowly.

Eg compare https://git-scm.com/docs/git-switch to git checkout.

https://www.infoq.com/news/2019/08/git-2-23-switch-restore/


95% of git usage is fairly easy, yeah. But that other 5% can be extremely difficult.


Nice page! Very well made and great explanations. However: I'm a big fan of aliases in git, but I don't think this is the way to go personally. Knowing the difference between rebase and merge is vital, and I think it's worth the time investment to learn git properly.

It took me 1 month of painstakingly learning the git CLI and I'm happy I spent the time. Before that I was using a GUI and was essentially afraid of git.

I'm no longer afraid. Instead I'm actively exploring ways to do things faster and/or smarter every day.

This happened when I switched jobs recently and came into contact with a fantastic developer and patient teacher. Doing it myself would have taken a bit longer and I wouldn't be able to see how much it would be worth it.

I'm enjoying a rebase workflow every day. And I especially like the -i flag for interactive rebasing:

  git rebase -i
Same with -p for interactively adding code/files before commits:

  git add -p
I read the first three chapters of the git-scm book[0] and it made a world of difference in understanding what's actually going on.

[0] https://git-scm.com/book/en/v2


Idiot proof git is handled by protecting ALL branches and requiring a merge/pull request. Not a fan of aliases unless created and used personally.

I'm a tech lead for 2 large-scale web projects. Rarely will I traverse the git log for anything besides the last few commits. If I ever want to see the history of something, I just look up the merge/pull request, or look at the blame on individual files or lines within a file. Having a non-rebased commit history at that point is much clearer about why/what changed.

I guess anyone who would care for a pretty git-log does not have adequate tracking outside of the source code for requirements --> implementation. Guess I'm used to stricter guidelines, because I could never see myself working on a project where something is committed that didn't derive from some type of identified requirement.


I fail to see what anything you state has to do with rebasing.

Rebasing ensures all commits from a feature branch are contiguous. Rebasing + squash ensures the main branch is not polluted with a myriad of useless intermediate commits no one is interested in. It also allows combining multiple commit messages into a more coherent explanation of what changed instead of multiple tidbits.


yea I do interactive rebase all the time on my local git to clean up my commits before I push them up.

I also started doing `git commit --fixup` when doing cr fixes so I can just autosquash the fixes - `git rebase --autosquash`

So yea I think rebase and merge commits both have their place within a workflow.
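The fixup/autosquash flow above, sandboxed (names invented; GIT_SEQUENCE_EDITOR=true accepts the generated todo list so the interactive rebase runs unattended):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q -b main .
git config user.email demo@example.com && git config user.name demo
git commit -q --allow-empty -m "base"
git checkout -q -b feature
echo a > f.txt && git add f.txt && git commit -q -m "add feature"

# Code review feedback: record the fix as a fixup of the target commit.
echo b >> f.txt && git add f.txt
git commit -q --fixup HEAD        # subject becomes "fixup! add feature"

# Fold every fixup commit into its target in one go.
GIT_SEQUENCE_EDITOR=true git rebase -i --autosquash main
```

After the rebase the branch is back to a single, clean "add feature" commit containing both changes.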


I fail to see your misunderstanding of what I stated. Rebasing is all about a pretty git-log, a pretty git-log is all about looking up why/what something changed. Everything I stated is about why you don't need a pretty git-log to know why/what something changed... Hope you don't talk to people at your day job like that.

Squashing commits of a feature branch via rebase is not the same thing as rebasing a local feature branch with the intended remote target.

Anyways, squashing is almost worse. Imagine doing a git blame on a line only to see a large commit of x amount of other changes with a summary about the global feature instead of an individualized commit referencing the line changed more closely. I am interested in those myriad of useless intermediate commits as you put it.


It is an unforgivable crime that Mercurial has been muscled out of the consciousness. It's so much better, on literally every level. Implicit branching + the ability to truly divorce the repo's state from the actual files on the disk (in a simple way, I'm sure you can do the same with git, but it's going to involve a lot of subcommands and flags) are simply too good. Every time I have to use git, I feel like I'm fighting it. I end up coming up with a command that has ten flags and a subcommand that I've never heard of until just now. And when I try to ask people who know better, they agree it's the best way to do it.


.gitconfig:

  [alias]
  add-commit = !git add -A && git commit
.bash_profile:

  function save ()
  {
    git add-commit -m "$*" && git push
  }
Use like:

  $ save This is a commit message
This adds changed files to a commit with this message and pushes it to remote.

Also:

  alias mkpr='git push && gh pr create -d -f -B develop | grep https | xargs printf -- "%s/files" | xargs open'
Open a PR from current branch based on develop and open it in GitHub in my browser.

Merge upstream changes from develop:

  alias mduc='CURRENT_BRANCH=$(git rev-parse --abbrev-ref HEAD) && git checkout develop && git pull && git pull --tags && git checkout ${CURRENT_BRANCH} && git merge develop'
I have a few more, whereby I skip most of the ceremony with git. Works for me.


Commit accepts the -A flag. You can simplify it to just

  function save ()
  {
    git commit -am "$*" && git push
  }


You can. I have it separate for the purpose of composition, sometimes I do 'gac Foo' which stands for 'git add commit' and I push later.


Those are really good! Thanks for sharing. Curious to see the others. Are they available somewhere?


I'm incredibly thankful that 99% of my Git usage at work gets away with just PULL, CHECKOUT [-b], COMMIT [--amend] and PUSH. Rarely do I need to rebase, for any reason.


I understand the sentiment, but since git is probably one of the longer-lasting constants in our industry (if not the longest-lasting constant), I personally think it's really worth to have a bit of a look into it.

Something I wish someone suggested to me years ago: Instead of trying to understand the commands, try to understand the datamodel. A branch is just a pointer to a commit, a commit is just a pointer (with metadata) to a tree, a tree is just... and so on. Once you understood this (which really isn't any harder than, say, understanding how quick-sort works), going from the data-model to the commands is fairly easy, almost intuitive if it weren't for all the convoluted options that each command can take.

Anyway, not trying to convince you or anything, just saying that I've been in your place a few years back, and wish I'd realized earlier how easy it actually is.


Agreed, once you grok the start/end semantics of rebase it is not hard to work with. But it can be intimidating for new users.

I actually think it is easier to teach the fully-specified and interactive form 'git rebase -i start-sha-a end-branch-b --onto target-c', which makes it really explicit what is going on ("snip from A to B and put that chain on C"). When you understand that you can start using the defaults that abbreviate the common cases. (Specifically, "end-branch-b" is usually not needed since you usually run the command from that branch.) And getting to this level of grokking requires you to understand the data model mentioned above, but not any obscure internals, so I think it is a good bar for "knows enough of git" for senior engineers in most orgs.


> Once you understood this (which really isn't any harder than, say, understanding how quick-sort works), going from the data-model to the commands is fairly easy, almost intuitive if it weren't for all the convoluted options that each command can take.

So, not intuitive at all?

I've had to make very minor changes to the commit history that took a bunch of obnoxious commands. I know the git model very well but it didn't help. It would have been easier to copy all the source files out, check out the branch I wanted, then copy them all back in.


> It would have been easier to copy all the source files out, check out the branch I wanted, then copy them all back in.

That would be "git checkout -b some-branch" and then "git reset --soft origin/main" (or whatever branch you want to be on top of).

"reset" sets the pointer where you want it to be, "--soft" ensures that the actual files on your filesystem (the working tree) isn't touched. You will then have uncommitted files (your changes compared to the origin/main), that you can then recommit everything the way you want it.

reset --soft is my go-to recommendation for devs who have to satisfy a linear history but don't really care about git-history at all. Just do your changes as you normally would, using merge and whatever else floats your boat, and then once you're done, just use a soft-reset and then commit everything in one single commit. It's of course not ideal (meaningful atomic commits or some such would be better), but compared to having dozens of "fix stuff" and "merge from main" commits, it's definitely better.
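A sandboxed version of that soft-reset squash (using a local main in place of origin/main; names invented):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q -b main .
git config user.email demo@example.com && git config user.name demo
git commit -q --allow-empty -m "mainline"

# A messy feature branch: several "fix stuff"-style commits.
git checkout -q -b feature
echo one > work.txt && git add work.txt && git commit -q -m "wip"
echo two >> work.txt && git add work.txt && git commit -q -m "fix stuff"

# Move the branch pointer back to main; working tree and index untouched...
git reset -q --soft main

# ...then recommit the whole thing as one clean commit.
git commit -q -m "Add feature X in one tidy commit"
```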


You have misunderstood. It's not a case of not knowing more about Git and its toolset. It's about our underlying workflow, letting us do version control of our software with simple means instead of throwing everything at it. It's easy to throw a hammer, mallet, pein and a club all at once on a nail, but that doesn't mean it's necessary or helpful.


> letting us do version control of our software with simple means instead of throwing everything at it

VCS is a pretty difficult topic, and it's not like git is the only tool out there (let alone the very first). Multiple people potentially working on the same file in conflicting ways is always going to be something that cannot just magically be made simple. After all, if there are no conflicts, "rebase" is literally the simplest command there is - just "git rebase origin/main" or whatever your equivalent is and you're good. It's only with conflicts where the fun stuff starts.


I don't understand what it is you want to communicate. The reason we mostly get away with minimal interaction with version control is because we plan our work to avoid (or at least minimize) multiple ongoing efforts in one and the same part of the platform - for reasons that are entirely unrelated to version control itself, not because someone on the team doesn't know how to resolve merge conflicts with Git.


I mentally categorise `commit --amend` right there next to rebase personally. When people debate about rebasing and rewrite of history, I include `--amend`. Maybe I'm unique there though.


The "is it rewriting history?" boundary for me is whether you're amending a local commit or one that exists on the remote.


I personally separate that particular type of revisioning/amending of history (e.g. someone overwriting some WIP commit over and over on their work branch) from rebasing, which to me primarily means reconciling with changes elsewhere in a different branch - figuratively rebasing the changes from over there to here.


--amend is conceptually nothing else than interactive rebase to squash the last two commits together.


Conceptually that's not really a rebase, because you're using the same base.

If you never push the original commit, --amend is the same as staging things repeatedly for safety but only hitting the "commit" button later in the day.


Ha, "conceptually" is perhaps a wrong word. What I meant is that you end up doing the exact same operation, just via a shortcut in the UI.

It's the exact same operation I use interactive rebase for most often in my everyday work, just limited to a single commit (HEAD).
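Concretely, in a throwaway repo (names invented):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q -b main .
git config user.email demo@example.com && git config user.name demo
git commit -q --allow-empty -m "base"
echo a > f.txt && git add f.txt && git commit -q -m "feature"
before=$(git rev-parse HEAD)

# Forgot something: fold it into the last commit instead of adding a new one.
echo b >> f.txt && git add f.txt
git commit -q --amend --no-edit

# Same number of commits, but a new hash: the tip was rewritten in place,
# exactly as if the two changes had been squashed together.
```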


I like those aliases if you make them for yourself, but I dislike beginners (to git) using them to make git "easier". Git's UX is confusing and has growing pains, I won't deny it, but git is widely used and imho one should take some time to at least understand the basics.

I dislike even more the defaults of many IDEs and some git servers; they seem to push for the worst git habits.


I feel like the fact that we have these incessant conversations about how to use git just show that the interface is inherently broken. New programmers always struggle with git. The naming of the commands is bizarre and confusing. The docs are overbearing and a lot of the time the fact that it's powerful in weird edge cases is seen to trump all of its flaws. That said I haven't seen anything better or even close to its usefulness.

Lack of any real competition may be an issue there.


My solution: use mercurial.

Works with git repos (w/hg-git extension). Designed for humans. Sanity provoking. Gets out of your way, as a good SCM tool should.


Don't use this, just learn how git works. Bookmark this page and refer to the commands in the aliases if that helps. These commands are quite opinionated and as likely to cause you to do something you didn't mean to do, possibly destructively, as they are to help.

What happens when you have a million conflicts trying to rebase and you need to give up? You need to know how to abort a failed rebase or merge. You shouldn't insist on using a tool if you refuse to learn the basics of using it.


You need to know how to abort a failed rebase or merge

g abort a failed rebase or merge?


I might be mistaken but this post is basically describing git-friendly which I've used for years, is 100% flawless, and you'd need to pry from my cold, dead hands.

https://github.com/git-friendly/git-friendly


I know HN will absolutely tear me apart for recommending this, but I use GitHub desktop. It has all the bells and whistles of the CLI, but you can actually see and understand what's going on.

As a Junior Engineer, a Senior Engineer recommended it to me. I thought he was joking at first, but he kindly reminded me that using a GUI app is completely fine and okay. We shouldn't stigmatise tools that make it easier to use and understand your workflow. GitHub desktop allows me to see what i'm committing, commit history and much more. I'd definitely recommend it.

PS. Using it is not an excuse not to learn how to use the CLI. Others will accuse me of being lazy and not learning best practice. Learn both. Use the easier one.


The reason people on here tend to dislike those types of tools is because they've probably been the ones who had to fix the tangles people get themselves into by using those tools. Tools that obscure details in favor of simplicity are fine in some cases, but version control is an inherently complex problem domain where having those details is important.

In my experience mentoring juniors new to git, those who are just given a few basic commands don't create as much of a mess because they don't have the commands required to make a big mess. When they do make a mess, it's usually because they've copy-pasted something from the internet, and most will acknowledge that blind copy-pasting _feels_ wrong. Give them a GUI, and suddenly there are buttons to create all sorts of unholy mangles right at their fingertips.

As a really simple example, one of the most frequent things I see from juniors using git GUIs is adding files to a commit that they didn't intend to (say stuff that's not in the .gitignore, but doesn't belong in the commit). In the CLI, they probably don't know about -a, so they would be forced to add files/directories individually and think about what to include. Most GUIs I've seen include a "Stage All" button front-and-center, which is very tempting for a new user to click (or, worse, they make staging an opt-out thing). I don't know if this specific example applies to GitHub Desktop, but it's something that I see regularly.

I agree with your last point. I think git GUIs are best for users who already know what they're doing and find that a GUI speeds up their workflow.


> As a really simple example, one of the most frequent things I see from juniors using git GUIs is adding files to a commit that they didn't intend to (say stuff that's not in the .gitignore, but doesn't belong in the commit). In the CLI, they probably don't know about -a, so they would be forced to add files/directories individually and think about what to include. Most GUIs I've seen include a "Stage All" button front-and-center, which is very tempting for a new user to click (or, worse, they make staging an opt-out thing). I do not know if this specific example is the case in GitHub Desktop, it's something that I see regularly.

I agree with your general point, but on this specific point I'd be remiss if I didn't mention that I see an insane amount of people regularly use `git add .` to add every file, because they don't realize they actually want `git add -u` (only add already-tracked files) 99% of the time.

But as a counterpoint, since your example is about giving devs a limited set of commands, you naturally wouldn't be giving them `git add .`. But it's definitely something that frequently comes up in crappy git tutorials.
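The difference is easy to see in a sandbox (names invented; `git init -b` needs git >= 2.28):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q -b main .
git config user.email demo@example.com && git config user.name demo
echo v1 > tracked.txt && git add tracked.txt && git commit -q -m "initial"

echo v2 > tracked.txt        # edit to an already-tracked file
echo junk > scratch.log      # new, untracked file

git add -u                   # stages only the tracked-file edit
git status --short           # scratch.log still shows as '??', unstaged
```

`git add .` here would have staged scratch.log too.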


You didn't mention the fact that the GUI tools often lack functionality that's present in the CLI. Like git notes (looking at you, GitHub, who previously HAD notes support, but dropped it) or worktrees (Sourcetree lets you view worktrees, but provides no interface for interacting with them as worktrees, rather than just another repo). Or they hide basic functionality, like git amend, or give less useful error handling advice than git itself (Sourcetree, I love your commit graph, but your error messages are worthless).


  git add . 
  git commit -m ""
Are like the first 2 git commands anyone tends to learn after git init or clone


> they probably don't know about -a

How are you going to keep them from learning about -a?


When I first learned git, I had no idea what staging meant and the differences between A, M, D, R, AM, etc...

Only some time later (albeit a short time) I learned about -a, at which point I had a bit more understanding of the statuses and what it meant to stage a change. If you gave me -a before that, I would have never understood those things properly.


The other thing is that this likely doesn't work over SSH.


It does. GitHub has been soft-deprecating HTTP for years now.


> Learn both. Use the easier one.

This is a great point that I will be sharing with my team. Sometimes (most of the time) I use the git cli, and sometimes I use the built-in Git pane in VS Code.

I have not used GitHub Desktop in quite a while. In your opinion does it make the commit graph easy-to-read? Because I have not found a tool _yet_ that makes that diagram easily parseable by the human eye. It just looks like a mountain of spaghetti commits linked together with a myriad of colored lines.

(I realize that part of the cause of my confusion is crappy discipline around commit history, but what can you do on a team of a certain size where you're not the lead? Just have to suck it up.)


Github Desktop doesn't have a commit graph. It only shows a list of commits for a selected branch.


SourceTree does well in this regard.


Sublime Merge here, love it and won't go near the CLI anymore


Seconding Sublime Merge. I've been using it since it was launched and it has been a solid tool.


> It has all the bells and whistles of the CLI

Somehow I strongly doubt it. Does it have at least access to cherry picking, reflog and rebasing?


Yes you can rebase, and drag-and-drop commits to cherry-pick. Not sure about reflog, but I think the visual interface sort of replaces some of the need for it.


OP's post describe the exact command git would use, and explains why, which is great for learning purposes as well.


For me this is another painful reminder of what git could have been. Imagine if it had been this ergonomic from the start. Everyone would have the same set of commands. Whereas if you adopt these aliases you can't easily discuss any git related issue with someone who doesn't use them.


I understand generally what Git is doing, but when you start throwing in very specific words like "rebase" my eyes start to glaze over. Not because they aren't important concepts, but because I can't stop the nagging feeling that it shouldn't be this complicated (it probably should though).

But it isn't complicated! When you use a decent UI tool. I know pretty much exactly how VSCode's UI behaves with Git, along with the Git History plugin. And now I have a perfectly usable workflow and never have to use the command line, and almost never have any merge conflicts.

Maybe this is common knowledge to all of you, but it seems to me a lot of people have Git headaches, so maybe they don't know this?

1. Pull latest from remote main branch (select the branch, hit the sync button; permissions structure prevents any accidental pushes)

2. Create new branch based on pulled branch. If I have stashed work, apply it now.

3. Do work

4. Use the + icon to stage individual changes and give them their own local commit

5. Push my local branch to remote (hit that sync button again)

6. In my remote repo UI push the button to create a Pull Request (I assume all shared repos have a PR button somewhere?)

7. Go back to doing local work, but don't commit yet, and continue "not committing" for now

8. After the PR has been merged into main, stash everything including untracked (pick it from the VSCode menu)

9. Go back to step 1, repeat

Steps 7 and 8 are how you avoid creating your own merge conflicts. If you keep committing while a PR is in flight, then later on Git will see your earlier changes to a file, and your later changes to the same file, and will see them as a conflict. So just do yourself a favor and never commit to a branch while anything in its history is part of an in-flight PR. Obviously you'll have to deal with merge conflicts with other people but at least you aren't creating your own headache.

Git History saves me when I goof up and commit locally when I shouldn't. I just pull up the history and click the soft reset button prior to my commit and presto, I'm back where I want to be (files uncommitted, back in pending changes list).

Maybe one day I'll learn the Git command line, but right now why bother?
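For what it's worth, that "soft reset" button is one command in the CLI. A sandboxed demo (names invented; `git init -b` needs git >= 2.28):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q -b main .
git config user.email demo@example.com && git config user.name demo
git commit -q --allow-empty -m "good commit"
echo oops > f.txt && git add f.txt && git commit -q -m "committed too early"

# Undo the last commit but keep its changes staged, back in the
# pending-changes list:
git reset -q --soft HEAD~1
```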


Stash -> pull -> unstash is just manual rebase, though. You're already doing the thing you're claiming not to do, you're just doing it the hard way.

Which is fine, if that works for you! Just know that you're using different terms for the same thing (do some work on top of A, then move it to be on top of B instead).


Now you've piqued my interest.


Not the person you replied to, but a rebase is taking a series of commits you made in one place and replaying them on top of another commit. So you make your PR, then branch off of that commit, and continue doing the normal edit-commit workflow on that new branch. Then, when the PR gets merged, you git rebase -i (master/main/whatever branch your PR was being pulled into).

You'll be presented with all your commits, and you can choose to exclude some from the rebase (useful if you had to update something specifically because it gets updated between every PR/build), then confirm the rebase. It will then proceed to replay each commit you made, just like a repeated version of your stash/unstash process, on top of the new tip of the master branch. Your working branch now contains all the changes you made, but instead of sitting on top of your PR commit, it's as if you had been working off of the finished PR commit.

If you want to make sure that your git history maintains the old branch for sentimental reasons (or you want to try out removing some commits and desire an easy way to rollback if you get lost), just make a branch off of your working branch before the rebase. That branch will be identical to the branch you were working on, before the rebase.

                       A---B---C topic
                      /
   PR Branch P---R---S  
                      \ 
               O---L---D---E master
(where commit D and E are produced as part of the PR, say, the merge commit and a version number crank or something)

(checked out topic) git rebase master

   PR Branch P---R---S       A---B---C topic
                      \     /
               O---L---D---E master
magic
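In commands, that flow might look like this (branch names taken from the diagram; a sketch, not the only way to do it):

```shell
# Optional safety net: snapshot the branch before rewriting it,
# so the pre-rebase state stays reachable under a named branch
git branch topic-backup topic

# Replay the topic commits (A, B, C) on top of the new master tip
git checkout topic
git rebase master        # or: git rebase -i master, to drop/reorder commits
```

If the rebase goes wrong, `topic-backup` still points at the old commits.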


This is an excellent explanation, thank you.

Still, I have questions. When I have topic checked out then run "git rebase -i main" does it first do fetch (or whatever) to make sure main's latest commit history is represented? Or is it up to me to do that first?

The rebase concept is sound, but the semantics are a bit daunting.

  git rebase -i main
This sounds like an operation is being performed on main. Terrifying! And "rebase" is a scary word choice regardless. Perhaps you remember when you were new to Git and can relate.

I may give rebase a try at some point, to see if perhaps it might speed me up. The vast library of Git commands is a bit much for me though. It's like using an aircraft cockpit to control the television!


There's also `git pull --rebase`, which is just "pull like normal, then rebase onto the thing I just pulled".

Remember that it's okay to screw around and break stuff! Sometimes that's the best way to learn. Git's whole thing is tracking history, even the history of the history itself[0]. It is hard to accidentally lose something if you've committed it at least once.

If you're nervous, push your branch somewhere before starting (it's not strictly necessary, but it's the easiest way to get peace of mind).

Nothing you do locally will be pushed back to the remote unless you explicitly do `git push`. If you manage to screw up your branches so badly you don't know how to recover, you can just delete the local branch you broke and re-pull the remote one like it's a new branch, even if it's main/master.

[0] https://ohshitgit.com/, especially the first thing there


I definitely sympathise with the confusing and scary command names! "reset" is the worst in my opinion. Rebase is kind of the natural choice once you understand the data model, but it doesn't necessarily make it more approachable.

`git rebase main` will modify your currently checked out branch to make the commits on that branch now branch off of the current value of main. It won't update main or modify it in any other way, only your current branch. You can equally `git rebase 347ae9` to have them come off a specific commit.

If you know what a tree is (in the general computer science sense, not the git term of art) it's well worth taking a little time to learn the underlying data model in my opinion.


Thanks, that's useful information!


> It's like using an aircraft cockpit to control the television!

It's not a television though. It's a set of tools used to manipulate a tree-like data structure that you're using to store your versions and reconcile changes across multiple trees. Once you approach it with that in mind and can imagine the high level data structure in your head, using it becomes almost as natural as using a hammer for a nail.

Rebasing is nothing more than changing the location a pointer ("branch") points at and then moving some nodes from one tree branch to another (in exactly the same way cherry picking works). It's really simple to visualize, you just need to have that tree in your mind already.


Soft reset is a rebase too


What's the deal with squashing commits anyways? I'm genuinely asking, because I've only worked with "squash everything before you put it up for review" but have never really figured out why past "it's what we've always done".


I think people will have a branch with like 40 dumb commits like "fix typo" but when you merge it, people don't want to go through all those 40 commits in main. They want to instead read "Refactored frob reactor to use glom framework".


merge commit with git log --first-parent is alright.


In my experience the squashing crowd is much louder and cares more about squashing than the crowd who prefers to see every commit. I hate squashing, but I've been beaten down into doing it many times because in the end it doesn't actually matter.

The merge commits everywhere argument falls apart if you use log --no-merges. The shitty commit messages argument is solved by not allowing shitty commit messages. The "fixed typo", "oops left TODO", "another typo" chain of commits argument is solved by telling people to not do a million commits like that and IMO those should be squashed. The clean history argument is solved when you don't allow the useless "fixing typo" commits in feature-xyz branches. Your history should be clean and contain a history of the progress that was made.

But ultimately the easiest way to solve all these problems is to force everyone to squash. You only have to police people on a single commit and people don't have to learn about options like --no-merges.


> is solved by telling people to not do a million commits like that

> don't allow the useless "fixing typo" commits

This sounds like a much much more heavy handed approach than: I don't care what you do on your feature branches, just squash your commits to master and write a nice commit message explaining what you did.

IMHO, all your rules do is increase the friction for people to fix typos.


It's a bit heavy handed, but I believe the rules are more like best practices.

If you're fixing typos in new code I see no reason to have the history of introducing the typo and then fixing it. If you've come across a typo in a file you're editing then by all means make that typo fix in its own commit.

It gets worse when you start fixing typos in files unrelated to your fix. One of those small typo fixes could introduce a bug and multiple commits makes it easier to track down. Blame would pinpoint it exactly - you'll see Bob's "Fix a typo" commit instead of being buried in Bob's "Add trucks to the game" commit. That saves you from having to go look at the PR or diff and to figure out why Bob renamed cares to cars.
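For example (file name hypothetical), blame pinpoints exactly which commit last touched each line, so a one-line typo fix stays attributable on its own:

```shell
# Which commit (and author) last touched each line of the file
git blame game/trucks.c

# Dig into the history of just a few lines, or a single function
git log -L 42,60:game/trucks.c
```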


> It gets worse when you start fixing typos in files unrelated to your fix.

That does not go into the PR and someone should make you split that out into a code-cleanup PR and it shouldn't pass review.


> I hate squashing, but I've been beaten down into doing it many times because in the end it doesn't actually matter.

For those of us on maintenance teams, who actually have to dig in to the history to figure out what happened, not squashing matters a lot.


You'll have to elaborate with specifics because in my experience it doesn't matter if there's 100 or 10000 commits - git bisect works great in both instances.


It works, but it works better when you have the original 10000 commits. You can tell exactly what the committer was attempting when the bug was introduced. It may have been as a fix to something else, it may have been a typo when linting, it may even have been intentional and the bug report is wrong.

Other comments I made on another recent git post:

https://news.ycombinator.com/item?id=33395616

https://news.ycombinator.com/item?id=33392624


None of the opinions described in this thread can fix the problem of ultimately having too many or too few commits, whatever norms are enforced. Squashing errs toward too few, not squashing errs toward too many. Probably the ideal is that a PR has meaningful, thoughtful commits which each contain a coherent, self-contained increment of functionality.


Hah, I must admit I thought you meant that you preferred squashing! Agree 100% that a huge number of small commits are easier to find bugs in than a small number of huge commits.


I meant my response more to the last part of what I'd quoted, the "because in the end it doesn't actually matter.".

Over the past few months we've been trying to formalize our git usage since we're split across a bunch of teams, and on this particular issue I've found a very hard split. People on the new-feature-development teams love squashing and squash-merge, while people on the maintenance teams who have done work in git repos for a while are almost all against it (of the ones with no opinion, several are still only working in svn repos so haven't had a chance to form one). But most people on the maintenance teams don't really speak up about it, so our company-wide guidelines remain heavily in the squash-merge camp.


It may be the last thing they did on this day and the commit message reads "wip".


And if that's where we end up while bug-hunting, we know it was caused by an unfinished thought, is akin to the typo case above, and should be fixable without much concern over what else it may break since it wasn't introduced while fixing something else.


That's exactly why I like multiple commits similar to how I described. Maybe this is the silver bulletpoint that can convert squashers? https://news.ycombinator.com/item?id=33536876

>One of those small typo fixes could introduce a bug and multiple commits makes it easier to track down. Blame would pinpoint it exactly - you'll see Bob's "Fix a typo" commit instead of being buried in Bob's "Add trucks to the game" commit. That saves you from having to go look at the PR or diff and to figure out why Bob renamed cares to cars.


Bisect will let you find the big squash commit that caused the regression, but lots of the information about why the change happened was lost during the squash. Often it's sufficient to just know the broad feature that the change was in aid of, but you're out of luck if you're diagnosing anything subtle. Bisect works better with smaller commits.


So there's (at least) two uses for commits.

The first is to keep a log of what you are working on. For me, that's lots of small and dumb commits.

The second is to provide a story for review.

Most of the time, when the change you are putting out for review is small and simple, you can just put it all into a single commit.

Sometimes, your change is more complicated and it makes sense to break it into a series of related commits. Along the line of 'first do the refactor that makes the complicated change simple, then make the simple change'.

Your job as an author is to hide the ugly reality of how you actually came up with the change, and present the reviewer a sanitised view of reality. Reviewing code is hard, so it's best to make it as easy as possible.


Why wouldn't you use a local branch (or something?) for your ugly commits and then merge everything from that branch into the main shared branch when you're done?


Because that's hard to review.

I am doing a lot of exploratory programming and way point commits.

I don't need the reviewer to understand all the mistakes I made and bad designs I considered. It's enough work to understand the finished design.

As a rule of thumb, every commit that lands in master's history should build and pass tests. So that eg 'git bisect' works.

But it's not a good idea to put that same requirement on waypoint commits I make along the way, when exploring.
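As a sketch of why that matters: once every mainline commit builds and passes tests, `git bisect` can binary-search the history for you automatically (the tag name and test script here are hypothetical):

```shell
git bisect start
git bisect bad HEAD            # current tip is known broken
git bisect good v1.2.0         # last release known to work
git bisect run ./run-tests.sh  # exit 0 = good, non-zero = bad
git bisect reset               # return to where you started
```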


Because then you're going to merge in the ugly commits and make everyone who needs to look at the history in the future have to work that much harder to understand what's going on.


And you aren't just making the human's job harder, but also tools like 'git bisect' work better when every commit in master builds and passes tests.


So you can get a semantic and useful history before submitting a patch/PR for review (or before merge).

Doing hacky and messy WIP commits is totally fine. Just clean them up before having others read them.

In case you're not aware:

  git rebase --interactive [remote/]ref


There are essentially two approaches:

- History is sacred and can't be changed. It typically goes with a merge-based workflow. The good thing is that nothing is lost, what you see is what really happened and you won't make a mess with inappropriate push --force. The downside is that it looks messy since you are going to see every typo and you will have some non functional code in your history. People using that workflow usually don't squash since squashing rewrites history.

- History is like documentation and has to be nice and clean. It typically goes with a rebase-based workflow. This is what the article is about. The good thing is that every commit has working code, and git log can effectively replace a more formal change log. The downside is that you lose information and the git log doesn't represent what really happened; furthermore, since push --force is often used, your local branch may not be the branch you think you are on, and you may even end up destroying other people's commits. People using this workflow usually squash to make their commits nicer.

I prefer the "history is sacred" workflow myself, but both options are valid.


It's ironic that the "history is sacred" crowd contains the people least likely to actually look at and use the git history to aid their contextual understanding of the codebase. Why do I say that? Well, because once you've come across a useless typo commit for the 12th time, you start to very quickly see how such commits needlessly bloat the history and make it far harder to understand (and make things like git blame much more unwieldy), while not actually offering any benefit.

It doesn't matter to me at all 3 months later to know that you initially had a typo in your first commit and then fixed it the subsequent commit. That should just be a single commit.

There are some very rare edge cases where having the whole messy history can offer some insight of how a certain mistake ended up being made, but IMO in 999/1000 cases that's not the case.


Squashing and rebasing don’t fix the shitty commit message problem. Discipline and good teamwork do.

All too often in rebase+squash heavy repos I start going through the history and find the mega-commit that introduced a change and gain zero insight. In comparison, merge-heavy repos have a lot of "merge this" or "fix this typo" commits that are fairly easy to skip right over when they don't pertain to your actual change, but prove invaluable when the problem was introduced by some weird bad conflict resolution including a typo, or countless other problems where all context is otherwise lost.

I think it’s unfair to characterize one group as not using the history; the split instead represents differences in how the groups use it. Archaeologist vs. daily changelog is perhaps a better characterization?


That's why when you use a "history is sacred" approach, you typically have several branches and use merge commits, not rebase.

When you look at the master branches you will mostly see merge commits, all the mess will be on the side, for example in feature branches. Merge commits can be clean, and when you are blaming, you will see the merge commit, not the dozen of typo commits. If you do "git log --first-parent" you will not even see the messy commits, but they will be there if you need them.
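A minimal illustration of the two views:

```shell
# Every commit, including the typo fixes from merged feature branches
git log --oneline

# Mainline only: one merge commit per feature; the mess stays
# reachable on the side, but out of view
git log --oneline --first-parent
```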


Projects should rewrite their public history from time to time, for instance to get rid of commits that never should have happened, or move changes between commits, fix commit messages and such.

The history should be treated as the perfect object to develop. We should continuously revise how this program should have been developed to its current state, by what sequence of changes.


While I think a rebase workflow where you rewrite the history of feature branches is a reasonable approach, rewriting public history is a big no for me. Maybe it is fine with some version control systems, but certainly not git.

Git is essentially a blockchain, rewriting history is like rolling back a bitcoin transaction, it is impossible without breaking cryptography. Instead, when you rewrite history in git, you actually create a new project to replace the old one. You then have to tell everyone to switch to your "new project", otherwise you will create a lot of confusion and painful merges, if they don't decide to say "fuck you" and keep working with the original history.

Git is decentralized, and to work it needs some consensus, and that consensus is in the history; if you break that, you break the decentralized nature of git. Now, if you don't open your repository to the general public and use git like you would use svn or other centralized systems, it is fine, but don't do that for public repositories.


Rewriting public history is completely fine with git.

Upstream says: "master is now this SHA, deal with it".

Actively developing downstreams can easily rebase their stuff (if any) across non-fastforward changes, and life goes on.

For a pure consumer of a repo, it makes no difference.

There is only the cultural idea that it's a no-no, not a technical idea.


fix typo

yet another typo

test fix

aaah, why is this test failing?

merge foobar/narf

revert foo

foo

Is just not a good history to preserve


Boggles my mind how so many people in this thread are against rebasing. Have they never needed to remove these commits? I do that pretty much daily, especially when troubleshooting something in a CI pipeline.


I prefer rebasing too, but merging doesn't mean that you have to merge in all noisy minor fixups, you can combine the strategies and rebase the feature branch into a set of sensible commits, then merge that branch. Preferably even with a CI that checks each commit before merge such that it will not cause issues later during eventual bisections. --fixup and --autosquash are handy for such workflows.


Using interactive rebase in that way is a great tool, but we're talking about people that are against rebasing in general, so it's not the most relevant point IMO.

i.e. the anti-rebase people are usually of the "never rebase" flavor. Because they're also the "history is sacred even though I never look at history because if I did I would get dizzy and throw up" crowd :P


So that atomic changes are consolidated in a single commit/PR.

Sometimes, the flow for a single feature A: - commit change AA - commit change AB - fix change in AA - extend AB

In this case, until merged into main/master and to reduce noise for the reviewer (especially if using a code review each commit type process), its best to squash all commits into a single "feature A".


Rebase should never be used. Or, if it is used, it should be treated as a dangerous thing to do that’s well outside the norm.

Most of the arguments in favor of rebase are by people fanatical about having a git history organized just so. It’s not worth the headache and effort. PRs are a better unit of work than commits in practice.

Configure GitHub or whatever you use to squash merge only and you’ll be good.

Since moving to this workflow I’ve had zero issues losing data due to a confusing git situation.


So you're the one making me code-review 10000-line PRs because you just dumped your WIP branch — with three PRs' worth of code, plus formatting changes — directly into a PR, rather than factoring apart said WIP branch either during or after the fact.

The designed unit of a (distributed) git workflow is a patch — i.e. a locally rebase-squashed set of cherry-picked commits from development branches, with `git reset --soft` + `git add -p` (or even `git format-patch` + manual editing) used to prune the patch to a minimal size. Everything you do in your local repo should be with the intention of producing readable patches for code review (whether that patch is then done via PR or mailing list.) It's not about creating a pretty history (retroactive); it's about making it easy on the people who will discuss and reformulate your changes, one at a time, before accepting them upstream.

To be clear, you can do whatever you want when your git workflow isn't distributed (i.e. if you're committing to only your own private projects, not proposing changes to other people's projects.) But if your workflow isn't distributed, then why be opinionated about git? You can simplify your life at that point by using something with central-repo-oriented semantics, e.g. Subversion. There's no rebasing in Subversion. :)


If you regularly need to review 10000 lines of code per PR your dev workflow is seriously broken. It‘s got nothing to do with git and its implementation‘s complexity.

Sometimes features do require large changes. But usually you can break a feature into different parts (e.g. database, backend, frontend) and merge them in separately.


Not regularly, no. But sometimes a feature change requires a dependent architectural change — a refactoring of the internal library code that the feature will be implemented into. Or sometimes the language of choice doesn't have a pre-commit-hookable CLI auto-formatter, only an IDE auto-formatter, and the dev editing a file triggers formatting changes to be applied that should have been done in a previous change. And sometimes, a dev thinks it's a good idea to change the representation and decoding logic for a data file or embedded data-structure literal at the same time that they're adding an entry to it (usually because they can't represent the added item's additional semantics without said change.)

> But usually you can break a feature into different parts (e.g. database, backend, frontend) and merge them in separately.

So you've already written all that code, because you couldn't get anything to "work" for end-to-end testing until you wrote all parts of it. The patchset as a whole is inherently large.

Now what? How do you "break [the] feature into different parts" when it's already all written and committed on a WIP branch?

That's right: cherry-picking and rebasing.

The GP is arguing against bothering with this process. Presumably because git-rebase(1) is unintuitive to them, and they don't realize that you should start this workflow with a copy of your branch, or a new branch with cherry-picked commits from your WIP branch, to guarantee non-destructive rebasing. Like making a copy of a layer in Photoshop. (Yes, you can always restore your branch from the reflog, so it's technically always non-destructive; but `git checkout -b foo` is something you learn in Chapter 1 of the Git book.)


Nobody should have long running WIP branches with these massive commits full of unrelated stuff. That’s an antipattern.


Disagree.

No one maintains change logs in their repos most times, so rebasing feature branches on top of their base branches gives you a clean set of commits to merge in, which can then be squashed down for a linear commit history on the trunk branches.

Then you can use things like bisect, and just... ya know, read through your change log when you need to.

Shoot, you can even add a few details in the notes while you're at it. How about a link to the ticket and PR at the very least with some notes on implementation.

That's my approach, it really doesn't take much time.

But hey, if civil engineers had the level of rigor software engineers do we'd all be dead.

So you live with what you've got, do what you can on your own branches, and just accept that no one cares about having a clean git history cause we can't be bothered as a profession to spend a couple hours learning how one of the most important tools we use every day works.


I think you misunderstood my post: if you squash merge as I suggested, your main branch is linear as with a rebase. Your PRs and the working branches behind them should just use merges, however. Come merge time the diff is turned into a single commit.


Well of course it's as good as a rebase -- it is a rebase.


If you squash into a single commit upon merge, ignoring for the moment the fact that as a blanket rule that's a bad pattern, you've now eliminated one of the core arguments against rebasing. The merge commit adds no value if the branch itself is a single commit. Just rebase your squashed-into-one-commit branch ontop of latest master and push that to master instead. Now you have one commit representing your whole PR, with no pointless merge commit.

I really discourage the squashing upon merge approach entirely though, because that's just a bandaid for lazy and/or misinformed developers to cover up the fact that their whole git workflow is completely borked.


Seems you don't understand merge commits, they are nothing special.

Just don't: https://news.ycombinator.com/item?id=33518496


Your perspective is one I've only recently come to understand after migrating a team to git and being the "source control guy."

The lesson I learned was: Prescribe everything about the workflow because nobody is going to learn git.

All the nice flexibility of git just becomes risk. By the time you have enough structure in place, you're back where you started: rigid source control, and you're using git locally on the sly.

The only person who bothered learning git well was a summer intern. And he mastered it, so I remain frustrated.


For what it’s worth I know git fairly well and have used more git strategies than most. I just happen to have found that simple usage actually works better for me personally and most teams I’ve been on.

Knowing a tool also means knowing what not to use:


Like an acceptable subset of C++.


> Since moving to this workflow I’ve had zero issues losing data due to a confusing git situation.

Nobody who understands git will ever lose data, because once committed you can never lose it (it's in the reflog). Indeed, even just adding a file means you will never lose it, although it's not as convenient as having an actual commit.

So yeah, you kind of revealed the anti-rebase case quite tellingly there. It's for people that understand git so poorly that they regularly shoot themselves in the foot and lose work or make other similar mistakes.

> Most of the arguments in favor of rebase are by people fanatical about having a git history organized just so. It’s not worth the headache and effort. PRs are a better unit of work than commits in practice.

PRs and good git commit history are not mutually exclusive. But there are many drawbacks of trying to make PRs themselves your source of truth. A big one being that it's not actually stored in git, so if you ever migrate from github to gitlab or some other system, that context is gone.

> Configure GitHub or whatever you use to squash merge only and you’ll be good.

See, now this becomes even more absurd. What's your fear of rebasing if you're going to do the equivalent of a `git rebase -i` upon every merge anyway?

This is a very confusing and nonsensical ideology.

For those who want to improve their grasp of git, I highly recommend https://git-scm.com/book/en/v2. That book changed the game for me, because I finally understood how to visualize git history in terms of the DAG, and furthermore learned about how git actually works under the hood (blobs and the like) which made me confident I would never lose anything I've ever added/committed ever again.


> A big one being that it's not actually stored in git, so if you ever migrate from github to gitlab or some other system, that context is gone.

The pull request is committed to the repo on acceptance. Closed ones are typically useless.

> What's your fear of rebasing if you're going to do the equivalent...

The system takes care of the details without incidental complexity or errors.

> This is a very confusing and nonsensical ideology.

Pretty simple, folks are trying to get shit done. Not screw around with tools. One or two clicks where someone else did the hard work correctly wins every time.

From today, design is important:

- https://www.ncsc.gov.uk/blog-post/so-long-thanks-for-all-the...

- https://news.ycombinator.com/item?id=33531560


> Pretty simple, folks are trying to get shit done. Not screw around with tools.

I'm trying to get stuff done, not screw around with the Github UI. `git pull --rebase main` beats clicking around in a browser.


I use the cli as well, the clicks above refer to clicking the squash checkbox on a merge request in gitlab. This is 10x faster than hand-crafting an artisanal one to tell a story.


This is like throwing away 90% of usefulness that git provides you. That's what you get if you don't wish to spend some time learning one of the most important tools in your career.


People don't want to learn git because it's a bad tool. There are better source control systems, that are far easier to reason about, but they don't have the proliferation that git does.


That's nonsense. Git is the best version control system of all time. It has some regrettable UX difficulties, but as far as the system itself goes, there is no better decentralized development tool.

There's a reason git came into existence for linux kernel development. The linux kernel is a project so massive and so decentralized that it needed a fitting tool to be able to tame the chaos. And git did that perfectly.

Out of curiosity though, what to you is a better source control system?


Can you expand on what that 90% is? I'd guess more the other way around.

Squash merges are perfect to me for the bulk of PRs- atomic test-passing iterations on the working product. Exactly what I want to see in my history. Useful for bisection. Good for reviewing line-based code changes as I can find all the related work for that feature.

They don't seem appropriate for long lived feature branches, or merging into release branches, but those aren't really being discussed here.


Forcibly squashing PRs just loses information and doesn't bring any benefits in return.

In my experience, only the simplest PRs boil down into what's logically a single commit. Many PRs are simple, sure, but often you end up with a bunch of logically connected atomic changes instead.

Let's take Mesa, an established and fairly high quality project, as an example. Look at its open MRs.

You can find a bunch of single-commit MRs, but some of them consist of approx. 2-4 commits, all of them with proper commit messages. See for example https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19... or https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19.... It wouldn't make any sense to squash them when merging.

You can also get monster MRs consisting of 10-20 commits, like https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19... or https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19.... Splitting them out into separate MRs would serve nobody, only increasing noise and review turnaround time. Squashing them would lose a lot of useful information that you definitely want to retain for things like blame or bisect.

Now, Mesa actually doesn't utilize git as well as it could - it doesn't encode a commit's relation to its merge request in any way other than the commit message (`Part-of:` tag). Personally, I would merge a rebased branch using the `git merge --no-ff` option, which creates a merge commit. This way, you get the best of both worlds: by using tools like `git log` or `git bisect` with the `--first-parent` flag you get what's essentially a list of merged MRs, filtering individual commits out; and if you don't add that flag, every single individual commit is considered, which is useful for stuff like `blame` or single-file `log`.
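
For illustration, a throwaway-repo sketch of that `--no-ff` plus `--first-parent` combination (all names here are invented; assumes a reasonably recent git for `init -b`):

```shell
# Sketch: --no-ff merge commits plus --first-parent views, in a scratch repo.
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q -b main
git config user.email dev@example.com && git config user.name Dev

echo base > file && git add file && git commit -qm "initial commit"

git checkout -qb feature
echo one >> file && git commit -qam "feature: step 1"
echo two >> file && git commit -qam "feature: step 2"

git checkout -q main
git merge -q --no-ff -m "Merge feature (MR-style)" feature

# First-parent view: essentially a list of merged MRs (2 entries here)
git log --first-parent --oneline
# Full view: every individual commit, useful for blame/bisect (4 entries)
git log --oneline
```

With `--no-ff`, the merge commit is created even though a fast-forward was possible, so the branch boundary survives in history for `--first-parent` to use.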

Also, before pushing a MR for review, my work branch is usually a mess. Lots of poorly divided up commits, without proper commit messages, sometimes undoing each other. `git rebase -i`, with squashing and rewording, is part of my everyday workflow. It allows me to use git as my personal undo and "let's-try-it-in-CI" tool without putting that baggage onto the reviewer. I get to be as messy as is useful during my work, and the reviewer gets a properly curated list of commits that's ready to be merged into the repository as-is. It's a win-win.

Not using rebases when working with git is fine when you work alone or when you're just learning how to use git, say, during a university project or internship. Otherwise, you're doing yourself a big disservice if you don't put that tiny effort into getting comfortable with tools you're using every day in your work.


I really like squash merges because then you know tests passed at every commit. Makes bisect easier, and thus more likely to be used. And no headaches when you can't rebase or you screw up a rebase, which will happen.


> I really like squash merges because then you know tests passed at every commit. Makes bisect easier

Here is a script (just 3 lines) that tells git bisect to ignore commits in the feature branches, so you can bisect only the commits (usually the merge commits) in the main branch. Best of both worlds.

https://quantic.edu/blog/2015/02/03/git-bisect-debugging-wit...
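
(The linked script aside: newer git, 2.29+, has this built in as `git bisect start --first-parent`. A scratch-repo sketch of the idea, with made-up commit names - the "bug" is `n` becoming non-zero, and it lands on the mainline at the merge commit:)

```shell
# Sketch: bisect only the first-parent (mainline) chain, so feature-branch
# commits buried inside merges are never checked out. Requires git >= 2.29.
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q -b main
git config user.email dev@example.com && git config user.name Dev

echo 0 > n && git add n && git commit -qm "c0"
good=$(git rev-parse HEAD)
git commit -q --allow-empty -m "c1"
git commit -q --allow-empty -m "c2"

git checkout -qb topic
echo 1 > n && git commit -qam "t1: the bug, buried in a branch"
git checkout -q main
git merge -q --no-ff -m "merge topic (bug lands on mainline here)" topic
merge=$(git rev-parse HEAD)
git commit -q --allow-empty -m "c3"
git commit -q --allow-empty -m "c4"

git bisect start --first-parent HEAD "$good" >/dev/null
# "good" means n is still 0; bisect run treats exit 0 as good
git bisect run sh -c 'test "$(cat n)" = 0' >/dev/null
```

When it finishes, bisect blames the merge commit itself rather than any individual commit from the topic branch.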


Likely not an opinion shared by something like half of all git users... as with any absolute opinion. Also, I can only counter the insults by responding: you are a fanatic for using git for backup and work tracking, which, in contrast to you, I wouldn't condemn you for ;) everyone/every team can use a tool for whatever fits their use case best.

So far I have never lost data with git, just seen confusing git situations from people who never rebased, but still shoot themselves in the foot with duplicate commits added by helplessly merging multiple branches around too much, all with far too many "fix typo" commits and then unresolvable conflicts :D

(Oh, and btw, when people end up there and I'm asked to get them back to sanity, it usually involves rebasing or cherry-picking them out of their messes.)


...here we go again.


> Most of the arguments in favor of rebase are by people fanatical about having a git history organized just so.

Seems to be a bit of an OCD compulsion.


Rebase is fine as long as it's your own unshared work. The alternative is https://xkcd.com/1296/


Don't rebase main.

Don't rebase shared branches.

It's amazingly powerful at making clear annotated changes. And removing small fixup commits for work in progress.

When someone says "never do x" there's probably a missing understanding of nuance.


THIS. Rebasing and squashing local/private changes allows for easier experimentation/rollback while you're implementing. BUT, rebasing anything shared will cause pain to others and should be avoided.


If having a nice history isn't important, and if PRs are a better unit than commits, why squash?


How do you lose data from a rebase?


A rebase creates new commits from old commits semi-automatically. Git then has no permanent record of the old commits, and even if you want to get back to them right away it requires some delicate git surgery.

This is why you can't generally share work using a rebase workflow.

It is not a big deal in practice in almost every case, but in a version control system it is a little bit odd that rolling back such a fundamental operation isn't a first-class feature.


> even if you want to get back to them right away it requires some delicate git surgery.

The reflog tracks rebase commit history. No surgery required.

>This is why you can't generally share work using a rebase workflow.

Rebase of public facing commits is not discouraged due to data loss. It is discouraged due to the possibility of someone creating change sets off the published work, and then the update re-writing the history to change the merge-base, requiring a re-merge of their changes.


They might be exaggerating a bit the amount of work to recover it immediately, but resetting a ref to a commit in the reflog might be considered at least minor git surgery. And that's assuming no public pushes have been made. Then all bets are off and it's surgery time.


> and even if you want to get back to them right away it requires some delicate git surgery.

`git reflog` to get the old commit ID, and then `git reset --hard <commit>`. Seems more like "basic everyday git operations" than "some delicate git surgery".


Many past confused teammates of mine I've dropped in to help would disagree. reflog and reset, for better or worse, require what seems to be above-average comfort with git.


I don't think you can call yourself "using git" if you're not comfortable with such simple concepts. Pointing a branch to a specified ref is one of the most basic operations you can reason about when using git!

I'm perfectly aware that many people don't think when using git at all and instead merely copy'n'paste memorized commands hoping that they'll do what they want to accomplish, but this is something you should move past when you want to stop calling yourself a "junior developer". This is one of the most important and helpful tools in your field of work, you can either take advantage of it or suffer.


[flagged]


I use rebase, for two reasons:

1. (major): it lets me commit at my pace, and then publish my work at the pace the team prefers (which is a combination of what the team wants and what won't choke the CI, but that's a story for another time). The ratio of my local commits to commits sent to code review is often 5-10:1.

2. (minor): because I hate to see a Git history that's over 50% made of "Merged X, Y into develop". It's pure noise. I guess there's probably a command line flag to filter them out, though.
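
(There is such a flag: `--no-merges` drops merge commits from the log, and `--first-parent` shows just the mainline. A quick scratch-repo check, with invented names:)

```shell
# Sketch: filtering "Merged X into develop" noise out of git log.
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q -b develop
git config user.email dev@example.com && git config user.name Dev

echo a > f && git add f && git commit -qm "work 1"
git checkout -qb feat
echo b >> f && git commit -qam "feat work"
git checkout -q develop
git merge -q --no-ff -m "Merged feat into develop" feat

git log --oneline               # 3 entries, including the merge commit
git log --oneline --no-merges   # 2 entries: the merge noise is gone
```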


That's because they know how to use git so that they don't have to pick a child when they bisect.


My point on my now flagged and dead?? post was that people think git history is linear, so they believe they have to do special stuff so it's a 'good' linear history when this is pointless.


Git commit crafting (and rebase to achieve it) is overrated. If you care about crafting beautiful series of commits so that the future readers understands what's going on: don't. Context is more useful to find out why something changed. Example:

- you build feature F that is touching N files and M lines of code

- you craft your git commits so that each of them is atomic and "understandable" on its own

- now if I want to see the context of change for line X, I won't get the changes in all the N files... I'll get your crafted commit that updates only line X (worst case) or its immediate surroundings (best case). But how do I get the full context (i.e., this line is about feature F)? I have to git blame lines around line X and see if I can actually make a picture of the bigger feature. Examples of feature F: a whole new http endpoint touching controllers, services, data access models, middleware. What do you get when people craft their commits? 1 commit for the controller, 1 commit for the service, 1 commit for the data access model, etc. Zero context. Time wasted for the reader


> commit crafting is overrated.

For you, in your use case. When I look at my neat history and use git bisect, I get plenty of value out of it.

People keep telling me to stop rebasing. I keep ignoring them; nothing new here.


This is basically solved by squashing each PR on merge and having a good PR title + description.


This is my habit and I advocate for it on new teams. It's so simple and effective. Why make it complicated?


This is exactly what we do at my job. The only time it ever causes issues is when we're ramping up a new project and the PR and ticket titles don't necessarily match the work we've done. But that's more a question of whether we make everything bite-sized, or deliver complete packages: a feature plus all of the miscellaneous changes we needed to make along the way to make it work. Now, is this the best way to do things? No, you should ideally make those changes separately and push each one separately. But that slows down progress.


But why not do both? I find that nice commit messages make it easier on the reviewer to see what's going on than a bunch of wip commits. Then squash all of the commits into a single commit with a nice title and message. The benefit of the squash at the end is that, assuming you require tests to pass before merge, you have a history of commits all with passing builds, which makes bisect possible.


Wouldn't the cleanup work be lost in the squash?


why would you squash multiple informative commits with their commit messages and their informative context about your thought process into a single big squashed commit? git can merge branches and keep track of it, there is no need to squash unless you split trivial changes that belong together and want to group them before sharing them


> no need to squash unless you split trivial changes that belong together

If a PR represents a distinct product feature/bug fix/unit of work, one could argue that its commits belong together.

Squashing makes for a very clean history. True it reduces granularity, but in my opinion at least, it's a good level of granularity.

The exception to this would be a really big PR. That's a reason to avoid huge PRs.


If you keep merge commits you can get the full diff at once and see all the context you need, if you don't you can still write meaningful commit messages that identify the feature you're working on so that in the future you can still do a diff between the first and the last commit and see it all at once.


Yeah, if you just use commits as they are intended then the complexity of git drops off massively. I've used git for a decade, and want to know how many times I've rebased? Zero.


I am deeply suspicious of people who claim to have never rebased. I've never met such a person who didn't have absolutely disgusting commits: either (1) no commit body to explain the context of why a change was made, for patches that clearly needed one, (2) random merge commits in the middle of the actual commits because they don't know any workflow besides merging master into their ongoing dev branch, or (3) massive pull requests (if using a pull request workflow) that touch 30 files and implement way more than just the feature the branch was supposed to be for.

It's also a big sign to me that the person probably rarely if ever actually spelunks through the git log to understand why a certain change was made. Because if one does, they very quickly discover why a workflow involving sane rebasing onto origin/master produces such a better history.

There's times when merge commits are appropriate, but they are few and far between compared to the endless abuses of merge commits by people who don't understand git in the slightest (my definition of "don't understand git" is someone who can't visualize the DAG)


Isn't commits as they are intended a patch queue in email with a cover letter (0/n commit)? At least, that seemed to be how the Linux kernel works.


I tend to stage everything I do in my local branches, experiments, hacks, partial changes I don't have time to finish right now, unfinished refactorings. No way I share that without rebasing it and cleaning it up first.


10 years of typos in commit messages


I think you are arguing against an extreme version of the practice.

Normal advice is to shoot for 100-200 lines in a commit, not split that into ten commits that each change 15 lines.

If someone is splitting 100-line commits into 10-line commits, I would advise not doing that.

However the direction that people normally err (IME) is submitting 500-2000-line commits which conflate multiple atomic changes. I would also advise not doing that.

It’s easier to review a sweet-spot commit, and it’s easier to understand when you later have to debug/bisect a change set.

Another related concept is to try to split “functional no-op” refactors from behavioral changes. This is usually the first and easiest way of getting your commit size down, as refactors often bloat the diff.

In your example case I’d hope you have UTs exercising each chunk that is added. There are sensible APIs to shape at granularities smaller than the whole feature (model, service, etc) (or should be at least). If you really can’t add a new endpoint without a 1k- line PR I think you might need a new abstraction layer. But often you can add a new endpoint in meaningful chunks that are feature-flagged off, if you craft it thoughtfully.


I don't think lines of code is much better an estimate of commit complexity than it is of productivity. I could make a 1,000-line change by changing the signature of something that gets called a lot, but that's essentially atomic. I could make a 200-line change by slightly updating 100 totally unrelated functions, and that's probably totally nuts.


Sure, it's just a rule of thumb, not a hard requirement. In my experience most functional/behavioral commits benefit from being in the 100-250 line range, and it's more common for engineers to make things too big rather than too small. YMMV.

As you say, non-functional refactors can be bigger while still being readable/comprehensible (but I think it's important to keep them scoped as such, and commit crafting is important for this).


Agreed. The first and last place I look when doing a git blame is the PR that the commit was in. That contains all the useful information for me, as well as much-needed context around review comments, discussion, etc. that is not available in native git.


Pull requests vanish when repos change hands.

If you leave unique information in PRs, that information may be lost in the future. This has happened to me at 3 different companies now, where we inherited another company's code base.

Keeping commits self-contained is the only way to future proof your explanations.


It would be great if GitHub (not sure about GitLab, Gitea, etc.) included the title and the description of the PR (maybe up to a certain character limit) in the merge commit when you merge the PR. This would be at least a minimal level of backup for the PRs, but it would also prevent a common scenario for me where I need to switch to the browser from `git log` just to remind myself what the heck this one PR was about again.


Also PRs belong to the review and CI system, everything needed to understand the code and its history should be staged in version control


I agree and this sounds super neat but in reality it's pretty complicated to keep that boundary and not leak info from one side to the other. I mean, when you open a PR you will probably describe the rationale for the change and why you had to touch files A,B & C (if it's not obvious). You can and you should replicate that info in a multi-line commit message but sometimes through code review you reshape the code structure or even some of the requirements (most probably because there was some initial misunderstanding). So, since we are humans, it's normal to take shortcuts when we did already "a good enough job" and leave out those small details.

Anyway I agree that we should aim at this: PRs are in the CI/review land, why a commit is like it is, should go into VC.


IMO one should maintain a CHANGES.md file with whatever you would be putting in the PR description in it. There's no specific need to squash commits to do this as long as you create it just before you create the PR. Even the odd bugfix or review comment after that is no big deal.


I don't see the advantage of doing this over ensuring the commits stay atomic, and I see several disadvantages.

Namely, git conflicts will happen constantly; and the file size will graduate from unwieldy to unusable over time. One repo I work on has >45,000 PRs merged. Good luck even opening that file.

This approach also poses risk: people will forget to add to it.


If a file is too big you rename it to changes.2022.md. People forget everything, but presumably you do reviews to catch that. As for merge conflicts: if those become a problem for your project, you can be more sophisticated. In one case we put all our unreleased change descriptions into a directory and generated CHANGES.md as part of the release. For any PR merge, however, you can see a detailed description of what was merged.


For a nice record of history - rather than squashing all of the smaller commits into a single commit for a given feature, and then rebasing (which makes it less easy to have small commits with specific explanations that add together to create a feature), something like semantic-release [1] could be used, which autogenerates release notes based on commit messages.

[1] https://github.com/semantic-release/semantic-release


But I've most often seen rebase used, as in OP, to squash multiple commits into ONE commit that covers the feature that was added.

I feel like the outcome of rebase used there is to give you what you are asking for, and without rebase you wouldn't be getting it?

(I have very mixed feelings toward rebase, myself)


Why does this matter? I can just diff from the commit before the first one to some later point if I really want to see it all wrapped up.

It seems like lots of work that almost always does not matter.

I don’t think git is a good project management system. If I want easy to read stuff I can look at the release notes or tag notes or whatnot.


How is push origin HEAD --force-with-lease different from normal git push? Can someone please explain this to me?


It will overwrite the tree on remote as long as remote hasn't changed since you last fetched it. Like --force, but it can help prevent overwriting other people's changes when they push in between your fetches. It doesn't always work, particularly if you have a tool which continuously fetches remote, like an IDE configured to do so such as VSCode. In that case, you will have fetched the other person's changes, and --force-with-lease will happily blow away anything on remote that might not be in your tree yet.


Oof, thanks for that warning. So it will blow away changes you haven’t merged (only fetched)?

I guess git can’t tell the he difference between “not merged yet” and “don’t want to merge, please destroy”


If you don't want to blow away changes you haven't merged, you shouldn't be doing force at all?


Well, the article presents a seemingly nice workflow that lets you handle local rebases. I was considering it.

However, we use vscode, and I rather like auto fetch. But apparently this workflow would destroy something auto fetched that I might not have even noticed.


The changes still exist in the repo’s reflog (for 30 days by default), but they might not be reachable from the ref’s new tip commit.


When you work with a rebase-oriented workflow, it's very common to submit a PR for review and then address incoming review comments as fixup commits: https://blog.sebastian-daschner.com/entries/git-commit-fixup...

This necessitates force-pushing to your feature branch after all the fixup commits have been approved and then squashed. At that point you can merge the cleaned up feature branch in to your develop or trunk.

`--force-with-lease` is slightly better than `--force` because in the event that you're also working on a collaborative feature branch you won't overwrite any commits that somebody else pushed up that you haven't fetched yet.
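
A scratch sketch of that whole loop - fixup commit, autosquash, lease-protected force push - with invented repo and branch names (the `GIT_SEQUENCE_EDITOR=true` trick just accepts the generated rebase todo list as-is):

```shell
# Sketch: address review feedback as a fixup, squash it, force-push safely.
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q --bare origin.git
git clone -q origin.git work 2>/dev/null && cd work
git config user.email dev@example.com && git config user.name Dev

git commit -q --allow-empty -m "initial" && git push -q origin HEAD

git checkout -qb feature
echo v1 > f && git add f && git commit -qm "feat: add f"
git push -qu origin feature                    # PR opened here

# Review feedback arrives; record the fix against the original commit
echo v2 > f && git commit -qa --fixup=HEAD

# Squash the fixup into its target (todo list auto-accepted)
GIT_SEQUENCE_EDITOR=true git rebase -i --autosquash HEAD~2 >/dev/null 2>&1

# History was rewritten, so a plain push is rejected; lease-protected force:
git push -q --force-with-lease origin feature
```

Afterwards the feature branch is back to a single clean "feat: add f" commit, now containing the reviewed content.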


This is the kind of workflow that makes me headdesk. Those fixup commits weren't obvious, and will be useful context if a future maintainer ever has to go back to these - they shouldn't be squashed.


It's not really, but there are a couple minor differences in practice.

First, it assumes the name of your upstream repository is "origin" which is fine in most cases as that is what `git clone` defaults to naming the remote.

Second, using `HEAD` always pushes your currently checked-out branch to your remote with a remote branch of the same name. It's a neat trick to skip the "`git push` -> you need to set your upstream branch -> paste the push command that git outputs" loop.

Lastly, `--force-with-lease` is a safer version of `--force` or `-f` because it tries to ensure that you don't overwrite history accidentally. The `--force-with-lease` flag will fail if a coworker for example pushed a commit to your branch that you didn't know about. Where a regular `--force` would just overwrite that change.

I assume that the command is meant to be "safe" and "foolproof", in that this command should always work.

Since the parent post recommends a lot of rebasing in their other commands, you'll need a `--force` or `--force-with-lease` to push new commits because they won't be "fast forwardable"


If git ever got a redesign from scratch, they should make -f be --force-with-lease, and make -F be --force[-with-prejudice].


A normal push will only advance the remote branch if your local branch has the tip of the remote branch as an ancestor. This ensures you're only adding commits to the remote branch and not replacing any commits on the remote.

But if you've pulled the remote branch, rebased, and now wish to push, the tip of the remote branch is no longer an ancestor of your local branch. In this scenario, to update the remote branch you have to do a force push.

But now imagine that another developer has added new commits to the remote branch in the meantime. If you do a force push, you will overwrite those new commits.

Using "--force-with-lease" ensures that the tip of the remote branch hasn't changed since you last pulled it, so that your force push isn't erasing any commits on the remote by accident (i.e. it ensures the remote branch has not changed in between when you last pulled it and your current force push.)
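
A two-clone scratch sketch of exactly that protection (all names invented; the key point is that the lease-protected push exits non-zero instead of erasing the teammate's commit):

```shell
# Sketch: --force-with-lease refusing to overwrite a teammate's push.
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q --bare origin.git
git clone -q origin.git alice 2>/dev/null
git clone -q origin.git bob 2>/dev/null

cd alice
git config user.email a@example.com && git config user.name Alice
git commit -q --allow-empty -m "base"
git commit -q --allow-empty -m "work"
git push -q origin HEAD
branch=$(git symbolic-ref --short HEAD)

cd ../bob
git config user.email b@example.com && git config user.name Bob
git fetch -q origin && git checkout -q "$branch"
git commit -q --allow-empty -m "bob's new commit"
git push -q origin "$branch"

cd ../alice
# Alice rewrites history (e.g. an amend after a rebase)...
git commit -q --amend --allow-empty -m "work (amended)"
# ...and the lease check refuses the push, because origin moved since her
# last fetch. A plain --force here would have erased Bob's commit.
git push --force-with-lease origin "$branch" 2>/dev/null \
  && echo "overwrote remote!" || echo "rejected: stale lease"
```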


Force with lease is like force push except if the branch you are force pushing to has changed since you last fetched then it will cancel the force push.


I'm reminded of the saying, "if you make something idiot proof, someone will make a better idiot."


As I get older, I become the next dumber idiot :D


This is misleading. There are all sorts of "idiots" on a bell curve already and making something easier simply widens the stripe of effectiveness. It is a good thing because you now have more "idiots" who could do the job just fine.


I think the implication is that efforts towards "idiot proofing" often don't actually improve effectiveness at the end of the day. They simply add something else to trip on.

I'd roughly say that's the case with OP's aliases. Useful if you know what you're doing, or maybe they would work with a strict team policy. Otherwise they're just going to create more trouble (especially with that force-pushing bit in there...).

ETA: I fully support usability! "real programmers just read the manual" arguments be damned.


The paradoxical thing about Git is that it saves all history so you can undo anything, yet at the meta level this is quite hard and you can still get in a lot of problems if you are not careful.


Pretty cool idea.

Something I really want is a version of `git amend` that takes a commit hash, so I can amend my changes to any commit, not just the last one without having to start an interactive rebase.


You could create an alias doing something like a "git commit --fixup <sha1>" and then a "git rebase -i --autosquash <sha1>~".


I mentioned elsewhere, but I made a simple bash function that does something like that but matches a string in a previous commit message (because I found that easier to type quickly than a commit hash):

  function git-commit-fixup() {
    git commit --fixup ":/$*"
  }
  # usage: suppose there's a commit "fix: the thing"
  git-commit-fixup thing
  # now there's a new commit "fixup! fix: the thing" which can be autosquashed


Git's interface is like the 'find' UNIX command: it makes complex things possible, but does nothing to make common ones easy. 99% of times I want to find a file in the current directory or below:

  find . -name foo -print
Can you spot the 3 obvious arguments that find shouldn't require me to type?

Similarly, if so many users add -a to their git commands, why is this relegated to such an ugly flag?


TBF (at least with GNU find) you don't have to:

    find -name foo


Genuine question. What makes you feel that -a is an ugly flag?


Flags are modifiers to alter the standard behavior of a command. If you use it 99% of the times, then I'll argue that the definition of "standard" is wrong.


Ya, the three arguments are

1. `|`

2. `grep`

3. `"my actual filter"`


Rebase (non-local rebase) based workflows are just awful. The best git methodology I've seen is (surprise, surprise) in the Linux kernel. I can keep everyones tree as a separate tracking branch and use git work trees to develop against multiple branches on the same system. I also really enjoy using git request-pull, send-email, etc. and I think Github has been actively harmful in teaching people bad git habits.


Rebasing remote feature branches after feedback in the github-pull-request workflow is analogous to re-sending and updated patch series in the email workflow.

The distinction shouldn't be if its remote or not but if its something "published" that others might build their work upon.


any pointers on where to start for request-pull, send-email, etc. Ie, the linux kernel flow?

Everything is a PR these days in github, so perhaps we find a way to do this with github.


Sourcehut's tutorial on send-email is pretty good: https://git-send-email.io/


A really good guide to learning this is https://git-send-email.io/.


I love git, but I have to be honest about one thing: It's definitely hard on purpose and introduces hard on purpose features (and naming conventions) to weed out people.

rebasing and squashing and tracked/untracked. Yes they all have a place, but has anyone noticed with SVN it's just commit, merge and branches.

Would it be impossible to make git with svn's UI?


Yes, svn hides and pushes most of the complexity to the user's workspace, and just gives up when any sort of complexity happens.

Git could clearly be improved, but much of svn's ease is due to hiding the complexity of development.


I think people mostly don't care, but I personally sign my commits.

And if you rebase, you lose the signature (well, maybe the person rebasing re-signs it). But then it means that my commits are not signed by me anymore.

Feels like an interesting point against rebase workflows to me, but nobody mentioned it, so maybe it's just me.


Ugh, that `pr` alias gives me shivers. Just write a standalone script that you distribute to teammates! You don't get extra points for jamming it all into a single string in a config file.


This can be fine in two scenarios: solo dev, or enforcing its use in your company.

Otherwise big messes will inevitably arise. In the second case, new people with their (correct) git workflow will suffer.


This is off topic:

I have a dumb git question and I can never seem to formulate a google search that will help me.

I use a Mac and for some reason I am able to use `head` (lowercase) instead of `HEAD` (uppercase) in every command and its trained in my muscle memory.

So when I go to another computer, this shortcut isn't there, so when I type `git reset --hard head^` I get an error, and I have to go back and change it to `git reset --hard HEAD^`.

Anyone know of a configuration option or something somewhere that I can enable this?


> Anyone know of a configuration option or something somewhere that I can enable this?

I'd suggest that you retrain yourself instead. "HEAD" and "head" are not the same thing, and any fakeout configuration to change that will also be nonstandard and not available everywhere.

The underlying issue is that HEAD is the label Git uses for the reference to the top of the repo. It's saved in the filesystem as .git/HEAD.

MacOS filesystems are case-insensitive (but case-preserving) by default. Linux/POSIX filesystems are case-sensitive by default. I consider this a bad default setting in macOS. Try "cp FILENAME filename" sometime. :(

Anyway, consequently, on default macOS, "head" will be remapped to "HEAD", if "head" does not exist. Watch out for "Head" and "hEaD" though. Of course those won't happen in normal Git usage (though they could be valid, and different, tag names!).

My suggestion is to not let bad macOS defaults creep into your habits. And to not make things even more weird by trying to reproduce their bad behaviour in non-macOS environments.


This is probably due to the underlying filesystem being case insensitive on Mac. As a hack, you can create a symlink alias on a per repo basis.

ln -s HEAD .git/head


Not to worry, I am a professional googler

https://stackoverflow.com/questions/25976794/is-head-in-git-...

...I keep finding out neat new stuff about git plumbing.


MacOS defaults to a case insensitive filesystem. Branches are just a commit tag which is just a git object which is a file??? I think? Everything is a file?


With https://www.highflux.io/ we try to make this git workflow even easier by automating the necessary steps with a simple UI.

You can just work in your feature branch, your changes get automatically rebased, and when you're done your branch gets merged as a single commit.

We are still in early development, always looking for feedback :D


I find most attempts to hide how git really works with alternate commands just end up biting when something unexpected happens.


After reading this post, I have another tip - instead of adding `alias.pr` to your gitconfig, you can create a shell script called `git-pr`. Then you can write readable code instead of having to sprinkle extra quotes and backslashes everywhere.


Why use risky rebase when you can just squash all into one commit and push it via an intermediate branch (something that GitHub offers as an option)? I prefer to be worry-free rather than always ready for someone screwing up a branch.


To be fair, squash is a form of rebase.


Why do you feel rebase is risky?


I just never rebase. Am I missing out? It seems the only major advantage is to reduce the number of items in the history and that doesn’t seem very important to me.


Yes, you're missing out. Rebasing isn't about reducing the number of items in history, although it can do that too. It's about having a sane, readable history.

If you write two commits in your local feature branch, then pull master in and generate an ugly merge commit, and then stack two more commits on top of your local feature branch, and then finally get all that mess merged to master, you have a difficult to read history because of the merge commit randomly in the middle. Whereas if you instead rebase onto origin/master (after fetching of course), you get a nice history where your feature branch commits are cleanly on top of origin/master, so there's no crazy merge commits in the middle.

Most people just use git as a fancy save button, and therefore never actually use the git log / git blame to answer questions about why some part of the codebase is the way it is, and therefore they never realize why their merge commit insanity is so destructive to the usability of the git log.
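
For anyone who hasn't seen the difference, here is the fetch-then-rebase flow sketched in a scratch repo (invented names; rebasing onto a local `master` stands in for `git rebase origin/master` after a fetch):

```shell
# Sketch: rebasing a feature branch onto a moved mainline keeps history linear.
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q -b master
git config user.email dev@example.com && git config user.name Dev

echo base > a.txt && git add a.txt && git commit -qm "master: initial"

git checkout -qb feature
echo feat > b.txt && git add b.txt && git commit -qm "feature: commit 1"

# Meanwhile, master moves ahead (normally you'd learn this via `git fetch`)
git checkout -q master
echo more >> a.txt && git commit -qam "master: moves ahead"

# Instead of `git merge master` (an extra merge commit mid-branch),
# replay the feature commits on top of the new tip:
git checkout -q feature
git rebase -q master
git log --oneline --graph      # a straight line, no merge bubbles
```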


For a 1 man team there is no point. Even for a 1 man team with a local and remote repo there is no point, because your local branch is always the same as the remote branch.

For a 2+ man team that does not protect the remote repo (i.e., merge/pull requests aren't required and you can commit/push directly to remote branches), there is also no point, because when you go to push a branch, the remote complains about changes existing on remote and makes you pull first. That pull does a fetch/merge into your local branch (commits are overlaid in order by date committed; some people complain about this, and it is why they rebase instead, so that remote commits are back-filled and their own commits are inserted on top).

For a 2+ man team that does protect the remote repo and does require merge/pull requests, an explicit rebase or merge is sometimes needed. At the end of the day you should be creating a merge/pull request from a source branch that has all of the same changes as the remote target (at that point in time), so the reviewer/approver only sees your changes. If you updated your local branch via a rebase, the commits related to your changes are all in order; if you updated it via a merge, your commits are interleaved with other commits that happened around the same date. But really, that's only a problem if someone cares to use git-log rather than the 100 other ways to review history.
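On the "other ways to review history" point: even with merge commits in the history, the mainline can be read without the interleaving by following only each merge's first parent. A scratch-repo sketch (names invented):

```shell
set -e
demo=$(mktemp -d) && cd "$demo"
git init -q && git symbolic-ref HEAD refs/heads/main
git config user.email demo@example.com
git config user.name demo
echo a > a.txt && git add a.txt && git commit -qm "base"
git checkout -qb topic
echo b > b.txt && git add b.txt && git commit -qm "topic work"
git checkout -q main
echo c > c.txt && git add c.txt && git commit -qm "mainline work"
git merge -q --no-ff -m "merge topic" topic
git log --oneline --first-parent   # hides "topic work"; shows the mainline only
```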


No, you're not missing out… I mean, it's a powerful tool, and like any tool, learning to use it can be helpful. Having said that, rebase is a tool I find myself compelled to use more often than I feel it's actually useful.


This doesn't make git any easier. The synced and update commands are redundant. If you have

    [pull]
      rebase = true
in your .gitconfig, which you should, both are the equivalent of a simple git pull. Naming a command `squash` when it does interactive rebase is also pretty confusing.
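If you'd rather not edit the file by hand, the same setting can be written from the command line (normally this modifies your `~/.gitconfig`; the first line here only redirects it into a scratch directory so the demo is side-effect free):

```shell
HOME=$(mktemp -d)                 # demo only: use a throwaway "global" config
export HOME
git config --global pull.rebase true
git config --global pull.rebase   # prints: true
```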


Missed opportunity to title this "Git Proof Git"


Nothing is idiot proof. Only more or less idiot resistant.


So, like easy git (egit), but even more minimalistic?


Nothing is idiot proof, as people greatly underestimate the destructive power of a motivated idiot. =)


but we have such inventive idiots ...


Just learn Git. It's really not that bad. Don't _ever_ make someone use rebase without them understanding what it really means.


Git, like regular expressions, is something I've "just learned" half a dozen times already, only to forget everything and have to "just learn Git" top to bottom again and again.

It really is a shame, Git has a great under-the-hood design (excluding poor binary file support), and such a terrible interface/UX that seemingly can never be outright replaced, so we're stuck with a good tool surrounded by needless confusion forever.


My experience is that once you understand the (high level) data structure you're manipulating by using git, you have already made 80% of the way there. I barely know basics of git's interface and have to use tools like `git gui` for anything more complex in my everyday workflow, yet I can manage just fine even when presented with complicated problems, since once I know from the data structure perspective what I want to do, finding a way to do that in git's interface by searching manpages and/or the Web is hardly ever a big issue.


The way to get "git" is to think about version management as a concept, i.e. starting top-down. It's the implementation details of git that many developers get confused about. But they don't matter.

It's similar to recursion: you don't try to understand a recursive function by mentally evaluating it; instead you reflect on its desired return value and how to construct it in the base case.


It is worth the investment to learn advanced Git if you're a software engineer. Start by reading the Chacon book cover to cover. It is a major tool you use every work day, so it's important to understand it.


[deleted]


Doesn't happen ime. I understand how it works (more or less) and still struggle with its CLI. It was created by aliens with photographic memory and non-associative thinking.

Maybe you can point to a git test/quiz that doesn't ask stupid trivialities 9 times out of 10? In case I have to "understand" it more, would be nice to pass some informal certification at least.



