
The approach that has worked best for me is pretty simple.

Master should always be in a deployable state.

Any commit that lands on master must include unit tests that demonstrate that it works (or at least doesn't break anything).

Incomplete features that will take more than a week or so to develop should be merged to master early, protected by feature flags. This avoids long-running branches.

I like squash commits: build a feature iteratively in a branch + pull requests, squash to master once it has the tests and documentation bundled together with the implementation. This gives you linear history.

People who prefer to review smaller commits can do so in the pull request - it's only the finished item that gets squashed into a larger commit.
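
A minimal sketch of that squash step with plain git (the branch name and commit message are made up; GitHub's "Squash and merge" button does the equivalent):

  git checkout master
  git merge --squash super-feature    # stage the branch's net diff without committing
  git commit -m "Add super-feature (implementation + tests + docs)"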




The fact is that this doesn't work for everyone. What we found most workable is that master is always "stable", and then we have a production branch that is cut from master.

This gives you a few things:

1. More predictable deployments (let's deploy every week with X items)

2. Better rollbacks (rollback to last production deploy instead of guessing SHA1 of last acceptable state)


This is what I use tags for.

Master = deployable state (stable)

Every deploy gets a tag (named with its version). Rollback to previous tag in event of failure. Deployments remain just as predictable.
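
For reference, a sketch of that tagging/rollback flow (the version numbers are hypothetical):

  git tag -a v1.4.0 -m "production deploy"   # tag the commit being deployed
  git push origin v1.4.0
  # rollback: point the deploy at the previous tag
  git checkout v1.3.9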


I've also seen people have a single development branch and multiple release branches (imagine a 'devel' branch that everyone merges into and a branch for each release that you have, and that potentially merges back into devel if you do hot patches). It's essentially the same thing -- branches aren't really all that different than tags in git.


Yeah, it's basically aesthetics. I do like keeping the release identifier separate from branches, though, just for categorization purposes (branches = development, tags = releases).


The method you describe is functionally the same and only nominally different.

It could be said:

> What we found to be most workable is develop is always "stable", then we deploy from master.


We deploy to test multiple times a day from our master branch after a pull request is reviewed and merged. The deployment bundle (a zip of the state of the code) that goes to test later goes to staging and production; it is the same across environments.

The only thing that changes is the configuration variables, which are replaced in each environment. This ensures predictable deployments, and if we want to roll back we just pick the last release bundle and re-deploy that.


My snarky version of this is: ideally the head of master is deployable (no regressions, yadda yadda). In reality, how many times has the head of master been broken, and one had to roll back, and then had to scramble to find the rollback point?

Gitflow is fine - better to use it (w/ the git-flow extension) than spend time arguing about branches.


Having to frequently roll back changes can be argued to mean that you don’t do enough testing before merging but are testing in production.


I agree with this approach. It worked beautifully for me at my last job. Keeping master ready to deploy is the way to go, along with squashed commits for a linear history. A side-effect of this approach is that it makes it super easy to bisect your way to the culprit should a regression occur; something that I've found is impossible with GitFlow.


We use this workflow and it works masterfully. I have no need to save all my small commits - partly because I commit often to run tests in parallel on a CI server.

Does anyone know of a layer that you can use on top of git to make the occasional squash and rebase easier?

I know gitless is a helpful layer on top of git, but I'm not sure it supports this specifically.


I mean, GitHub allows you to squash merge a PR, which is what we do at work. I find it really simple to summarize everything in the PR in that squash commit that ends up on master. I only rebase/squash locally if I need to make it easier for review.


Can you still easily bisect to find which change broke a certain feature? I usually recommend against squashing because it's easier to identify an error in a 10-line diff than in a 2000-line diff.


We use the same method and yes, it's been usually easy enough to bisect from squashed commits. In mobile development it even helps, as bisecting effort is dominated by rebuilding time -- fewer commits make for much faster bisecting.


The problem is more the state of the rebased commits. If it is a dirty history, bisect will break for a myriad of random reasons (broken tests, incomplete features, etc.). So if you don't squash and want to run bisect, you _have_ to ensure that all commits are clean and working. Easier said than done.


> So if you don't squash and want to run bisect, you _have_ to ensure that all commits are clean and working. Easier said than done.

It can be done if the commits are separate logical units of work rather than work-in-progress commits. One approach I take: take the diff between your branch and master, make a new branch based off master, and stage parts of that diff (parts that make a logical unit of work) into a commit. You can do that with commands like git apply with the relevant diff hunks.

Then, to test whether each commit works, you can run git rebase --exec "your-test-commands" master and it will apply each commit in your branch and run the test commands. If they fail, then you can fix the issue in that commit and run git rebase --continue to move onto the next commit.

This way, you can get a logical set of commits where each of them are in a working state.
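
A sketch of that two-step workflow, assuming a messy branch called super-feature and a test suite run via make test (both names are hypothetical):

  git checkout -b super-feature-clean master
  git diff master...super-feature | git apply   # working tree now has the branch's net changes
  git add -p                                    # stage one logical unit of work
  git commit                                    # repeat add -p / commit per unit
  git rebase --exec "make test" master          # verify every commit passes on its own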


Can the PR still show all the individual commits when you come back to it? If so, are they still in the repo with some keepalive tag or are they persisted outside of the repo along with the PR-comments/metadata?


Yes, GitHub keeps the separate commits visible in the closed PRs. For what it's worth, I never ever found those useful anyway and worked just with the squashed commits. It's up to the team to make sure the squashed commits don't grow to thousands of lines in size.


> Keeping master ready to deploy is the way to go

Are there really teams that don't do this?


Any software that has more than a few dozen developers and has tests that run for more than a few minutes necessarily will have broken master states at least occasionally. Otherwise you slow everybody down, because merging to master can only happen (working hours / test run time) times per day - e.g., an 8-hour day with 30-minute test runs allows at most 16 serialized merges.


And not all projects produce a single output from their source code.

I've worked on multiple projects over the years that had mostly a common code base but had to build and release to support different hardware platforms. In some cases, these needed significant but permanent divergence between the code for one platform and another.

In this sort of environment, concepts like "master branch" and "continuous deployment" have no meaning, and neither does a principle like always being ready to deploy. Your whole approach to things like branches, merging, rebasing, cherry-picking, bug tracking, and what a version number means is probably different to, say, deploying a long-running web application with frequent updates and all users running the same code.


If you're using gitflow, you've got a separate develop branch that everybody merges to, while master remains stable. Only when develop works and passes its tests can it be merged to master.

And with multiple teams sharing the same code base, breaking master is going to slow a lot of people down a lot more.

I have little experience with multiple teams working on a single code base, but it sounds like a recipe for disaster no matter how you organise that. Make your code as modular as possible, so each team can take responsibility for their own code.


Openstack has the issue of long tests (nova takes two hours: https://review.opendev.org/#/c/711277/ ), but they still keep a green master. I don't think it's a problem in itself. The question is - why is anyone slowed down by merging? Why are people waiting for that action to happen?

Merging can happen more often than the (simple calculation) suggests, though. What OpenStack does is check each branch on its own, but then try to merge a number of waiting PRs at one time. If they pass on the fake master again, that's merged into the real master.
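
The idea can be approximated with plain git (branch names and the test command are made up): batch the candidate PRs onto a throwaway copy of master, test the combination, and only fast-forward the real master if the whole batch is green.

  git checkout -b fake-master origin/master
  git merge --no-ff pr-101 pr-102 pr-103   # the batch of approved PRs
  make test                                # run the full suite on the combined state
  git push origin fake-master:master       # fast-forwards master only if it hasn't moved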


Github's interface is a big part of the issue (only for those using Github, of course).

Say you branch off of master, make some changes, and make a PR. Call this branch A.

Then you want to make a new branch, that depends on branch A. Call this branch B. The PR for branch A is still waiting to merge when you finish with branch B.

If you make a PR to merge branch B to master, you get a diff showing both PRs worth of changes. That's annoying to review.

If you make a PR to merge branch B to branch A, you get the correct diff for B. Huzzah! But if branch A's PR then merges, Github deletes the branch B PR and any comments/review. You have to remember to retarget branch B PR to master, then merge branch A.


It is interesting how in such a case there is an increasing blur between git and blockchain.


I work on an embedded device. Our tests include power analysis testing in LTE/GPS simulators of all the sleep/wake states. Total testing of a release takes about two weeks.

Trying to keep master consistent without git-flow means we get PRs sitting around for a month and merge-conflict hell days when we merge everything to make a build to send to QA.

Git-flow is bad if you have CD. Git-flow is great if you can't do that.


The part that's most contentious about git-flow is release branches.

You still should merge often to develop, which when tested becomes master.


The issue there is that you typically find some issues in testing, and need to fix them. That's why you test. You can either have a release branch and a develop branch, cherry-picking/merging fixes into the release branch as well as develop, or you can just hold off on merging any "not-for-this-release" branches for a few weeks. Just let the PR sit there. Alone. In the dark.


Amazon FreeRTOS.

Master is a working branch. Releases are pushed to Release and tagged with a date.


> Master should always be in a deployable state.

Isn't that what the develop branch in Git Flow is for? Your project might need some additional steps as part of its release process, and those can take place using the corresponding release branch, but the idea is that you can start a new release off develop whenever you want.

> I like squash commits: build a feature iteratively in a branch + pull requests, squash to master once it has the tests and documentation bundled together with the implementation. This gives you linear history.

Indeed. We're talking about using Git. Even if you're using a Git Flow style of workflow, why wouldn't you regularly rebase any feature branches back onto develop if they're lasting more than a short time, so you keep up to date? And why wouldn't you squash the finished feature down to a single commit that can be fast-forwarded onto develop when it's ready, to maintain a linear history with one commit for one significant feature or fix?

The only time I've seen non-FF merges in a Git Flow style project is for bug fixes on release branches, but those tend to be characterised by two key properties: they are relatively simple and relatively important.

If it's not simple, you probably need to abort that release and go back to your main development line to do whatever more substantial work is required before continuing, then make a new release. If it's not important, you probably ship that release with the known bug, and the fix gets made on the "develop" branch via the usual process for incorporation in future releases.

So the effort involved in merging bug fixes from release branches back to develop is typically both relatively small and well justified, and if you do need any manual conflict resolution, that implies divergence since the point when the release branch was started. Then it probably is a good idea for someone to take a more careful look and make sure the bug is properly fixed on develop as well.


> Isn't that what the develop branch in Git Flow is for? Your project might need some additional steps as part of its release process, and those can take place using the corresponding release branch, but the idea is that you can start a new release off develop whenever you want.

In gitflow, “develop” is master. The master branch serves no real purpose.

The intent of “master is always deployable” is really that every commit, on every branch, is deployable.

To me, this is the whole point of CI. Commits need to be restructured in such a way that everything works incrementally, otherwise you’re asking for trouble — how do you even bisect problems?


> In gitflow, “develop” is master. The master branch serves no real purpose.

Yes, it serves a real purpose.

Develop is technically deployable (in continuous-integration/delivery terms, it is delivered but not deployed); Master is that plus socially deployable, and hence actually deployed.

Now, some organizations don't have social controls on deployment with timelines that necessitate a separation between those things (plus release branches to support getting the technically deployable to be socially deployable), but some do. Those controls are often outside the control of the development organization, potentially of the IT organization, and sometimes even of the whole organization, as they may be external controls.


I'm a fan of "Master is production", so as soon as a commit is merged into Master it's deployed immediately. That way everyone always has access to what exactly is in production, and it helps enforce the mindset that anything merged into Master needs to be 100% ready to go.


Master should be the known-good version. Not the version you're deploying right now, but the version you deployed yesterday from a release branch and nobody screamed. That makes it much less likely that your teammates will rebase onto a bad commit that needs to be rolled back in prod.


Some organisations don't have this kind of deployment at all. Most software is not web apps running on servers under the control of the development team with a single version in production at any given time.


Not far from my experience. I advocate preserving history, and the wisdom of requiring unit tests for everything depends on the project's properties.

I think this is a good treatment about why rebasing is bad: https://medium.com/@fredrikmorken/why-you-should-stop-using-...


> I like squash commits: build a feature iteratively in a branch + pull requests, squash to master once it has the tests and documentation bundled together with the implementation. This gives you linear history

I disagree with squashing commits when merging to master, here's why:

Your commit history is what you will use later down the line (think 6 months, a year, 2 years later) to find a very nasty yet incredibly subtle bug in your application.

Imagine said bug was introduced by a feature that took about 1 full month to code, introducing 1000 lines and removing 500.

That feature was probably made of many small commits (say, 25 commits total), on a feature branch called "super-feature".

If your developer cared at all about making "atomic" meaningful commits, each one of those commits will be introducing a small, very well defined subset of the entire "super-feature".

The "super-feature" will in the end be the result of all of those commits together.

Squashing that history of 25 commits will create one giant commit of 1500 changes. Have fun finding where in those 1500 changes that little nasty bug was introduced!

Instead, if you hadn't squashed, you simply use `git bisect` and find in no time that the bug was introduced by the 12th commit of that branch, which itself was a small commit of 1 little change:

  diff --git a/statistics.php b/statistics.php
  index 69e435aa9f..46666daa93 100644
  --- a/statistics.php
  +++ b/statistics.php
  @@ -8,7 +8,7 @@
  - a = b + c
  + a = b - c
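
For reference, a typical bisect session for a case like this looks something like the following (the tag name and test command are hypothetical; `git bisect run` treats exit code 0 as "good"):

  git bisect start
  git bisect bad HEAD        # current state exhibits the bug
  git bisect good v1.3.0     # a release known to work
  git bisect run make test   # bisect drives the checkout/test loop itself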

If you want a very linear history, enforce a:

  git rebase master super-feature
  git merge super-feature --no-ff

This will rebase your branch on top of master, but, when merging, the `--no-ff` option will tell git to create a merge commit. This merge commit will make it very clear that the code branched off and was later merged back into master, yet, it will keep the history linear.

=======

TL;DR: Your commit history is as important as your code! Don't squash it away! I wish GitHub never introduced that feature and instead developers were taught how to create a meaningful history of commits.

=======


> Squashing that history of 25 commits will create one giant commit of 1500 changes. Have fun finding where in those 1500 changes that little nasty bug was introduced!

100% agree, the only purpose of this is to make "pretty graphs" at the cost of destroying the history of the code changes.

I prefer to write small commits with simple messages that explain exactly what that change does. Sometimes it costs some time in Git Gui selecting the lines that make up the commit, but ultimately it's worth it.


I notice that some people seem to care more about pretty history than about useful history. I'm not a big fan of rebasing for this very reason: it changes history, which can misinform about what was really done. It can introduce bugs in commits that didn't initially contain them.

Though I will say that git has one big glaring problem with merge commits: they are effectively black boxes, and when you have a big merge conflict, many git tools don't tell you how that was resolved in that merge. I recently came across a merge that wiped out some of my changes with no clear reason why. I redid that merge and there was no merge conflict at all, so it's still a mystery what happened there.

Still, rebasing has caused far more problems in my experience. It's only suitable for trivial, local cases and never on commits that have already been shared (pushed).


I think a lot of the fear of rebasing comes from not fully grokking what it does, and not rebasing often enough.

My workflow typically involves rebasing my local branches on master every time I switch a branch or start a work session, so typically conflicts that need to be resolved are very clear and small things (e.g. someone else adds a case to a switch statement in the same place your branch does).
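
A minimal sketch of that routine (remote and branch names as usual):

  git fetch origin
  git rebase origin/master   # replay local commits on top of the freshly fetched master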

The biggest benefit to this that I see is that your diffs are then much more clear in what they're doing, and you have the option to update the commit message to reflect the conflict resolution if required.

That said, if you have a big feature branch and try to rebase it after a month vacation, you're gonna have a bad time.


Daily local rebase on master is really the only acceptable use I know for rebasing. It's not just after a month of vacation, but simply when your local branch has a lot of commits, that rebasing can become very painful, as you may have to resolve the same conflict over and over and over again. That must be the circle of hell where programmers end up.

Basically you'd have to rebase after every single commit. That would keep it the most painless, I guess. But I can also just do a single merge and I'm done.


git rerere and a smart merge tool like p4merge help a lot with this rebasing problem.
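
For reference, rerere is off by default; enabling it is a one-time config change:

  git config rerere.enabled true      # record conflict resolutions and replay them
  git config rerere.autoUpdate true   # optionally auto-stage files rerere resolved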

These are essential if you work with forced rebase mode like in Gerrit. (ugh)

You still have to be moderately careful and read the code again.


> Though I will say that git has one big glaring problem with merge commits: they are effectively black boxes, and when you have a big merge conflict, many git tools don't tell you how that was resolved in that merge. I recently came across a merge that wiped out some of my changes with no clear reason why. I redid that merge and there was no merge conflict at all, so it's still a mystery what happened there.

Hmm, I don't think a merge magically removes the need for testing. If I build a feature A that replaces feature B, and another programmer builds a feature C that relies on some aspect of feature B, when merging, although it will merge and compile fine, it's possible it is still broken.

This is a good argument for well defined interfaces, but it's possible to see how a successful merge may not produce a successful source.

As for things disappearing, I think there is some edge case where it looks like one branch deleted something after you made some change to it. Generally it's best to not have people working on exactly the same part of code. On the plus side, you have a full working history, therefore can recover :)


There's more to it than just knowing it's broken. I want to know why and how it happened. Of course stuff needs to be tested after a merge; that's how we discovered there was something wrong in the first place. But I don't want to redo the work that I already did; I want the merge to be correct. In this case, it seems the merge did something it should never have done, and it was not clear why this happened.

Still, the commits that were merged both still existed, so it was easy to redo the merge. I shudder to think what would have happened if this had been a rebase; then the original commit might have been gone.


> I prefer to write small commits with simple messages that explain exactly what that change does.

One thing people should include in commit messages is why the change is being made. In other words, what's the purpose of the change. That can provide a lot of context that can be lost over time. So my commit messages include both what was done and why it was done. The why part can range from things like this method is needed for this feature that does something, or this fixes a bug that was causing this issue, or this other feature is no longer needed in context of this other reason, etc.


> One thing people should include in commit messages is why the change is being made.

In a perfect world, there would be a massive document with each commit, specifying exactly how it behaves, all the possible edge cases considered, why it was implemented, other implementation choices and a justification for why this one was selected, etc, etc.

I think the reality of the matter boils down to a few points:

1. Commits are not read that often and if they are, it's usually recent history. If they are being read very much in the future, then the "why" may not even make sense either.

2. Unbounded commit message lengths lead to some people waffling in their commits. If you can't explain it in 80 characters, you should very much consider breaking it down.

3. Usually it's not possible to isolate a commit from the context of the commits around it.

4. Usually it's not possible to understand a commit without the larger context of the system.

Generally if some implementation requires an explanation, it can be better understood as a comment next to the implementation in question.


> 1. Commits are not read that often and if they are, it's usually recent history. If they are being read very much in the future, then the "why" may not even make sense either.

It depends. If you're looking at the commit log directly, then you would tend to look at recent history, but if you're using something like git blame, you can easily come across commits that are far less recent.

For example, part of your change is to update a line of code in a file. You run git blame and see the commit that introduced that line of code, run git show for that sha1 and see the context around why that line was introduced (e.g., a bug fix). If you don't look at the history, you could easily introduce a regression in the code without realizing it. In other words, the VCS history can be used as a tool to help determine what potential effects a change could have.
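
A sketch of that workflow (the file name and line number are made up):

  git blame -L 42,42 statistics.php   # which commit last touched line 42
  git show <sha1>                     # read that commit's message and full diff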

> 2. Unbounded commit message lengths lead to some people waffling in their commits. If you can't explain it in 80 characters, you should very much consider breaking it down.

There's no restriction on how long a commit message should be. Convention from the Linux kernel and git itself says the title should be limited to 50 characters and the body should be wrapped at 72 characters (other than things like links and error messages/log lines).

As for whether to split the commit up, that really depends on whether you're doing multiple things in a single commit, which may or may not have a relation to how long the commit message is. Typically, if you see something like "do A and do B", then that's a sign that the commit should be split into separate ones where one does thing A and the other does thing B.

> Usually it's not possible to isolate a commit from the context of the commits around it.

If you use a combination of git blame and git log, you can get that context. git log can search a range of commits for changes that altered the number of occurrences of a string with the -S flag, or for commits whose added or removed lines match a regex with the -G flag.
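
For reference (the searched string and path are hypothetical):

  git log -S"parse_header"                  # commits that change the number of occurrences
  git log -G"parse_header\(.*\)"            # commits whose added/removed lines match the regex
  git log -p -S"parse_header" -- parser.c   # limit to a path and show the patches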

> Usually it's not possible to understand a commit without the larger context of the system.

That's true for people who are new to the project, but using something like git blame can help people new to the project find the person/people to ask about why that code was written the way it was.

> Generally if some implementation requires an explanation, it can be better understood as a comment next to the implementation in question.

That largely depends on whether people are diligent about updating the comments when they update the code. If you use git blame and look at the associated commit message for that line of code, it will always be up to date. If you make a change to that line of code, then the new commit message would be associated with that line. In fact, a tell tale sign that a code comment may be out-of-date is if the git blame output shows different commits for the comment and the associated line(s) of code.


What can also help is linking to the related Jira ticket (or whichever issue tracking system you use).

And separate "cleanup" commits from commits that make meaningful changes. A commit that changes every line in a file because it's fixing incorrect indenting is fine. A commit that does that but also introduces a small functional change, makes that change invisible.


We've actually changed tracking systems over the years (meaning that the link in the commit message is no longer valid). So, it's best to include both the link and the actual text.

> A commit that does that but also introduces a small functional change, makes that change invisible.

While it's sometimes possible to separate the actual change from the formatting change by ignoring whitespace in the diff, it's definitely better to make each type of change in a separate commit.


Absolutely. I don't recommend replacing the text with the Jira ticket number. List them both. Both are useful.


Agree with the thoughts on atomic and meaningful commits (though would drop the merge commit entirely), and I will rebase my commit history extensively at times to get it into a presentable state along the lines listed here.

Bottom line: the history, content, and message of each commit, in addition to the resultant repo state, are all artifacts of your work and are worth shaping and crafting in order to optimize for communicating to yourself and your fellow developers in the future.


> though would drop the merge commit entirely

The merge commit is important though. It helps delimit a set of commits that have meaning all together from the rest of your commit history. Here's an example:

  a -> b -> c -> d -> e -> f -> g -> h -> i -> j -> k 

In the history above, did you notice that commits c through h are actually related to one another? They were introduced a while ago by a feature branch.

Compare the previous history to this:

          c -> d -> e -> f -> g -> h 
  a -> b /                          \ i -> j -> k 

Now it's very clear that c through h are part of the same line of work. All those commits were probably part of 1 single pull request and were reviewed in the context of one another.

My history here is still very linear. Yes, there's 1 little branching off, but that's all.

At my previous company we had a codebase of well over 20k commits entirely made of those little branches branching off master, then merging back in. It made for an incredibly readable history.

Another added advantage of not removing the merge commit is that you can revert the entire merge of your branch in one go.

Without the merge commit, you need to remove each commit one by one (and that's granted you remember which commit was the first one and which one was the last one).
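
In practice that one-go revert looks like this (the sha is whatever git log shows for the merge commit):

  git revert -m 1 abc1234   # -m 1 keeps the side of the first (master) parent

Note that re-landing the branch later requires reverting the revert first, because master already "contains" the branch's commits.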


Agreed your history is still linear. But I would suggest your merge commits are perhaps more trouble than they are worth. In commit sequences we prefix almost all commit subjects with the JIRA ticket ID, so the sequenced tickets are quite clearly marked as being related to each other without incurring the merge commit cost.

As for reverting merge commits, I'd rather not personally have to remember the rules as to how to re-apply after the revert (revert the merge revert, then merge the branch again)[1], let alone working on spreading that knowledge losslessly across the organization. This is admittedly something I haven't done much of, but I'd rather revert ~3-9 individual commits, squash the reversion commits (usually with a bit more context on why we're reverting), and push. And it could be scripted. I haven't made a script for it as it doesn't happen that often, but between using rev-parse and a sed script in the GIT_EDITOR environment variable it would be pretty quick.

Having merge commits also means that you have to teach everyone how to view diffs of merge commits, which admittedly isn't that hard, but it is still just one more barrier to know/teach/overcome, and the default behavior of showing a zero-line diff for a no-op merge commit doesn't help.

[1] https://mirrors.edge.kernel.org/pub/software/scm/git/docs/ho...


> Your commit history is what you will use later down the line (think 6 months, a year, 2 years later) to find a very nasty yet incredibly subtle bug in your application.

No, it won’t. If I didn’t catch it then, I will have to read the code and understand it and spot it.

At work, where my builds take anywhere from 30 minutes to 3 hours depending on the target, your git bisect on 1500 commits represents days of building and testing. Or I could just read the code again.

Commit messages should provide context for going through the process of rereading the code, and little else. If a commit message literally describes what the code does (“change constant from X to Y”), it’s self-describing and unnecessary.


It's okay if you really prefer to read a huge squashed commit instead of an atomic commit in a branch, but you made it sound like squashing was necessary in order to reduce building and testing. That's wrong. You can use `git bisect` on merge commits only, then, once the faulty merge is detected, bisect the commits of the merged branch. Bisecting merges won't take more time than bisecting squashed branches.
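
A sketch of that two-phase bisect, assuming git >= 2.29 for --first-parent and a make test target (the tag and test command are hypothetical):

  git bisect start --first-parent   # step across merges on master, not into them
  git bisect bad HEAD
  git bisect good v1.3.0
  git bisect run make test          # suppose this blames a merge commit
  git bisect reset
  M=<sha-of-the-bad-merge>
  git bisect start $M^2 $(git merge-base $M^1 $M^2)   # now bisect the branch's own commits
  git bisect run make test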


But can't you just go back and reopen that branch? Just because you squash-merge the branch to master doesn't mean you can't keep the branch archived. I have seen where you basically just "rename" the branch (i.e.: make a new branch of the branch and just leave it as-is) to "merged-2020-03-03-BRANCHNAME" or something and then you don't touch it. Meanwhile, your master is a neat and tidy history of features added, tickets finished, and bugs fixed without all of the "fixed the thing I forgot to add" messages that inevitably sneak their way into your git log.
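
A sketch of that archiving step, using the naming scheme above (branch names are made up):

  git branch merged-2020-03-03-super-feature super-feature   # freeze the full history
  git push origin merged-2020-03-03-super-feature            # keep it on the server too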


I think you are being down voted because people are confused about what you are saying.

In git, a "branch" is just a tag on a particular commit. What we intuitively think of as a branch is the set of commits starting from where we diverged on master and ending up with the commit that is tagged with the branch name. There is nothing in git that actually stores that, though. When you rebase the branch, squashing commits, you create a new commit that is that sum of all the other commits. The other commits are left unreachable by any other branch and will be removed the next time you GC the database.

What you are saying (I think) is to add a new branch called "merged-2020-03-03-BRANCHNAME" that points to the same commit as "BRANCHNAME" and then rebase BRANCHNAME, squashing the commits. In that way the old commits are still reachable and won't be removed.

The downside, of course, is that it's a bit of a PITA if you want to find out what commit changed a line of code and why. You would narrow it down to "BRANCHNAME" and then you have to search for the archived version of it, switch to it and then continue looking at your code. But, in principle it could be done.


Yes, that's basically what I'm saying, but I see now it could be a pain with a big repo, so probably not the greatest idea. I do still like a squashed master branch, but I suppose if you trained your developers well enough (ha-ha) they could just write good commit messages; however, I know several awesome developers who, from time to time, commit a "fixed the thing again" message here and there.


> without all of the "fixed the thing I forgot to add" messages

This is why you use rebase --interactive or other ways of amending commits before submitting a PR. It is not an argument for commit squashing.
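
A minimal example of that cleanup before the PR (the commit subjects are made up):

  git rebase -i master
  # git opens a todo list; change "pick" to fold or rename commits:
  #   pick   a1b2c3d  Add statistics endpoint
  #   fixup  e4f5a6b  fixed the thing I forgot to add   <- melts into the commit above
  #   reword 7c8d9e0  wip                               <- keep the diff, rewrite the message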


> rebase --interactive

like 90% of developers (maybe 95%) don't even know how to rebase from master before they submit a PR, so I doubt, highly, that they'll learn how to do a `git rebase -i` in order to make things cleaner. I've personally never seen a repo that utilizes all these "fancy" git commands (fancy to those developers) to make a clean history, I've always seen the messiest shit. Maybe that's sad for me in my career, though.

I agree with you this is the best technical way, though, after reading some more comments in here, but not the best way for everyday developers.


You have made a case for keeping the work-in-progress commits.

The case the other way is that what should matter at the end of the feature development is the visible contribution it makes to the overall project, i.e., the finished code, tests, etc.

This has two fundamental consequences. Firstly, in this model, your final commit that goes onto whatever you call your primary development branch should be a complete, self-contained piece of work, like adding a new feature or fixing some specific piece of broken behaviour. It should include any related tests, data files, etc. It should be something that can be reviewed in isolation and make sense. It should ideally be something that has minimal interactions with any other code that aren't strictly necessary, so it can be removed or replaced later if necessary with minimal effort to resolve or rewrite anything else that has happened since.

Secondly, requiring all work-in-progress commits to also be meaningful and live forever in your shared repo significantly limits the usefulness of source control for the developer during the development stage. They might be experimenting with different approaches but want to keep track of a version where some specific aspect of the code is currently working. They might be about to refactor to clean some new code up but want to make sure they can't lose the exact version they had just got working due to unintended side effects. They don't want to be getting every individual commit code-reviewed or run any slow testing processes, just in case someone later comes along and cherry-picks that particular point in history to use as the basis for something else.

So I will respectfully disagree with you that the full commit history at this level is as important as the final code. My reason for that is that the final code is complete, tested, reviewed, and going into the finished product, while anything in the work-in-progress commits that didn't survive to that point was determined, by the developer doing all of that work, at the time, with the aid of all of the testing and reviews and tools that were involved, not to be as valuable as the final version.

If there really were interesting alternatives considered during development that might be useful to keep around for later reference, you can always tag or branch at that specific point and give it a meaningful name. However, I would suggest that this is like keeping issues open in your bug tracker when you know you'll never fix them: unless you also have some process for keeping track of such ideas and why they are valuable to refer back to them and take some useful action later, you're better off not having them clutter your main development process and tools forever.


I'm actually not arguing the work-in-progress commits should be kept. On the contrary, I do agree that work-in-progress commits are useless, provide no value, and should never be submitted for review in a pull request.

I think I should have been more explicit when saying "your commit history is as important as your code", because I definitely agree that a history representing all the thought process you went through while developing whatever you're coding does not have a place in the final history, and I, as a colleague and reviewer of your code, couldn't care less about it.

I make a distinction between "WIP commits" and "atomic" commits that have well defined boundaries.

While developers work on their feature branches, they should be free to commit however they want: making "WIP" commits left and right, adding code in one commit and removing that same code in a subsequent one, refactoring, typing terrible commit messages with no meaning, or not committing at all even.

However, once a feature is ready to be reviewed (and then released), I'm arguing that developers, before submitting their work, should re-organize their commit history: remove WIP commits, remove refactors that happened during development, squash distinct commits that are part of the same logic, amend commit messages, etc.

The final commit history shouldn't be a series of commits that represent everything that went through the mind of the dev who coded the feature: that's not what I'm arguing for.

The final commit history should simply be multiple small digestible and reviewable chunks of work, that make sense on their own and in the context of the pull request they are part of. Those commits should represent the code in its final, best, most well thought out form.

And once you've gone through the trouble of organizing your work into meaningful, atomic, digestible, chunks of work, with appropriate descriptive commit messages, there's no reason not to merge that work.

As it was useful to the reviewer of your code during the pull-request review, it will also be useful many months later, when you need to review that same old code that seems to be having a bug or to understand why it was written the way it is.

Well, at least, IMO.


> However, once a feature is ready to be reviewed (and then released), I'm arguing that developers, before submitting their work, should re-organize their commit history: remove WIP commits, remove refactors that happened during development, squash distinct commits that are part of the same logic, amend commit messages, etc.

I agree that this is the optimal solution, but isn't this a hefty piece of work that might even introduce subtle bugs?

E.g., I'm thinking of a case where the overall feature needs work in two separate "areas" of the code base, and your WIP commits touch these two areas "separately". Then, when you are finished, reordering and squashing the WIP commits into two atomic commits seems like the obvious thing to do. But in a minority of commits you changed that tiny little shared part back and forth, from both contexts, and only finally figured out how the shared part should work in the last WIP commit. When you try to reorder the commits, the changes in the shared part change order, which leads to merge conflicts that need to be resolved during the rebasing/reordering/squashing.

So I'm on the fence about doing that extra work, which in practice could involve an unknown amount of "made-up work" to get the clean atomic commits. And those commits then risk being a lie: because of the reordering (like the shared part in my contrived example), the first "atomic commit" describes a state that never existed, and it may contain a bug that never existed either, which someone down the line could stumble upon when bisecting.


> When you try to reorder the commits, the changes in the shared part change order, which leads to merge conflicts that need to be resolved during the rebasing/reordering/squashing.

An alternative approach is to create a new branch off of master (or whatever the base branch is), take the diff of the HEAD of your branch with your work-in-progress commits, and stage the hunks of the diff to make a commit that contains a logical change (and repeat the process for the remaining parts of the diff). The git apply command is very useful in this regard.


> [...] unit tests that demonstrate that it works

I'm going on a bit of a tangent here, but unit tests don't demonstrate that it works – acceptance tests demonstrate that it works. Unit tests help reduce the time to narrow down problems and in some cases help with refactoring (if refactoring is limited to the inside of methods).


> Master should always be in a deployable state.

+1 to this and squashing commits. I'd even go as far as saying trunk-based deployment model over trunk-based development.

This can be done via pull requests or short-lived branches. I prefer having all the context, discussions, and testing added to the pull request, even setting up an environment per pull request. It's much easier to track and get feedback.


> Master should always be in a deployable state.

I think the head of master should always be deployable, but ultimately it seems better to make use of tagging specific hashes that have demonstrated some degree of reliability for field use.

Obviously you have your unit tests and other kinds of tests, but ultimately nothing beats just using it. If it "feels" reliable (which can be hard to quantify sometimes) or has proven itself (you've been using it without issue for some time), then it's probably worth tagging for that fact.

> Incomplete features that will take more than a week or so to develop should be merged to master early, protected by feature flags. This avoids long-running branches.

I prefer to properly scope the time/effort required to complete a feature, rather than having massive features. If a feature is taking more than a few weeks to bring to fruition (assuming ~1 dev per branch), then it's probably too large a feature and needs to be broken down. A properly sized feature should probably fit within a sprint cycle.


> I prefer to properly scope the time/effort required to complete a feature, rather than having massive features. If a feature is taking more than a few weeks to bring to fruition (assuming ~1 dev per branch), then it's probably too large a feature and needs to be broken down. A properly sized feature should probably fit within a sprint cycle.

Ideally, I agree, though this does depend on what your business requirements are. Some features are just fundamentally complicated, and it might not be possible to implement the feature itself, or even decompose it into meaningful parts that can be implemented, in one short round of work. Fortunately, that doesn't happen very often for most types of development work!


> Some features are just fundamentally complicated, [..] Fortunately, that doesn't happen very often for most types of development work!

Every problem should in theory have some way of breaking it down, otherwise we have no hope of programming it. We should be able to break it down far enough that it can be reasonably sanity-checked. If the feature is too complicated, then you're asking for bugs.

From an objective/goal standpoint, we also need some way to measure our progress and be able to evaluate whether a pivot is required. It's very easy to get stuck in a problem for a long time if we have no external measure.


> Every problem should in theory have some way of breaking it down

I don't see why we should expect that to be the case. For example, I once had to implement an algorithm to visualise a certain set of data using a certain type of diagram, where there were a lot of technical factors that determined exactly how that diagram should be drawn. IIRC, it took a few weeks to implement that algorithm, but there weren't any natural breaks where you could, say, draw only half of the data set or half of the diagram features and get a meaningful result. It was inherently a relatively complicated, all-or-nothing feature.


It's hard to argue about a feature I haven't seen, but just the fact that you were able to program the feature means you were somehow able to break it up into a series of steps. Okay, breaking it down wouldn't produce something workable in itself, but it should produce something measurable, testable even.

For example:

Step 1 : Extract relevant data

Step 2 : Massage data into the correct format for graphing

Step 3 : Graphing tool visualization for simple example data set

Step 4 : Write tests for graphing/visualization

Step 5 : Plug real data into graphing tool


Yes, it was implemented in a series of steps, but the cumulative result of those steps had zero business value and was not something that would ever have been released to customers until the final step was done.


Yes yes yes. Different teams have different needs, of course, but what you described should be considered the default workflow.


I personally don't squash commits and lose information. I see the value in having a compressed view of the log history, but in that case I use

  git log --simplify-by-decoration


I feel really good when I completely destroy my application and then roll back to the previous working version. It's such a thrill, man.



