
I'm a big, huge fan of recording what should have happened instead of recording every typo and forgotten semicolon in your history. There's a difference between draft commits and published commits. When I'm reading published commits, i.e. history, I just want to know your intent, not your typos. So what are the tools that Fossil offers to make sure I don't have to see your typos in the history?



I'm also a big fan of not deleting data. I don't like squashing commits, for example. But I also want to be able to see high-level intent.

If instead of "squashing", it were "grouping", I'd be happy. I could encapsulate a bunch of messy commits that I made while I didn't know what I was trying to do. The intent would be clear at a higher level, but if you want to dig in to see what it actually took me to achieve that, you can see all my experimentation by looking at the commits inside the group.

Groups should be able to be nested, of course.

The only way I know how to achieve this in git is by relying on no-fast-forward merges. There's a post detailing this approach [1], but unfortunately a lot of git tools don't support this workflow that well. I haven't gotten it to work well in GitLab, for example.

[1] Git First-Parent-- Have your messy history and eat it too : http://www.davidchudzicki.com/posts/first-parent/
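
To make the "grouping" idea concrete, here is a rough sketch of what a first-parent view does, using a toy history in plain Python rather than a real repository. The commit names and the HISTORY table are made up for illustration. The merge commits act as the groups, and the commits reachable only through a merge's second parent are the messy contents of that group (roughly what `git log --first-parent` hides and `git log M^1..M^2` shows).

```python
# Hypothetical toy history: commit id -> list of parent ids (first parent first).
HISTORY = {
    "A": [],
    "B": ["A"],         # messy work on a feature branch
    "C": ["B"],
    "M1": ["A", "C"],   # no-fast-forward merge; first parent is the mainline
    "D": ["M1"],        # more messy work on a second feature branch
    "E": ["D"],
    "M2": ["M1", "E"],  # another no-fast-forward merge
}


def first_parent_walk(history, head):
    """Follow only the first parent of each commit: the high-level view."""
    out = []
    current = head
    while current is not None:
        out.append(current)
        parents = history[current]
        current = parents[0] if parents else None
    return out


def reachable(history, start):
    """Every commit reachable from start, including start itself."""
    seen, stack = set(), [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(history[commit])
    return seen


def group_contents(history, merge):
    """Commits that a merge brought in: reachable from its second parent
    but not from its first parent."""
    first, second = history[merge][0], history[merge][1]
    return reachable(history, second) - reachable(history, first)


print(first_parent_walk(HISTORY, "M2"))       # high-level intent only
print(sorted(group_contents(HISTORY, "M2")))  # the experimentation inside M2
```

Nesting falls out for free in this model: a commit inside a group can itself be a merge with its own contents.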


You misunderstand me. Almost everyone does. I do not think you should squash all of your changes into a giant hairball commit, and I don't think first parent (which is effectively the same thing) solves this problem either.

I think each of your commits should be individually rewritten until each commit makes sense and tells a single, individual story that makes sense on its own, while at the same time being completely atomic and as small as possible.

You created a new function? That's one commit. Take a moment to explain why this function is going to be useful in future commits.

You called that new function from several new spots? That's another commit. Explain why each of these calling sites requires this function.

You decided that there had to be some style and whitespace changes? That's another commit. This one can go without explanation.

You found an old bug along the way that can be fixed with a one-line change? That's another commit. Hey, nice catch. Perhaps a few lines about this bug and why your small change fixes it?

Together, these are all individual scenes of a larger story. But the larger story doesn't need the behind-the-scenes of how you came up with these scenes. I don't need to see all your drafts of each scene. I just want the final scenes that make up the final story.


Ideally it should be this way, but it's impractical in reality.

It requires that you either stop your development workflow to commit as you go along, or that you untangle all the pieces after they're already entangled.

If you commit as you go, it's an expensive mental switch to fire up git and also run all the tests (since surely part of this workflow is to apply the principle that no commit should ever break the build). You also take an extra productivity hit every time you change your mind about something a little later (e.g. you added the function getFoo() but realize it should have been called findFoo()).

If you work for a while and then try to bundle up small, atomic changes, that can also be very difficult. Tools like git group together contiguous chunks of changes when committing, and prying them apart later can be difficult. I often do this with a combination of "add -p" and then "stash save -k" to temporarily get rid of things unrelated to what I'm committing, but it's a chore. During a selective "add -p" session you have to mentally keep track of what belongs together, and therefore which dependencies exist between the chunks you're adding.

Committing as you go is easier, but it's slow, and doesn't work well when you're working across many files with a big change that introduces new semantics in a lot of places. Both techniques require that you keep track mentally of which parts are related, of course.


Untangling is what I mostly do. I consider the untangling my own internal code review. I need to read my own diff and figure out what goes where and what each part does and why it's necessary. My commit messages are then my own code review comments.

I figure if I don't carefully read my own diff, why would anyone else? And once it's untangled, I am hoping others will find it easier to read too.

Git doesn't provide as many tools as I would like to make this process easier. It's partly why I don't use git. Mercurial's absorb command helps a lot: it absorbs changes from your working directory into the appropriate draft commit that corresponds to the same context:

https://gregoryszorc.com/blog/2018/11/05/absorbing-commit-ch...

Wait, it appears someone finally ported it to git:

https://github.com/tummychow/git-absorb
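
For anyone wondering what "absorbs changes into the appropriate draft commit" means in practice, here is a very simplified sketch of the idea as I understand it, not the real implementation of either tool. The blame table, commit names, and the plan_absorb function are all hypothetical. The assumption is: a modified hunk is assigned to a draft commit only if every line it touches was last changed by that one draft commit. Anything ambiguous, or anything that belongs to already-published history, stays in the working directory.

```python
def plan_absorb(blame, draft_commits, hunks):
    """Decide which draft commit each modified hunk should be folded into.

    blame: dict mapping line number -> id of the commit that last touched it
    draft_commits: set of commit ids that are still safe to rewrite
    hunks: list of (start, end) inclusive line ranges modified in the working copy
    Returns (plan, leftover): plan maps commit id -> hunks to absorb,
    leftover lists hunks that could not be assigned unambiguously.
    """
    plan, leftover = {}, []
    for start, end in hunks:
        owners = {blame[line] for line in range(start, end + 1)}
        owner = next(iter(owners)) if len(owners) == 1 else None
        if owner is not None and owner in draft_commits:
            plan.setdefault(owner, []).append((start, end))
        else:
            leftover.append((start, end))
    return plan, leftover


# Hypothetical file: lines 1 and 6 come from published history,
# lines 2-3 from draft commit d1, lines 4-5 from draft commit d2.
blame = {1: "pub1", 2: "d1", 3: "d1", 4: "d2", 5: "d2", 6: "pub1"}
plan, leftover = plan_absorb(blame, {"d1", "d2"}, [(2, 3), (4, 4), (3, 4), (6, 6)])
print(plan)      # hunks that map cleanly onto one draft commit
print(leftover)  # the hunk spanning d1 and d2, and the one on published code
```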


That's a cool script. I will definitely try that. Augmenting commits by doing partial commits then fixing with "git rebase -i" and squashing with "fixup" takes so much time and mental effort just to not make a mistake.

It still doesn't solve how to disentangle changes that have become interdependent. For that you have to concentrate on committing atomically and planning ahead a lot.


I don't think this is impractical.

I've been using this approach successfully for 8 years now on tens of open source projects and various company code bases of all sizes.

It does take a small amount of overhead (I measure this, and for me it's around 5%). But that pays off immediately as soon as you or someone else reads it a few weeks later when investigating an issue.


I use gerrit for everything and this workflow is exactly what it gets you (well, to be clear, my workflow is commit-per-issue resolved, not commit-per-function added, though you could use it that way also). I highly recommend it or a similar tool.

I wrote a post three years ago about my switch:

https://techspot.zzzeek.org/2016/04/21/gerrit-is-awesome/


To toss in a counter-point: I do commits as I go and occasionally go back and make changes so it's a coherent sequence. For the most part, once you are fluent with Git[1], I've found it to be a productivity improvement, and code reviews have been both faster and more useful.

If you're doing two semantically different things, put it in two different commits. If it's one, put it in one (merge commits work too). That's just good change-hygiene, for the same reasons you try to isolate behavior in code, rather than mashing it all together into a single func just because you happened to be doing it all around the same time.

tl;dr we don't name our funcs "june_27_through_29", don't name your commits like that.

[1]: a huge investment, so I totally get why this isn't an early-coder practice, and it's rather painful. but IMO worthwhile, usually I see people spending far more time fighting it than it would take to learn it.


While I am following this conversation closely, I wanted to politely make a suggestion about something you said:

"You misunderstand me. Almost everyone does."

That sounds really frustrating. It's not clear from your post whether you mean this as "everyone who reads this comment does not understand me" or "people frequently misunderstand me"... but typically, someone who feels this way experiences this in the latter, general sense.

Good news: it is possible to dramatically improve the ratio of people who understand you. It will, however, require that you change how you communicate some kinds of information.

What I'm saying is that if people don't understand you, the first place you need to look is your own tools for communicating with people. Of course, what you are saying makes sense to you. But there's an intellectual fallacy at work if you assume that "almost everyone" is the problem and you are the solution. :)

Have you ever heard the saying, "The only thing all of your crazy ex-partners have in common is you?"

There are people who are tasked with explaining far more complicated concepts than source control who people claim to love learning from. That means there is hope for us all.


Yes, I know I'm not communicating effectively. I shouldn't have implied that everyone is wrong but me. I know that there are people who understand me, but in a world of github and pull requests, it's really hard for me to explain what the world looks like otherwise.


There is a name for what you are describing, it's called "atomic commits". Every commit can stand alone and at every commit the software should be in a working condition.



I mostly agree with you, but I think this might be going a little too far:

> You created a new function? That's one commit. Take a moment to explain why this function is going to be useful in future commits.

> You called that new function from several new spots? That's another commit. Explain why each of these calling sites requires this function.

In my opinion each commit should make sense on its own. It doesn't really make sense to create an unused function, so these two changes should really be one commit.


Maybe. I think both ways can make sense, and if you insist on having the function and the calling sites in the same commit, that can make sense.

I think my proposal can also make sense because (1) it splits up the commits into two units that still keep the codebase in a stable state [this is my rough metric for what "atomic" means] and (2) defining a function requires some independent contemplation about why that function is defined that way. Inserting that function into calling sites can be a logically distinct operation if the calling sites are varied and distinct enough to each have a different reason to now require this function.

Thus the two commits can be semantically distinct.


I see this the same way. Like the notion that purely cosmetic changes should be committed distinctly in order to not clutter up commits that change behavior (so they are simpler to read), it makes sense to me that one would commit entirely new units of code separately before re-wiring the logic of the existing code accordingly. That way the first commit basically states "I've built this new thing", and I don't have to consider 300 random existing codepoints in understanding what it does.


That doesn't scale. I personally prefer a single squashed commit linking to a full discussion in a PR/MR: do a git blame, and even if you get a giant hairball commit, you should be able to trace it to a review process (PR/MR) where it was discussed and thoroughly reviewed.


It scales quite well. Linux itself is developed in this way. Or perhaps you think Linux isn't at a large enough scale? (No sarcasm, I know that there are projects out there much bigger than Linux.)


I'm not sure about "scale" but Linux, as an open-source project where external actors want their code included, can push arbitrary amounts of work onto those actors with no additional cost to itself.

They can say "to get your code upstream you have to do twice as much work" or 3x or 4x or whatever. It's not their cost to bear.

An internal team pays that cost. They have to consider whether the trade off of having a pristine commit history is worth the additional overhead of doing it.

I personally care more about PR size and am happy to squash all commits in a PR. If that is too big I'd rather see multiple smaller PRs.


I think the extra work for a single developer to perform atomic commits is justifiable.

How does this work with multiple developers working on the same repo? I'm assuming everyone should work on their own feature branch and send PRs once their branch is done? Should the commits also be tagged by the feature branch they're on? Should the CI approval workflow be run against any combination of commits on the feature branch or against the final HEAD?


With git, I think it helps to think of development in terms of patch series.

An individual commit is a single patch, intended to do one thing (and hopefully do it well), and a feature branch is a patch series.

A pull request is then a request to review the series. If you need to change things, git allows you to rewrite your commits to send in a revised set of patches.

Before merging, your CI would create a temporary branch off the current master, merge your feature branch to that, and run tests against the result. I don't think testing individual commits (fully, at least) in a series makes much sense if you're going to merge all of them to master anyway.


Not sure why this was downvoted. Squash’n’Merge on Pull Request keeps master history clean while tolerating shenanigans on feature branches.

If important things are lost then perhaps the PR was too large.


To do this effectively, we have to change the way we edit code, so it's an editor/workflow feature rather than an scm one. Trying to solve it downstream in scm is not productive.


How often do you actually look at this historic detail you seek to maintain? Daily, weekly, monthly? Is it more to satisfy a feeling than an actual need? I mean if some junior dev wallows on some branch for 40 commits, I don’t want to see any of that, I just want to see what was finally merged.


I look at git blame (using 'Annotate' as IntelliJ calls it) quite often to figure out reasons for some certain change/implementation logic. It irks me when the result is just some giant squashed commit with 40 lines. Which of these explains this specific line? _History_ itself, yeah, not that much.


Would it not be significantly more frustrating if you use git blame and you see:

> Revert: Some WIP didn't work out.

Git blame again from prior to that commit.

> Added missing semicolon.

and again:

> Fixed spelling.

and again:

> Stupid typo, wrong method call.

and again:

> WIP, going to see if X can work.

Before running git blame once more, finally getting to the commit message that actually pertains to the current line of code you're seeing. Including the explanation (commit message) for why it is the way it is, and very importantly when this line of code actually made it into the software?


I would prefer not to have these errors in the first place. They should be caught during review, and then fixed in the commits that introduced them with an interactive rebase. If this type of error is consistently getting through your review process, you should probably consider revising that process.

...and maybe consider adopting atomic commits -- the main reason I like them is not actually because they make it easy to look at history, but because they're easy to review and catch these types of errors. If each commit stands on its own, it's obvious when one doesn't.


> Would it not be significantly more frustrating if you use git blame and you see: [...]

Yes, it would be very frustrating. But you're presenting this as if it were the only alternative, and it isn't: I wouldn't approve a pull request that has commits like the ones you mention. I would ask for the history of the PR to be reworked into a logical sequence, as laid out in this comment: https://news.ycombinator.com/item?id=19007171

"I think each of your commits should be individually rewritten until each commit makes sense and tells a single, individual story that makes sense on its own, while at the same time being completely atomic and as small as possible."


fwiw this is largely what `git log -L` (and maybe `--follow`) solves. You can log changes to a file or lines in the file and have it follow through moves.

Granted, most tools don't make that easy to do. But most Git tools are rather blind mimics of simple CLI commands with a nice UI (which can be a huge help), rather than being value-adds in terms of behavior or understanding.


Can you explain this a bit more?


There is a difference between squashing together thirty commits of someone working on one concrete thing (I don't need to see all the mistakes and reworks you made, I just want to see the result in a nice, easy to read diff), and thirty commits of someone working on thirty things.

The latter is, of course, wrong as it makes the repo history harder to read, while on the other hand the former improves readability.


It would be nice if in the first case one could annotate those thirty commits in a narrative: "here I started another attempt", "this solution could not work for X, Y and Z reasons", "these 8 commits are just typos", maybe also with the ability to select only a subset of a commit.


I've recently added the ability to associate a wiki page in Fossil with an individual check-in or with a branch - as additional documentation about that branch or check-in. This is similar to your concept, if I understand you correctly. The Fossil changes have worked well enough so far. But only time will tell if this ends up being a good idea or not, I suppose.

An example is the "About branch begin-concurrent-pnu-wal2" at the top of the page https://www.sqlite.org/src/timeline?r=begin-concurrent-pnu-w... - the page shows all check-ins for the branch, and the Wiki at the top gives a summary of what that branch is about.

Another example is the detailed discussion in the "About" section for check-in https://www.sqlite.org/src/info/718ead555b09892f - important information that records the thinking about this commit but which seems too prolix for a check-in comment.

Let me know your thoughts on this idea.


If I understand it right, it is pretty much what I had in mind (assuming that you can always create new branches on old commits).

In the last week I have been quite charmed by many of Fossil's ideas, and I will for sure try this feature too.


Do you require this functionality and for people to write meaningful comments? I just question if version control is where any of this should happen.

Personally, how I tend to work is if there’s some link between commit and ticketing system I can refer to, it’s about the best you can expect.


I like to write tests (in a separate repo) which iterate over each commit and mark the point in history where they start passing--which is usually when the feature was implemented--and (more importantly) the point in history when they start failing again.

These points usually indicate communication/comprehension errors involving two developers. When I bring the problem to the developers' attention, their reactions differ based on their commit style. If they have atomic commits it's usually a five minute conversation because the nature of the problem is immediately apparent. If they have large squashed commits, I usually have to bring both developers together and have them fight over whose problem it is.

So I would say... a couple times a week, but the overall time savings of finer granularity is significant because it limits the number of parties that end up huddled around a single screen.
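
The bookkeeping for this is small. A minimal sketch, assuming the commits are listed oldest first and that a `passes` callback tells you whether the test passes at a given commit (in real use it would check out the commit and run the test; here it is a hypothetical lookup table):

```python
def find_transitions(commits, passes):
    """Walk commits oldest-first and report two points in history:
    the first commit where the test passes, and the first later commit
    where it fails again. Either may be None."""
    first_pass = None
    first_regression = None
    for commit in commits:
        ok = passes(commit)
        if first_pass is None:
            if ok:
                first_pass = commit
        elif not ok:
            first_regression = commit
            break
    return first_pass, first_regression


# Hypothetical results: the feature lands in c3 and breaks again in c5.
results = {"c1": False, "c2": False, "c3": True, "c4": True, "c5": False, "c6": False}
commits = ["c1", "c2", "c3", "c4", "c5", "c6"]
implemented_at, broken_at = find_transitions(commits, results.__getitem__)
print(implemented_at, broken_at)
```

The finer the commits, the smaller the diff at `broken_at`, which is what makes the follow-up conversation short.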


I'd rather have it and not need it than need it and not have it.

Besides, squashing means the identity of the commits changes, doesn't it? So you can't merge the same branch into 2 different branches (like merging a bugfix into both the release branch and the trunk) while keeping the identity of the commits - then when you merge your release branch into your device branch you get wonky duplication of commits in the history.

I notice that move detection seems to get messed up by squashing sometimes...

But maybe I'm using Git wrong, which imho is the biggest flaw of Git - it's so flexible that there are so many ways to "use it wrong".


> I'd rather have it and not need it than need it and not have it.

Yeah but again, have you ever needed it? Because you can say the exact same thing about preserving every sequence of backspaces, deletes, and key typings into an editor. But when was the last time you or anyone needed that level of granularity?


> I'd rather have it and not need it than need it and not have it.

But squash commits do let you locate the identity of the authors. You just have to look at the PR, where all the original commits are listed


Often? When you're trying to see when a specific change occurred you often have to go down to the specific commit, a high level group isn't good enough (particularly when the group could be shared by multiple people).


Then don't squash atomic commits representing single, logical changes. The point is that if you have 3 commits in a row correcting typos, squash those 3 into a single atomic, logical, commit.


>If instead of "squashing", it were "grouping", I'd be happy. I could encapsulate a bunch of messy commits that I made while I didn't know what I was trying to do. The intent would be clear at a higher level, but if you want to dig in to see what it actually took me to achieve that, you can see all my experimentation by looking at the commits inside the group.

That oddly sounds like a feature available in mercurial.


I don't care about how many typos you made while developing a thing. I only care about your upstream's history not getting re-written, but I don't care to see your dev history -- it's a distraction at best.


this reply is way too late, but yeah, I get that you don't care. I don't care either.

But if the developer stored that information in a commit, and there is a cheap (computationally and cognitively) way to keep that information, then I don't want to delete it.

It's not charitable for you to call it a distraction at best. There might be useful information available in the dev history. You shouldn't be forced to see it (that would be distracting), but you shouldn't be precluded from seeing it either.


I sometimes go back and “group” commits with a ‘git rebase -i’ and squash commits that are related.

Is that similar to what you’re talking about?


No, I believe what the author is asking for is to keep those changes in git, but have git manage grouping of history logs for you.

E.g. you have worked in a feature branch, and committed 10 times - let those commits be kept in the log, but when running git log there must be a flag that allows for filtering based on how granular the output must be.

That can be done with rebase, but then you lose history.

I am pretty sure you can implement something like that in git based purely on commit message content.
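
A rough sketch of what that could look like, assuming a made-up convention where related commits carry a "Group: <name>" line at the end of their message. The trailer name, the commit ids, and both functions are hypothetical, and this is not an existing git feature. The coarse view collapses consecutive commits that share a group while keeping every commit id available for the detailed view.

```python
def group_of(message):
    """Return the value of a hypothetical 'Group:' trailer, or None."""
    for line in reversed(message.strip().splitlines()):
        if line.startswith("Group:"):
            return line[len("Group:"):].strip()
    return None


def coarse_log(commits):
    """commits: list of (commit_id, message), newest first.
    Returns a list of (group_name_or_None, [commit ids]) where consecutive
    commits sharing a group are collapsed into one entry."""
    out = []
    for commit_id, message in commits:
        group = group_of(message)
        if group is not None and out and out[-1][0] == group:
            out[-1][1].append(commit_id)
        else:
            out.append((group, [commit_id]))
    return out


log = [
    ("e5", "Fix crash on empty input"),
    ("d4", "typo\n\nGroup: parser-rewrite"),
    ("c3", "try recursive descent\n\nGroup: parser-rewrite"),
    ("b2", "wip\n\nGroup: parser-rewrite"),
    ("a1", "Initial import"),
]
for group, ids in coarse_log(log):
    print(group or "(ungrouped)", ids)
```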


There is no theoretical reason you can’t maintain two histories, e.g. when rebasing, have a “rebase-merge” commit that has the hash of the other tree, and optionally keep that history around in the git repo. Then you could do a ‘git blame --orig’ or whatever to switch between immutable and cleaned up history.

No VCS I’m aware of supports this. But they COULD.


Your suggestion is more flexible, but a simple way to group commits is to preserve the original branch structure like Mercurial.


I tend to commit, read my commit, and then find all kinds of mistakes and have to amend or redo the changes on a separate branch. If those options aren't available I just have a messed up commit history. This is all because git makes modifying your commit history very difficult to do. I think this immutable feature makes git worse because I have no intention of lying to myself or my team about my commit history. I would just like easier tools for pruning and organizing it. Instead my commit history is always a mess that better represents my knowledge of git than the progress of whatever I'm working on.


Are you pushing the commits with errors? Are you merging the commits with errors? If both of those are true then this sounds like a process issue.

Git makes it extremely easy to edit history, with the ability to amend any commit; even several commits back with simple CLI tools like `git rebase -i <ref>`.

However, what it doesn't like you doing is ripping the rug out from other people i.e. editing history team members are basing their work on.

The entire purpose of distributed version control is that development should always happen on a branch. Whether that's a local branch, or some temporary pushed branch (pull/merge request branch etc.) In both cases you can safely rewrite history.

However, `master` (or whatever mainline branches you have) should never contain simple mistakes (e.g. non-compiling code), because the code should have been reviewed before being merged. Of course bugs (non-simple mistakes) will happen, and these ought to be fixed in future commits. Bugs that make it into pre-release or release builds (i.e. `master`) shouldn't be edited out of history or forgotten.


Hmm, okay, I should learn how to rebase then.

The biggest part of my problem is a total lack of commit discipline but there are times when I'm working on a branch where my commits don't tell a clear story (changed something then changed it back because I decided to do it a different way). That's when I most wish for better ways to tell that story.

I feel like an idiot for not knowing rebase could solve some of this for me. ...will definitely try it next time.


That back and forth is the most important part of the story! It shows that you thought through multiple approaches to the problem, (hopefully) why they didn't pan out, and they give someone else a starting point for returning to that approach in the future.

It isn't exactly rare that I go through the blame history on some project to find out why something was done in a way that seems stupid at first glance, just to get stuck on a giant squash commit saying "Implemented X".


"back and forth" is not the same thing as "all kinds of mistakes".

No-one cares about stray keystrokes other developers make, it's just noise.

Yes, we absolutely care about the design of the software we're working on, and that's what commit messages, self-documenting code, comments, issue trackers and project management (planning session etc.) are all for.

When you squash commits in Git the default generated commit message is even to merge together all your previous commit messages. Now is your chance to look at those old messages and change "Did X" to "Attempted X, but didn't work because Y".

When I'm investigating when and why some code was implemented the way it is; I don't want to look at a Git blame trying to find when something was changed, just to see that the most recent change was reverting some earlier messing around. Just to git blame again starting from just prior to said messing around, just to see the same thing again - noise is bad!


> No-one cares about stray keystrokes other developers make, it's just noise.

Sure, and `git commit --amend` is fine for those cases.

> When I'm investigating when and why some code was implemented the way it is; I don't want to look at a Git blame trying to find when something was changed, just to see that the most recent change was reverting some earlier messing around. Just to git blame again starting from just prior to said messing around, just to see the same thing again - noise is bad!

I guess that depends on your setup. My Emacs is set up so that `b` is "reblame from before this change". GitHub's blame UI has a similar button (though that, sadly, doesn't preserve in-file context).

At that point the cost of the "noise" is more or less zero.


What if my repository is linked to a CI process that deploys the live code? And the change needs to be done “now”?

If you are working for a larger company where processes are clearly defined, then it’s good for you and that feature is not needed. But you are losing all of the agile features of git in the first place.

In my situation squashing history takes away my other ability to use git as a wiki of “things that didn’t work out”. It’s important to keep that.


> git as a wiki of “things that didn’t work out”

Wouldn't a wiki be the best solution for that? Git is a software development tool, not a design or project management tool.


In git you can do `git commit --amend --no-edit` to update your last commit, in case anyone is wondering.


I don't think it updates your last commit, it just deletes it and creates a new one with the same changes (along with any new changes being amended). It's an important distinction because if you push to a remote branch that has the old commit, you need to overwrite it with the new one. Commits themselves are immutable.
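
A small sketch of why that is: a git object's id is the SHA-1 of a short header plus the object's content, so anything that changes the content of a commit (its tree, message, parent, or timestamps) necessarily produces a different commit. The tree and parent hashes and the author line below are made-up placeholders, and the timestamp is kept fixed for simplicity (a real amend would also update the committer date).

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Git names an object by hashing '<kind> <size>\\0' followed by its content."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parent: str, message: str) -> bytes:
    """Build the text of a commit object. The identity line is a placeholder."""
    who = "A Dev <dev@example.com> 1548000000 +0000"
    return (
        f"tree {tree}\n"
        f"parent {parent}\n"
        f"author {who}\n"
        f"committer {who}\n"
        f"\n{message}\n"
    ).encode()


# Placeholder hashes standing in for real tree and parent ids.
TREE_BEFORE = "1" * 40
TREE_AFTER = "2" * 40   # the tree after the amended changes are included
PARENT = "a" * 40

original = git_object_id("commit", commit_body(TREE_BEFORE, PARENT, "Add parser"))
amended = git_object_id("commit", commit_body(TREE_AFTER, PARENT, "Add parser"))

# Same message, same parent, different tree: a different commit id.
print(original)
print(amended)
print(original != amended)
```

The old commit is not edited in place; the branch simply points at the new id, which is why a remote that already has the old one needs a forced update.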


I don’t squash commits very often, but I do re-order them. When I have a pile of changes to commit that aren’t always perfectly related, I will sometimes check in the cart before the horse. Rebase lets me correct that order.

So perhaps fossil really “shows what actually happened” but perhaps the more accurate statement is it shows the actual commit order. And that may not reflect the actual order of what was done, leading to an inversion of the logical progression of the code.

That said, I’m going to try it anyway for the ticket system and wiki. I use a two tier vcs system. “Official” work goes into svn and I have no control over that. Interim work for me is in git. But I’d like tickets and wiki docs to help me track issues that I can’t get to right away and sometimes forget about after a month or two.


I can't even remember the last time I was looking at a commit in git or any other source control system. I usually look at pull requests and what has changed.

Example files changed view in Github:

https://github.com/TechEmpower/FrameworkBenchmarks/pull/4311...


I want clean, linear history in my upstreams. Always. You can leave pointers to your work branch(es), and even use a second parent commit in commits for this, but no actual merging, always rebasing, and always preserving clean, well-broken-up commit history.



