OR you could just write Replace invalid ASCII char. Fixes rake error 'invalid by...

leejo · on Oct 18, 2019

> I don't want your entire life story in my commit log.

I[1] want enough debug information in the commit log to be able to reproduce the issue without having to go on web hunts to understand the problem. Especially when the change appears to be trivial on the surface, because these are the ones that can turn out to be rabbit holes.

I don't want to have to interrupt you to get this information because you didn't write a good enough commit message, and you probably don't remember anyway. I don't want to go look at an external issue tracker that I may not have access to, or that may not even exist anymore.

[1] Where "I" is: me, your future self, a future maintainer, a junior dev, an open source contributor.

0zymandiass · on Oct 18, 2019

To me, at least, the issue with that commit message is the signal-to-noise ratio. There is a lot of exposition for each piece of information. I prefer a more declarative commit message. However, from the writing, I suspect this is just due to the committer not being a native English speaker.

e.g. the first paragraph doesn't lose any important information trimming it down to:

"After adding a test matching the contents of router_routes.conf, `bundle exec rake` fails with:

    ArgumentError:
        invalid byte sequence in US-ASCII

"

Realistically, it would have been a better commit message if they'd given the shortlog SHA where the test was added that exposed the bug rather than an explanation of what the test does.

"After adding test <testname> (08c3e17), `bundle exec rake` fails with:"

SamBam · on Oct 18, 2019

Hmmm, I've always believed that no commit should break a build, even if you're committing the fix right after. Otherwise you're going to cause problems for `git bisect` or other practices of going through the history to find where a problem may have started.

Do other people commit breaking tests and then fixes?

Bartweiss · on Oct 18, 2019

I think this depends quite a bit on what other contributors are doing - it's one of those cases where several approaches are acceptable, but inconsistency is not.

"Commits should always build" is one doctrine I've seen. As you say, it makes bisecting and other error-analysis approaches easy. On the other hand, it risks either having large, opaque commits, or adding overhead to make intermediate commits build - possibly with flawed/meaningless behavior when they do.

Another is "the trunk should always build". In that case, you'd just squash branch commits down to logical groupings that are easy to analyze, whether or not things build. You can bisect on the trunk, but lose all guarantees about state on branches.

Finally, I've seen variations on "no commits that break the product", "no commits that make things worse", or "no committing failing tests without subsequent fixes". In this case, you can't generally commit broken builds, but can specifically add failing tests. The first rule just means "adding failing tests is ok", the second means "converting runtime bugs to failing tests is ok", and the third means "write your test and fix, but split (and ideally tag) the commits". All of these break bisect, but they guarantee the project itself won't become more broken from commit to commit, and they can help with other forms of reasoning about where bugs first occurred.

Every approach there seems viable if you stick to it. If there's no established practice, I suppose the best choice would be based on what sort of work and debugging is most likely to apply.
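
To make the bisect concern concrete, here is a minimal Python sketch of the binary search that `git bisect` performs conceptually. The commit names and the sets of failing commits are hypothetical, and this is an illustration of the idea rather than how git itself is implemented.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_good: Callable[[str], bool]) -> str:
    """Binary-search for the first bad commit.

    Assumes commits[0] is good, commits[-1] is bad, and that history
    flips from good to bad exactly once. A deliberately failing commit
    in the middle violates that last assumption.
    """
    lo, hi = 0, len(commits) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid
        else:
            hi = mid
    return commits[hi]


# Hypothetical history: "f" introduces the real regression.
history = ["a", "b", "c", "d", "e", "f", "g"]

# Every commit builds until the regression: the search lands on "f".
clean = first_bad(history, lambda c: c not in {"f", "g"})

# "d" adds a failing test and "e" fixes it: the search lands on "d" instead.
noisy = first_bad(history, lambda c: c not in {"d", "f", "g"})

print(clean, noisy)
```

In the second run the search blames the commit that only added the failing test, which is the kind of false lead the "commits should always build" rule is meant to avoid.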

monsieurbanana · on Oct 18, 2019

I might be missing something basic here. Isn't the "no commit should break a build" impossible to enforce on a codebase where you need to push a commit to run the tests?

Something where you can't test locally, like when testing on multiple architectures or when the tests just take too long for a laptop.

SamBam · on Oct 18, 2019

In my workflow, that would be in a feature branch, and exploratory branches can certainly break, but before I made a PR I would rebase my changes such that none of the commits broke the build.

It's also a different situation. The original one is "I've made a test that shows a problem." Your example is a surprise "I don't know whether this will pass my cloud-based tests." I would edit my branch if I had a surprise failure, since my initial code clearly wasn't correct.

claytonjy · on Oct 18, 2019

I do this, but not in a way that ends up on master. It's a driving force behind my preference for squash-and-rebase merge patterns.

A good bugfix PR is often two commits then: one with a test to catch the breakage, another to fix it so the tests pass. Reviewers can see the failing-then-passing CI job logs, so if they agree your test catches the bug, they have additional CI-automated validation your fix worked.

Then as long as you squash when completing the merge, you get the best of both worlds.

robin_reala · on Oct 18, 2019

Having worked with the committer, I can tell you that he’s definitely a native English speaker.

gus_massa · on Oct 18, 2019

I prefer that the commit includes the addition of a test in the test suite that gets fixed (or a few). This is good because:

* It ensures that the bug is real. [1]

* It ensures that the bug is fixed. [1]

* It prevents regressions (assuming the tests are run automatically).

* The test may prevent regressions in other related code, or discover other hidden bugs.

* It brings you closer to 100% test coverage.

* You don't have to guess how to reproduce the bug, reading the comment.

* If the bug depends on subtle configurations, they should be set in the test. [2]

From time to time there are bugs that are obvious in the code, but it is too difficult to write a test for them.

[1] Been there, done that.

[2] Once I found a bug that depended on the local timezone.

toastal · on Oct 18, 2019

100% test coverage is such an overrated stat.

AstralStorm · on Oct 18, 2019

Write a regression test (including its documentation) instead of just documenting the issue in human interpreted language, immutably.

Your future maintainer will thank you for not having to dig through repository history.

leejo · on Oct 18, 2019

It goes without saying that commits should include tests that cover the change, where possible.

> immutably

That's what makes this modus operandi so powerful IMO - comments in code may go unmaintained, tests may start failing for other reasons, issue trackers come and go, developers leave the company, documentation rots.

The commit message is (unless you have a bad actor) immutably linked to the original change, and that's exactly why you should be thorough in expressing its reason for being. I can git checkout the point in time (perhaps having bisected) and have the information to allow me to reproduce the issue.

cmroanirgo · on Oct 18, 2019

> Especially when the change appears to be trivial on the surface

Comments about the code should be in the code, where the next dev will see it. The more trivial a change, with far-reaching implications, the more important this is.

Doing so has heaps of benefits: future devs understand ramifications, shows that this code has been scrutinized, makes it easier when doing refactoring/yanking, or porting code.

That said, leaving the life story out will always be a good idea.

schnable · on Oct 18, 2019

Agreed, but in this case, it was an encoding/whitespace change so there isn't really anywhere else to put this info.

bluedino · on Oct 18, 2019

IMO the repo is the code

However, I would have done a simpler commit and linked to an issue where I explained the problem/solution in more detail

mark-r · on Oct 18, 2019

This assumes your issue tracker doesn't change. I've been at my current position 8 years, and in that time we've had 3, and the first 2 are shut down.

bluedino · on Oct 18, 2019

We use Gitlab so it has both in the same project - but like you said that could change

afiori · on Oct 18, 2019

> I don't want to go look at an external issue tracker

Related question: are there projects that use git itself as issue tracker?

cipherboy · on Oct 18, 2019

Pagure [0], Fedora's git forge, hosts code, issues, docs, and pull requests as four separate git repositories under the hood [1]. However, only project administrators can clone most of those repos.

[0]: https://pagure.io/pagure

[1]: https://docs.pagure.org/pagure/usage.html

johnday · on Oct 18, 2019

I can imagine that working to a degree: Make a fork of a commit at an issue, then merge that fork back in with master at point of fix. Bit of a mess in the tree though.

_TwoFinger · on Oct 18, 2019

https://news.ycombinator.com/item?id=13732598

BlueTemplar · on Oct 18, 2019

https://github.com/MichaelMure/git-bug

MrGilbert · on Oct 18, 2019

This. It's the same with comments in code:

I don't want to read what the code does (I can read that myself, thanks!), I want to know WHY it does it the way it does it - especially, if there is a more obvious, better way.

Also: People leave companies. Or die. At some point in time, you won't be able to ask the original author.

throwaway40324 · on Oct 19, 2019

No thanks. If I had a dime for every function that's so obviously self-documenting to what it does.. etc. If you tell others what it does and why, concisely and thoughtfully, no one has to try and mentally parse the what of some clever undescriptive block of code.

Groxx · on Oct 19, 2019

"You spent an enormous amount of time learning X, which is encoded in this three-letter bugfix. Don't make the next person go through that too."

m_sahaf · on Oct 18, 2019

On the other end of the spectrum you get ImageMagick's useless commit messages[0].

That extreme aside, I'd rather have commit messages that delve into why and how the commit changes the behavior for the better than a cryptic message such as 'Replace invalid ASCII char'. Now we have a documented reasoning and thought process that can aid future debugging. They can also be beneficial for new devs hacking on the project, or students learning how to implement and improve systems.

Personally, I enjoy reading these. The Go commits often have commit messages like these, and they are shared on HN often for a reason. They're learning material. They can't go on a wiki because they're tied to a particular set of changes at a particular point in history. They also can't be comments on the code because they're tied to particular lines in different files, and code comments can only cover a set of consecutive lines in one file.

One recent example I could find is this[1]. Yeah, it fixes ^Z, but why didn't the old approach work? Why did it work for some time then didn't? How did it change? Why is this commit optimal, if it is? All of this along with scenarios to reproduce the issue.

Give me your life story anytime over a cryptic message.

[0] https://github.com/ImageMagick/ImageMagick/commits/master

[1] https://github.com/golang/go/commit/610d522189ed3fcf0d298609...

graton · on Oct 18, 2019

Agreed. When at some point the website that they are pointing to changes in the future they will lose all context on why a change was made.

I believe in the "plane flying across the ocean without WiFi test" or basically anywhere without Internet access. If I am on a plane flying across the ocean without WiFi, do I have the information in the git commit to understand what happened? A git message that consists entirely of a link to a website is useless in that case.

marvin · on Oct 18, 2019

You can write the brief summary in the first 80 characters, like OP did. Then write details in the body below, in case someone needs the context. Most tools display only the first 80 characters unless you expand the body.

This case is probably longer than necessary, but I've saved a day of debugging on multiple occasions due to someone (also myself) leaving some lines of context, reasons and reasoning after the high-level description.
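
A minimal Python sketch of that summary-plus-body split, for anyone who wants to check it mechanically. The function names are hypothetical, the 80-character limit is just the figure mentioned above (not a git rule), and the sample message is adapted from a rewrite proposed elsewhere in this thread.

```python
from typing import Tuple


def split_commit_message(message: str) -> Tuple[str, str]:
    """Split a commit message into (subject, body).

    The subject is the first line; the body is everything after it,
    with surrounding blank lines stripped.
    """
    lines = message.strip().split("\n")
    subject = lines[0].strip()
    body = "\n".join(lines[1:]).strip()
    return subject, body


def subject_fits(message: str, limit: int = 80) -> bool:
    """True if the subject is non-empty and at most `limit` characters."""
    subject, _ = split_commit_message(message)
    return 0 < len(subject) <= limit


# Sample message: short summary first, details in the body.
sample = (
    "Convert template to US-ASCII to fix error\n"
    "\n"
    "The template contains a non-ASCII space, which results in:\n"
    "ArgumentError: invalid byte sequence in US-ASCII\n"
)

subject, body = split_commit_message(sample)
print(subject)
print(subject_fits(sample))
```

Tools that show one line per commit only need `subject`; `body` is there for whoever needs the context later.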

jordigh · on Oct 18, 2019

> I don't want your entire life story in my commit log.

Why not? Where else do you want it? Is something forcing you to read the full commit log?

There's no length limit on commit messages and commit messages are mostly out of the way. Most VCSes have a way to only show you the first line. So if you want summaries, that's what the first line is for. If you want the full story, that's what the body is for.

Combined with annotate/blame, commit messages can be very helpful source-level documentation. Nobody has ever complained about too much documentation, and commit messages are the perfect time to document what happened because it's one of the few times where our tools actually force us to write something in order to proceed. As long as we're being forced to write something, write something good and informative.

jmilloy · on Oct 18, 2019

I think the problem isn't the length or content of the commit message, but its organization. It needs to have the most important information first. It reads as an "entire life story" because it is written in a narrative, sequential form. Better organization would make it skimmable, and later coders could read only as far as they need to.

YourMatt · on Oct 18, 2019

If I'm searching commits, I'm trying to find record of what changed and when. I only want clues, and quick skimming is paramount. I want no personality. I want concise descriptive commit messages.

That said, we reference an ID from our project management software with every commit, so once I find the commit I'm looking for, I can reference it back to external documentation. I still discourage personality there as well because it can get out of hand and clutter the comments, but it's more forgivable than being on the commit itself.

mattacular · on Oct 18, 2019

The pull request is a good place to put such a large amount of information. That would also be a good way to make sure it is seen by the broader team instead of burying it in commit history. You could make the argument that then it would not be part of the git history and therefore could be lost if you change hosts.

jordigh · on Oct 18, 2019

I will make that argument. The hosting is ephemeral, the commit message is eternal.

Plus, what if you want to know what happened and you're simply offline? Let's not unnecessarily break the D in DVCS.

CivBase · on Oct 18, 2019

I never understood this philosophy. What makes Git more eternal than any other technology? Why is putting all of your data in one monolithic tool a good solution?

You might change your issue tracking solution. You might change your host solution. You might change your review platform. You might also change your VCS solution. Nothing is eternal.

CydeWeys · on Oct 18, 2019

The commit message itself is way more eternal than GitHub ephemera. There's plenty of old codebases in git that were imported from SVN (or even older RCSes) with all commits intact. What's likely not intact is data in ancient issue trackers from decades past. git is a DVCS, so anyone can clone the repo and get all the commit information. Cloning the issues and such is not nearly so trivial, and isn't a part of the git protocol itself so there's no guarantee it's in any kind of interchangeable format.

Important information should not just be in PR comments. It should be added into the commit information itself so that it'll be maximally available going forward. A good, fully explanatory commit message is a huge asset, and those commit messages will exist for the entire lifetime of the codebase. Anything else, not so much.

jordigh · on Oct 18, 2019

I didn't say anything about git. I said the commit messages are eternal, and none of those changes will change the commit messages (except, perhaps, changing the VCS, but usually that will preserve commit messages too).

CydeWeys · on Oct 18, 2019

You'd be foolish to do a VCS migration that discards the commit messages. I've never seen it happen personally, as people tend not to be that foolish. I've worked with legacy codebases that went from CVS -> SVN -> git and all of the commit messages going back to the very beginning are intact, because why would you ever do a migration that doesn't maintain them?

chadlavi · on Oct 18, 2019

This is a really convincing argument. I was with the parent commenter until I read this; I was like, this is totally PR stuff! But hadn't considered offline situations, or host switches. Thanks op!

gwd · on Oct 18, 2019

> I don't want your entire life story in my commit log.

I agree with this, but I think yours is too short.

Scientific papers typically introduce enough information such that a person familiar with the field but not an expert in that particular area can understand generally what's going on.

That's my ideal for a commit message as well: someone generally familiar with the codebase but who hasn't looked at this specific code (or perhaps not in a few months) should be able to understand what's going on; then the job of the reviewer is basically just verification.

My "template" is normally something like: 1) What's the current situation 2) Why that's a problem 3) How this patch fixes it. So in this case, it might look something like this:

---

Convert template to US-ASCII to fix error

$functions use `.with_content(//)` matchers to do X. These matchers require ASCII content. The $foo template contains a non-ASCII space; this results in the following error:

ArgumentError: invalid byte sequence in US-ASCII

Fix this by replacing the non-ASCII space with an ASCII space.

---

No need for a life story, but still searchable, and has enough information for even a casual contributor to do a useful review.

munk-a · on Oct 18, 2019

I really like that commit message - though it'd be nice to link to any sort of issue/task tracking ID that's relevant to that piece of work.

vanderZwan · on Oct 18, 2019

“I didn't have time to write a short commit message, so I wrote a long one instead.”

V-2 · on Oct 18, 2019

Pascal! My favorite language

Supermancho · on Oct 19, 2019

I, too, would rate this a substandard git comment. Dave basically vomited a bug ticket of information, which is highly contextual and irrelevant ... like the lines he was faced with, which tell us nothing in the future nor anything we could not see in the change. The error is known, from the ticket being addressed. Documenting what error a bundler throws in the application deployment, within git seems...silly, since it will likely not apply to all points in time. That's why we have separate issue tracking.

There was a whitespace encoding issue AND the developer didn't really understand the issue, since they ended with "One hour of my life I won't get back.". Over my 20 years, I've seen this EXACT scenario multiple times across multiple companies. Some jr engineer gets stuck with some troublesome weird error in a corner-case that ends up being a non-standard whitespace. It's a learning opportunity and he lamented it because it was different and nobody told him "we could stop this from happening again, generate a new issue".

There are salient improvements the git commit would benefit from, in both comment changes and additional code:

1. Include a (new) feature ticket that is linked to this issue - to create a process that doesn't allow for this again (eg fix a linter)

2. Include the name of the bug ticket (Convert template to US-ASCII to fix error) in the commit title, that was being addressed.

3. Create a test to specifically enforce the us-ascii encoding or add necessary rules to a linter.
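
As a rough idea of what the check in item 3 could look like, here is a minimal Python sketch that reports every non-ASCII character in a piece of text. The function name and the sample config content are hypothetical; a real project would more likely wire an equivalent rule into its existing test suite or linter.

```python
from typing import List, Tuple


def find_non_ascii(text: str) -> List[Tuple[int, int, str]]:
    """Return (line, column, character) for each non-ASCII character.

    Line and column numbers are 1-indexed. Lines are split on "\n" only,
    so unusual Unicode line separators are reported rather than consumed.
    """
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col, ch))
    return hits


# Hypothetical template content with a no-break space (U+00A0) on line 2.
sample = "location /foo {\n  proxy_pass\u00a0http://example;\n}\n"

for line_no, col, ch in find_non_ascii(sample):
    print(f"line {line_no}, column {col}: U+{ord(ch):04X}")
```

A test built on this would simply assert that `find_non_ascii` returns an empty list for each file that must stay ASCII-only.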

Cthulhu_ · on Oct 18, 2019

For critical applications, I for one would like to know the story behind a commit, preferably in the commit itself and not a reference to an external system like idk, Jira.

My favorite examples of commit messages are the Linux kernel, where you can tell that they're being specifically crafted instead of just used as a work log to be ignored. This means that ten years down the line, people can still see when a change was made and why, who was involved, who signed off on it, etc. Have a look at the commits at https://github.com/torvalds/linux/commits/master

konsnos · on Oct 18, 2019

This is true when you can reference the commit to an issue. Then, seeing the simple commit message you can select if you want to dig up what happened by reading the comments on the issue.

On the other hand it really gets on my nerves when people don't use the task/issue/whatever manager system appropriately. Recently, I lost a couple of days trying to figure out how to compile a c++ framework because the other guy didn't document his pipeline. In general I'm really disappointed by the majority of my colleagues for the lack of comments inside and outside of our codebase and this is a persistent issue, at all the companies I worked for. Me along with other similarly irritated people, always ask for documentation if it is not given.

dmortin · on Oct 18, 2019

> Recently, I lost a couple of days trying to figure out how to compile a c++ framework because the other guy didn't document his pipeline

Some people do it for job safety. The logic is if you don't document things and the knowledge is only in your head then you are more valuable, they can't get rid of you easily. If you document everything meticulously, then you are easier to replace.

throw0101a · on Oct 18, 2019

> Some people do it for job safety. The logic is ...

Has anyone actually seen this logic work out well for the person that invokes it? Generally the type of person that uses it is one that you probably don't want on your team.

thiefmeister · on Oct 18, 2019

I have!

Company promoted the guy and raised his salary because he had planned to leave the company

vidarh · on Oct 18, 2019

I know in instances like that, though, my next step would be to start working on contingency plans, as if someone has proven themselves to be indispensable, then that very fact is a risk that needs to be managed.

beart · on Oct 18, 2019

Like having a new guy learn the material, taught by the old guy who doesn't want anyone else knowing it!

com2kid · on Oct 18, 2019

> Recently, I lost a couple of days trying to figure out how to compile a c++ framework because the other guy didn't document his pipeline.

This is assuming documenting the pipeline would have helped! You may have spent a few days instead figuring out why your seemingly identical setup couldn't reproduce the build...

Not that I'm bitter about build systems or anything.

mlang23 · on Oct 18, 2019

Talented coworkers don't need documentation very often... If someone can't figure out how to compile something, it's likely they are missing knowledge about the language in general...

JeffRosenberg · on Oct 18, 2019

Then use `git log --oneline` and you don't have to see the lengthy details, until the inevitable day when you find you need them.

AlexCoventry · on Oct 18, 2019

How do you surface them when you need them, though? git grep?

JeffRosenberg · on Oct 18, 2019

Git grep works if you're trying to search all the logs. I would think this detailed documentation would be most important when you're trying to understand a specific file or line of code. In that case, it's:

- git blame (who wrote this?)

- git show (look at the commit surfaced by blame)

esotericn · on Oct 18, 2019

Sure? Or 'git log', then use the pager to search, or pipe into something with better fuzzy search, etc?

mschwaig · on Oct 18, 2019

Generally speaking sure, there's no need to make things more complicated than they are, but the author even found some evidence in the history that indicates other people found this message useful.

The powerful thing about this is having everyone put this kind of info in the same place IF they think it might be useful to the next person.

CGamesPlay · on Oct 18, 2019

Still, you've left out the detail that you've confirmed there are no other instances of this in our codebase. I'm also firmly in the "all commit messages should include a test plan" camp, so you should at least say how you found the error ("bundle exec rake was run before and after").

I get you're being terse for demonstrative purposes, but even eschewing verbosity we should still convey all the pertinent information.

nailer · on Oct 18, 2019

This is good, I'd add:

> Replace invalid ASCII char. Fixes rake error 'invalid byte sequence in US-ASCII'. See #123

So people can get the life story if they want it.

acdha · on Oct 18, 2019

One downside to that approach: it requires your issue tracker to be stable for long periods of time. I've worked in a number of places where that's not true and you end up needing to figure out that the #123 linked by the system you're using now was actually #123 in the old system and was migrated as #456 in the current one.

There's a balance here and I especially like that this commit message has enough information to make searches really easy should you need to do something like that.

DJHenk · on Oct 18, 2019

That's why most guidelines for commit messages prescribe a short description and an optional long description. The message in the article does not have a short description, which would have been easy to include. For that reason, it's not "My favorite Git commit message" either.

nickm12 · on Oct 20, 2019

I agree with the sentiment; this message is quite long for an invalid character in a file.

House style in the companies I've worked for is to include a link to a bug report and/or code review that provides more context for those who want it. Even without that added context, I'd rather know

lazyant · on Oct 18, 2019

Do you think a junior developer, or maybe somebody not very familiar with Linux, would not learn anything or benefit from reading those comments?

Obvious point is that commit messages can be used, beyond recording what was done, as a form of documentation and a teaching tool (why, how).

EliRivers · on Oct 18, 2019

That's great for you. You don't want that. However, if you code in a team, doing everything for your own wants rather than considering the needs of the team (present and future) is just bad software engineering.

celticninja · on Oct 18, 2019

That belies the effort that went into the fix