Linux kernel coding style (01.org)
82 points by e19293001 on March 28, 2017 | 81 comments



Other people are complaining about the 80 character line limit, but personally, I'm shocked and dismayed by the preference for no braces on `if` statements with single line blocks. It is so easy to accidentally add a line to what looks like an `if` block (but isn't since there are no braces), and then you have a bug that is visually very difficult to spot. I'm pretty strongly against this rule!


I don't get what "what looks like an `if` block" is. In a no-brace if, the statement that follows is indented. The next line after that is not indented. So the indented statement obviously belongs to the if.


The issue that comes up is when you go to add a line to that 'block' it's easy not to notice that it's not actually a block, just a single statement. Then you add another indented line, and now you've got something that looks a lot like a block, but isn't at all!

Of course, as others have pointed out, with some sort of reasonable linting setup or with compiler warnings, you'll probably catch the bug (unless macros are involved), but even so, I feel having something that looks like a block but isn't, adds cognitive load for little benefit.
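
A minimal sketch of the failure mode in C, with made-up function names. In `count_buggy` the second indented line looks like part of the `if` but runs unconditionally; `count_braced` is what was meant. GCC and Clang can flag the first form with `-Wmisleading-indentation`.

```c
/* Hypothetical example: count how many steps an error path takes. */
int count_buggy(int err)
{
	int steps = 0;

	if (err)
		steps++;	/* guarded by the if */
		steps++;	/* looks guarded, but always runs */

	return steps;
}

/* Same logic with braces: both increments are guarded. */
int count_braced(int err)
{
	int steps = 0;

	if (err) {
		steps++;
		steps++;
	}

	return steps;
}
```

With `err == 0`, `count_buggy` still returns 1, while `count_braced` returns 0.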


This works well in languages where whitespace matters


I found it interesting that systemd actually requires this behaviour:

https://github.com/systemd/systemd/blob/master/CODING_STYLE#...

I once found a particularly confusing use of this style here:

https://github.com/erlang/otp/blob/master/erts/epmd/src/epmd...


I cannot agree more. I try to enforce braces as much as possible, because one always adds a statement here and there (for debugging purposes, for example) and inevitably forgets to add them. This is a very big drawback, as Linux is positioning itself as a programmer-friendly and "simplistic" C project.


> It is so easy to accidentally add a line to what looks like an `if` block

But it is also so easy to catch that automatically (and in fact, gcc and clang will flag this as an error for you with the right incantation, and refuse to compile), that it is stupid to leave it to 50 different committers all avoiding accidents.


Be careful when using macros however: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80076
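
One way macros make this worse, sketched in C. The macro and function names are hypothetical, and this is not necessarily the case the linked bug report covers. A multi-statement macro under an unbraced `if` only has its first statement guarded. Wrapping the body in `do { ... } while (0)`, which the kernel style guide itself recommends for multi-statement macros, keeps it a single statement.

```c
/* Hypothetical macros: bump two counters at once. */
#define BUMP_UNSAFE(a, b)	(a)++; (b)++
#define BUMP_SAFE(a, b)		do { (a)++; (b)++; } while (0)

int bump_unsafe(int cond)
{
	int a = 0, b = 0;

	if (cond)
		BUMP_UNSAFE(a, b);	/* only (a)++ is guarded; (b)++ always runs */

	return a + b;
}

int bump_safe(int cond)
{
	int a = 0, b = 0;

	if (cond)
		BUMP_SAFE(a, b);	/* one statement, fully guarded */

	return a + b;
}
```

Nothing about the call site's indentation looks wrong here, which is why this variant is harder to spot by eye.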


You can always use the flag -Wmisleading-indentation. I'm not sure how well it works, though.


If you enforce the use of clang-format then this is not an issue as it will indent the code correctly.


2017 and contributors still need to discuss coding style =/

If you want to enforce coding style (which is OK) you need to give developers the tools to ease that process.

Contributing with patches to 15 different projects, using 15 different rules for styling makes it impossible to keep up.

FOSS projects should start using tools like clang-format so developers don't need to care whether a project uses 2/4/6/8 spaces instead of tabs. The tool should be able to automatically format the code before committing.

Some projects even use git hooks to compare the commit with auto-generated styling tools to check if it follows the rules.


Sure, it's difficult for a committer to keep up with all of the different styles. Imagine being a developer on a project with 50 different committers, all doing it their own way.

As a committer, I would rather have it clearly explained what the rules are before I do it incorrectly. Sure, having tools that will do it for me is awesome, but considering how many examples in the OP are different because of historical reasons, those tools are difficult to configure.


> Sure, it's difficult for a committer to keep up with all of the different styles. Imagine being a developer on a project with 50 different committers, all doing it their own way.

That's exactly what tools like clang-format fix. Code will be in the repository using one correct styling. The difference is that the committer won't have to style it himself.

> As a committer, I would rather have it clearly explained what the rules are before I do it incorrectly. Sure, having tools that will do it for me is awesome, but considering how many examples in the OP are different because of historical reasons, those tools are difficult to configure.

Which tools did you say are difficult to configure? clang-format? It's 5~20 lines in a .clang-format in the root folder of your repo. Configure it once and profit forever.

* I'm talking about clang-format here because that's what I use for C/C++. I don't know anything about other languages.


How well does clang-format handle subfolders with their own rules? I don't know, as I'm not a C/C++ developer; but I know that the tools I use don't handle them very well. I still provide them, of course, because it's better than nothing.


Totally agree.

It's the same for any other mental load that could be offloaded to a machine. Coding style? Don't demand that people stick to rules - use some formatter/prettifier/linter. Well-known and frequent bugs? At least try to use static analysis, or write tests and run them with some sanitizer. Codebase too big/convoluted/...? Use some tools (parsers, indexers, search) to get better insight. Just try to use proper tools and machines.


Only thing I really don't agree with is the 80 char max.

I'm not suggesting unlimited but isn't it time we revisit this?

It really feels like one of those "we've always done it this way so just leave it"


In truth, longer lines would probably be fine, except that 99% of the time when your line is more than 80 characters long, there's a better way to do it with shorter lines. I've made a habit of reducing lines to 80 characters or less, and I would guess that for every 50 times I take a line I wrote that's 80-100 characters long and figure out how to split it into shorter lines, 49 of those times, the shorter version is better. That seems like a good reason to prefer the shorter line limit.

Note that it isn't a hard limit. The actual rule is "[s]tatements longer than 80 columns will be broken into sensible chunks, unless exceeding 80 columns significantly increases readability and does not hide information." So longer than 80 is fine, but only if it significantly increases readability. That sounds about right to me!


This! It's not just an aesthetic preference. Like the tab limit, it is instrumental to impede bad logic.


> In truth, longer lines would probably be fine, except that 99% of the time when your line is more than 80 characters long, there's a better way to do it with shorter lines.

    myDescriptiveResultVariable = myDescriptiveMethod(myWordyVariable, myVerboseVariable);

That's 87 characters.


The Linux kernel coding style was developed to meet the needs of the Linux kernel project. When the code base is nearly 20 million lines, precedent counts.

Most projects aren't that big and don't have needs that are like the Linux kernel. There may be (and probably are) good reasons not to adopt its idiosyncratic choices.


I would disagree with the reason being "because we've always done it this way."

I started programming C full time about 2.5 years ago (after many more with PHP, et al). There's an aesthetic to writing C that I don't find with other programming languages. I believe this is, in part, due to the generalized use of shorter variables and keeping everything as compact and simple as possible.

Since deciding to keep everything at an 80-char width, I can say that it has helped me maintain a certain kind of readability I don't find with other languages. And, while aesthetics don't matter once the compiler takes over, I can say that good aesthetics do improve my ability to program more effectively and efficiently.


I second that, and want to add that scanning a shorter line is easier on the eyes, too. There's a reason typography conventions limit line width to 60-70 characters.


As someone who routinely has 5 or 6 files open in various configurations of Emacs windows, 80 char max is a huge convenience. Out of interest, what's changed since it became a norm which would make it worth reexamining?


Well, no one's coding on a VT100 terminal anymore :) It started as a physical constraint. Most computers, even laptops, have wide enough screens that 80 characters looks kinda compressed.

Personally, I've started using 100 columns for personal projects. But I stick to 80 for anything someone else is going to have to look at. To some extent, it's more important to have a standard, so you don't have to worry about weird line-wrapping placements when someone else looks at your code, than it is to have one that fits super well on the average modern screen. So we're probably stuck with 80.


Screen size. On my two 24 inch monitors, I can easily have four side-by-side files with 100 characters and a large font size.

This being HN, naturally someone's going to tell me they do most of their programming on a 7 inch terminal screen.

But 80 chars comes with a cost: overly terse code, or a lot of scrolling.


Exactly, I do the same: dual 24-inch monitors with 4 windows, and a 100-character line limit.


Very true. Perhaps I'll have to switch to 100 characters for personal projects.


I heard once that it was because punch cards were 80 columns. But my guess is screen resolution? We have much higher definition screens and can fit more on a line now? Should we have longer lines of code? Probably not. Could we? Heck yeah.


Yes. Punch cards were 80 columns. Later, Teletypes for the most part were also (I think some could do 132 columns). Then terminals like the VT100 carried this forward (again, some could do more, but 80 was still the default and they added the 24-line convention). The IBM PC kept to this standard also. And in 2017, the "default" terminal window is still 80x24.


While it's true that most of us have more than 80 character width displays most of the time, reducing the strength of that argument ... I dislike having to turn my head to read a long line. Personal preference of me.

I also think that code that has been constrained to 80 columns reads better. Personal opinion of me.


I agree that it reads a lot better, and if you have to scroll horizontally it reads a whole lot worse. As a C# dev I constantly have to scroll back and forth to read a particular code block. I'm pretty sure C# code would fail terribly at fitting within 80 chars, with its long names, generic type declarations, and everything being indented within a namespace. It'd be practically useless to try to edit C# code from a shell interface.


It's impossible to place such a limit on Java/C#, but for C, it's not that hard especially if you go for 4 spaces indentation.


with c# you need at least two more indents for class and a completely frivolous namespace block. i wish c# had adopted java's namespace syntax too.


This reminds me of one diamond-in-the-rough of Erlang's syntax: rather than a module block, you just have a module-name attribute at the top of each file (where the entire file is then related to that module.) Your functions are just right there against the left of your screen!


Haskell has a module block, but you don't need to indent the content because the block can end at EOF.


Indentation war is ON again ;-)

TL;DR - Linux kernel source code uses TABs (8 characters) instead of spaces. The rationale is that the maintainers believe large indentation makes code easier to read on screen (especially for long hours), which makes sense.

Personally I (not a programmer but Linux SysAdmin/Ops/Infra Architect background) tend to use 4 spaces everywhere else (e.g. Shell, Ruby, Java and all sorts of configuration files). Not to pick a fight (sounds familiar? ;-) but 2 spaces in general make readability worse.

Anyway, the most important point is to honour what is already established/in place and stick to it, whatever you work on.


People should just use actual tabs. Then you can make the indentation display however you want in your editor. And you never have to worry about deleting part of an indent level, creating weird slightly offset indentations.

I know, everyone is going to downvote me for saying this. But if you're literally using 8 spaces to represent a tab, which is the default width tabs are rendered at, then what downside could there be to switching to actual tabs?

Edit: "spaces are never used for indentation" so do they actually use tab characters then?


> In all cases, prefer spaces to tabs in source files. People have different preferred indentation levels, and different styles of indentation that they like; this is fine. What isn’t fine is that different editors/viewers expand tabs out to different tab stops. This can cause your code to look completely unreadable, and it is not worth dealing with.

from the LLVM style guide.


What's an example of some source code that becomes unreadable when you change the size of tabs? Are you editing code in Microsoft Word or something? Setting custom tabstops and shooting yourself in the foot?

Don't mix tabs and spaces in indentation btw. That is the one thing you must never do.


Tabular alignment is practically the one thing of all these holy wars that actually improves readability. So no, that's not something funny — it's something that trivially and immediately improves readability of your source.

    foo = 1;
    this_value_is_not_foo = 8;
    another_int = 419;
    pi = 314159;

versus:

    foo                   =      1;
    this_value_is_not_foo =      8;
    another_int           =    419;
    pi                    = 314159;


It's fine to use spaces for your tabular alignment. Go ahead. But whether you have tabs or spaces as the indentation portion won't make any difference to the resulting alignment.

The important thing is that you don't mix tabs and spaces in either portion. So, in my proposal, the indentation portion would be all tabs, and the tabular alignment part would be all spaces.
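
A rough sketch in C of how that rule could be checked mechanically. `tabs_then_spaces` is a made-up helper, not part of any existing linter: it accepts a line only if every tab sits in the leading indentation.

```c
#include <stdbool.h>

/*
 * Hypothetical check for "tabs for indentation, spaces for alignment":
 * a line passes if all of its tab characters come before any other
 * character, so nothing after the indentation depends on tab width.
 */
bool tabs_then_spaces(const char *line)
{
	/* Skip the indentation portion: leading tabs only. */
	while (*line == '\t')
		line++;

	/* Any later tab means alignment (or mixed indentation) used a tab. */
	for (; *line != '\0'; line++) {
		if (*line == '\t')
			return false;
	}

	return true;
}
```

Note that this sketch does not reject lines indented purely with spaces; it only guarantees that changing the tab width cannot break alignment on lines it accepts.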


This works in theory. I've yet to see a project accomplish it in practice. There's always someone with an incorrect editor config that does indentation and alignment with tabs (because it's easier to configure your editor to just send a <TAB> than to have smart behavior, or copy-pasted code contains a tab rather than spaces, or a million other reasons), and nobody notices until two years later and everything is a mess for anyone trying to use a custom tab-depth.


I find the first of those two examples far more readable than the second; the unnecessary spacing requires scanning from the name across the gulf of spaces to the corresponding value, rather than placing the values near the names.

Also, alignment like that produces an excessive amount of diff noise: when you add or remove an entry, every entry changes.


> Also, alignment like that produces an excessive amount of diff noise: when you add or remove an entry, every entry changes.

I absolutely agree that this is a drawback — no approach is without some flaw. This is the flaw in mine. That said, I'm utterly convinced that it is worth this drawback.


Agreed. Nowadays I am a true believer in aligned code, I think it improves readability more than almost anything you can find in modern coding styles.


This gives the false impression that the variables are all related. If they are related, they should probably be encapsulated in some kind of structure. And once you add the comments explaining what the first three variables are and how they should be used, and the comment explaining why pi doesn't have a decimal point, there is no benefit to alignment. If you add another long line, you have to realign the whole list (or leave it looking sloppy). Overall, this kind of alignment is a bad idea.


Congratulations for taking my contrived example way too literally and missing the point entirely.

Not every group of variable assignments warrants a comment for every line, nor is my argument relevant only to static constants, nor is there any reason why aligned assignments should be construed to be related any more than consecutive assignments should be, nor should that stop you from reaping the unarguable improved readability of aligning them anyway.


> This gives the false impression that the variables are all related. If they are related, they should probably be encapsulated in some kind of structure.

The true impression. If they're actually unrelated, they're more likely to be in different files outright, than immediately sequential. "Encapsulating" in a struct is pointless/potentially obfuscatory if they're related local temps - but I'll use the same style for initializing structs.

> And once you add the comments explaining what the first three variables are and how they should be used, and the comment explaining why pi doesn't have a decimal point, there is no benefit to alignment.

I align those comments too.

> If you add another long line, you have to realign the whole list

The one drawback.


Probably tabs because that's what makefiles require.

That and having a standard allows people to speak a common language.


When there are [line] width guidelines then the difference between 2 and 8 adds up


I hate line length guidelines. People should just enable word wrap. I use Sublime Text and it wraps lines in a really readable way, maintaining the indentation from the start of the line.


C Programming vs most other languages is a different animal for readability and formatting, I think you hit the nail on the head when you said "Honour what is already established"

On that subject, I really dislike some of the PHP PSR formatting guidelines, it is a C style language and they have implemented some really painful paradigms


> I really dislike some of the PHP PSR formatting guidelines

What are they? I have found that PHP guidelines are surprisingly good.


Brackets on new lines specifically get my goat, I think it's unnecessary most of the time, there are a few other things that piss me off as well but I guess it's nice to have a standard and I am just one dude who is used to doing it his way :) I have code tidying stuff that happens on commit so I still just do it my way and then let automation handle the pedantry!


PSR-2: http://www.php-fig.org/psr/psr-2/

Of course, these have little to do with core PHP, but PSR-1, PSR-0/4, and large portions of PSR-2 have been widely adopted within the PHP development community. The most-debated portion of PSR-2 is, of course, the use of 4 spaces for indentation (no tabs).

PSR-12 aims to bring the guidelines up to date with the latest features in PHP 7, but I've mostly lost track of things in PHP-land since moving to a new job and burying myself in XSL (and occasionally C#).


An alternative viewpoint from a popular style guide: Google uses 2 spaces for C++.


Google's Shell Style Guide says: Indent 2 spaces. No tabs. Use blank lines between blocks to improve readability. Indentation is two spaces. Whatever you do, don't use tabs. For existing files, stay faithful to the existing indentation.

Makes sense. I don't mind sticking to an established style, though personally I still prefer 4 instead of 2 for readability (easy on the eyes...).


Have to agree with the comment on GNU coding style, I work on a project that uses it and it really sucks. Braces on a newline and indented two spaces...


Offtopic question, what documentation tool is used for this? I've seen it a lot of times but I'm not sure how to search for it. Thanks!


It's made with Sphinx, a tool commonly used for Python documentation. But it can render reStructuredText that isn't associated with Python just as well, so it's popular for other projects. http://sphinx-doc.org/

(There's a link to Sphinx at the bottom of the page)

It looks like this is just generated from Documentation/index.rst and linked pages in the kernel source tree, which seems to have been created less than a year ago: https://github.com/torvalds/linux/commits/master/Documentati...


For background on the docs in the kernel source tree, LWN (as is often the case) had a good article about it: https://lwn.net/Articles/692704/

There's also a copy on kernel.org, not sure why Intel's copy is being linked here. It's probably the same text, so any copy should be equally good, it just feels less canonical to me. Probably silliness on my part for even thinking of it.


Oh, hey, there's a rendered copy at https://www.kernel.org/doc/html/latest/process/coding-style.... , neat! (I knew about the unrendered one at /doc/Documentation)


Sphinx, it's linked to in the footer.


Thanks, didn't see it!


At the bottom of the page:

> Built with Sphinx using a theme provided by Read the Docs.

Jonathan Corbet has done quite a bit recently within Documentation to make it more structured and has (from my understanding) massively revamped the documentation compilation.


It seems like sensible advice but with gratuitous insults interspersed, just to make sure you're really clear that the natives are not friendly.


You're right, humor should be illegal.


The writing style of this document seems somewhat casual ("Please at least consider the points made here.") and a little confusing compared to OpenBSD's style guide. [1]

The LibreSSL team spent months putting the OpenSSL code into KNF. It makes a big difference, even though it may seem trivial.

[1] http://man.openbsd.org/style


This is from Documentation/process/coding-style.rst (formerly Documentation/CodingStyle) in the kernel source tree; an unrendered version is canonically at https://www.kernel.org/doc/Documentation/process/coding-styl...


What is K&R? It's referenced several times but not defined.


Kernighan and Ritchie, the C Programming Language. https://en.wikipedia.org/wiki/The_C_Programming_Language
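
In the kernel guide, "K&R" mostly comes up for brace placement: the opening brace goes last on the line and the closing brace first, except for functions, whose opening brace gets its own line. A small C sketch of that layout; the function itself is invented for illustration.

```c
/* Hypothetical helper: clamp val into [lo, hi] and report whether it changed. */
int clamp_int(int val, int lo, int hi, int *clamped)
{				/* function: opening brace on its own line */
	*clamped = 0;

	if (val < lo) {		/* statement: opening brace last on the line */
		val = lo;
		*clamped = 1;
	} else if (val > hi) {	/* closing brace first, followed by else */
		val = hi;
		*clamped = 1;
	}

	return val;
}
```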


"The other issue that always comes up in C styling is the placement of braces. Unlike the indent size, there are few technical reasons to choose one placement strategy over the other, but the preferred way, as shown to us by the prophets Kernighan and Ritchie, is to put the opening brace last on the line, and put the closing brace first, thusly"


Ah, missed that. Thanks.


> First off, I’d suggest printing out a copy of the GNU coding standards, and NOT read it. Burn them, it’s a great symbolic gesture.

Should I be aware of something from GNU coding standards?


2 spaces for indentation.


> Tabs are 8 characters...

The madman!

/s

Actually, now that I think about it, he has a point. Having a little more horizontal space seems like it would be easier on the eyes.


Huh. Considering they also implement an 80 character limit to line length, that could be a bit restrictive. Perhaps it means you have to refactor into a function once you reach a certain depth of blocks. You can only really go 9 blocks deep.

I hate line length limits personally. Especially because I tend to use long identifiers which eat up most of my line length limit in one go if you try to utter them with their enclosing namespaces.
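
The "9 blocks deep" figure is just arithmetic on the two limits. A small C sketch, with made-up function names:

```c
/* Columns left for code at a given nesting depth. */
int columns_left(int line_limit, int tab_width, int depth)
{
	return line_limit - tab_width * depth;
}

/* Deepest nesting level that still leaves at least one column for code. */
int max_depth(int line_limit, int tab_width)
{
	return (line_limit - 1) / tab_width;
}
```

With an 80-column limit and 8-column tabs, depth 9 leaves 8 columns and depth 10 leaves none; at depth 3, 56 columns remain.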


It's intended to be restrictive. The idea is that more than three levels of indentation should be the exception rather than the rule. Exceeding that without good reason is a hint that functions should be split up.

Having some such rough limit is a good thing, but the number 3 is language dependent. Java for example automatically eats one level for the class.

Also C lacks syntax such as nested functions, try blocks, python-style context managers and all kinds of other stuff which excuse more levels of indentation.


I like gofmt which avoids this problem in golang.


Coding style wars will die with the rise of *fmt.


Okay, let us know when you've written a fmt which works on, say, a mere 70% of kernel code. That should let you ignore the $N different assembler syntaxes at least.



