Hacker News
Notes on Programming in C - Rob Pike, February 21, 1989 (cat-v.org)
139 points by AbyCodes on Dec 9, 2011 | 43 comments



One thing I've changed about my style in the last few years is to almost never comment my code, but to instead write long and detailed Git commit messages explaining the hows and whys of that code at the time that I write it.

That means that over time I effectively have comments for every line of code in the program, but it's associated metadata instead of being embedded inline, which means that the comments never go out of date, and their history is accurately tracked in version control.


The problem with doing it that way is that someone who comes across a checkout of your code will have no documentation. Think about the case of a distributed tar.bz2 or having a Python application installed on your system.

I prefer to have the comments in the code, in a form that can be used to generate documentation on the side (e.g. doxygen, epydoc) so that they can't get lost, and so that you can read them as you read the code. It is difficult to re-combine the git commit messages with the code, but not hard to extract the comments from the code to create documentation.


So anyone who looks at your code and doesn't understand why you implemented Foo the way you did has to dig through 500 commit messages to find when it was added. Or worse, you are working with 2 dozen other programmers, all working under their own forks, and they have to look through all of that since they don't know who was responsible for adding the code. Thanks.


Expected response: git log | grep Foo


Or "git blame filename.py" followed by "git show SHA1".


You're likely to cause a lot of trouble for other (possibly new) engineers going through your code. It seems cumbersome to look up metadata side-by-side while going through code.

I, personally, always reread my code in the form of sentences. Something along the lines of: "okay, first I create an x. Then I pass y to it. I then set this flag..."

If at any point I hesitate or have trouble stating what a line of code does, I add these sentences as comments so someone else can understand easily/quickly (if I can't alter the code for some reason, that is).


Inline comments are accurately tracked in version control. I don't see how updating comments when you update the code is any different except for the obvious advantage of seeing the comments right next to the relevant code.


It's really easy to miss updating a comment somewhere. I mean, people introduce bugs into programs even with extensive automated tests. Rendering a comment obsolete is something your tests will never catch, so it's easy to miss the fact that you've broken something until you actually need the comment.

By contrast, it's nigh impossible to forget to include a commit message, since failure to do so will cancel your commit.


I'm on your side. If dates are an issue, keep a change log in the source file: create a tag (such as initials+date) and insert it above large code changes, so you can see when the code changed from within the code as well.

Comments don't take up much space (reading or on disk) so don't think it's bad to be verbose.


You don't actually lose the metadata by commenting the code, thanks to git/svn/hg's blame feature, named identically in each. (I've not used darcs in a very long time, but it had something similar called "annotate".) If the code doesn't look like the comments say it is, you can check the timestamps and if that doesn't help, then you can crawl through old commits to see what's going on. It's useful enough that I've bound a key in my editor to display the blame'd code.


That sounds interesting. I'm a bit of a git newb; what git commands or tools do you use to make this convenient?


IntelliJ IDEA or any of the JetBrains IDEs based on that platform (RubyMine, WebStorm, etc.) have this built-in: right click in the gutter and choose Annotate. Mouse over a line to see the commit message for the most recent change to that line. http://www.jetbrains.com/mps/whatsnew/screenshots/20/annotat...


"git blame" helps too - yes, you know it was you, but that tells you what commit added that line. Now do "git log -p 31dc4a" (or whatever) to see the commit's comments.


git commit :) Which is required before pushing any code.


Many of these are elaborated on in the more recent (1999) "The Practice of Programming." Excellent book. http://www.amazon.com/Practice-Programming-Brian-W-Kernighan...


The one point that is notably less relevant today is the last section regarding include files. GCC has a special case for a header file that is entirely wrapped in an #ifndef (http://gcc.gnu.org/onlinedocs/cpp/Once_002dOnly-Headers.html...).


The more modern gcc handling is with "#pragma once":

http://en.wikipedia.org/wiki/Pragma_once

His comments on include files are not the way the world has gone. In all my years writing C, I have never seen a C/C++ header which does not include the other headers it needs in order to compile cleanly.

He states: "Simple rule: include files should never include include files."

If you can find me one C project out there which has headers with zero #includes and forces each compilation unit including the header to include all the prerequisites needed for that header I would be genuinely interested. Even all OS header files include their pre-requisites and it is considered a bug if they do not do so. A more modern 21st century rule would be:

"Simple rule: include file order does not matter. Use include guards (or #pragma once) and always include only the prerequisites needed to cleanly compile and nothing more."


Our experiences don't match. Avoiding nested includes has been a part of the rigorous and useful standards of a lot of important, mission-critical, large-scale projects, and is something I try to enforce in my own code.

There are numerous examples of projects which follow that recommendation. I'm told, for example, that the recommendation on the excellent Blender 3-D graphics system is to avoid nested includes. Other examples:

NASA: http://sunland.gsfc.nasa.gov/info/cstyle.html

European Molecular Biology Open Software Suite (EMBOSS) http://emboss.sourceforge.net/developers/c-standards_new.htm...

Atacama Large Millimeter Array (astronomy): http://www.alma.nrao.edu/development/computing/docs/joint/00...

Clinical Neutron Therapy System (for radiation oncology): http://staff.washington.edu/jon/cnts/impl.pdf


I take your point, but I'm not sure you're really taking his. As another data point, I just did a find/xargs-grep across my "~/codebase/3p" tree, which has ~5500 .h files for projects ranging from MongoDB to Ruby and Python to OpenSSL to valgrind. Virtually all of them have nested includes. Many of them are components of mission-critical software.

He's right; nested includes are a modern C idiom.


> In all my years writing C, I have never seen a C/C++ header which does not include the other headers it needs in order to compile cleanly.

> He's right; nested includes are a modern C idiom.

No doubt they are found a lot. The request was for counterexamples. Fashion/style vs engineering practice assertions may be passing like ships on a nighttime C here.

>...for projects ranging from MongoDB to Ruby and Python...

Python....why, even Tim Peters asserts that "Flat is better than nested"...wink...

http://www.python.org/dev/peps/pep-0020/

/irrelevant quote


To be fair here, NASA does say to avoid nesting headers. But then it follows up with this ridiculous statement:

  In extreme cases, where a large number of header files are 
  to be included in several different source files, it is 
  acceptable to put all common #includes in one include file.

I would not trust anything else the author of that guide writes. EMBOSS says not to nest. The Atacama code suggests not nesting but then says:

  To avoid nested includes, use #define wrappers as follows:

The last pdf states:

  This prevents including the same item more than once when 
  a .c file includes several .h files (we rejected
  alternatives involving conditional compilation as too 
  complicated).

I guess it boils down to preference, since the language allows you to do either; I just don't put a lot of stock in these style guides.


> If you can find me one C project out there which has headers with zero #includes and forces each compilation unit including the header to include all the prerequisites needed for that header I would be genuinely interested.

A whole OS does this: Plan 9 from Bell Labs :)

For details see the paper by Rob Pike: How to Use the Plan 9 C Compiler

http://doc.cat-v.org/plan_9/4th_edition/papers/comp

Also the Plan 9 libraries are much cleaner and leaner than those in 'modern' *nix systems, which makes keeping track of includes much easier: http://man.cat-v.org/plan_9/2/intro


Well it stands to reason that the Plan 9 code would work that way if Rob Pike made the comment. I'm just wondering if anyone else found this way of coding attractive. Everything I've seen points to "no".


There is even a relatively popular piece of software dedicated to helping developers use this style: http://code.google.com/p/include-what-you-use/


That is not the stated purpose of the tool (to write code in the aforementioned style):

  "Include what you use" means this: for every symbol (type,  
  function variable, or macro) that you use in foo.cc, either 
  foo.cc or foo.h should #include a .h file that exports the 
  declaration of that symbol.

I agree 100% with that statement. According to the style, foo.h can never #include anything.


That's not true - you can't forward-declare types used in class member variables for instance.
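
The same limit applies to plain C structs, not only C++ classes. A rough sketch, assuming hypothetical `struct list` and `struct node` types: a forward declaration is enough for a pointer member, but a by-value member needs the complete type, which in practice means the header has to include the defining header.

```c
#include <stddef.h>

struct node;                     /* forward declaration: incomplete type */

struct list {
    struct node *head;           /* fine: pointer to an incomplete type */
    /* struct node first; */     /* would not compile: incomplete type */
};

/* The full definition can come later, or from another header. */
struct node {
    int          value;
    struct node *next;
};

/* Count the nodes reachable from l->head. */
static size_t list_length(const struct list *l)
{
    size_t n = 0;
    for (const struct node *p = l->head; p != NULL; p = p->next)
        n++;
    return n;
}
```

Only the code that dereferences the pointer (here `list_length`) needs to see the full definition of `struct node`.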


Dennis Ritchie, creator of C? Ken Thompson, creator of Unix?


> The more modern gcc handling is with "#pragma once":

I've considered switching to this, but in the interest of maximum hypothetical portability, I've decided to stick with the #ifndef include guard. I've just checked the C1X standard draft I have, and #pragma once is still not in the standard (6.10.6.1 in the version I have).

As for your other points, I agree with the notion that a header file should include its dependencies, but no more. I suppose my thinking on this is influenced by the dependency tracking of Linux package managers such as apt.
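
As a minimal sketch of that combination (an `#ifndef` guard plus exactly the includes the header itself needs), here is a hypothetical `buffer.h`. The file name, the struct, and the helper are all made up for illustration.

```c
/* buffer.h: a hypothetical header, written to be self-sufficient. */
#ifndef BUFFER_H            /* include guard: a second #include is a no-op */
#define BUFFER_H

#include <stddef.h>         /* size_t is the only prerequisite, so include it */

struct buffer {
    char  *data;
    size_t len;             /* bytes currently used */
    size_t cap;             /* bytes allocated */
};

/* Bytes still free in the buffer. */
static inline size_t buffer_free(const struct buffer *b)
{
    return b->cap - b->len;
}

#endif /* BUFFER_H */
```

A source file can then include this header in any position relative to its other includes, and including it twice does no harm.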


We use both internally, actually. GNU gcc/g++ and IBM xlc/xlC all support doing this:

  #if defined(... all the compilers above ...)
  # define PRAGMA_ONCE _Pragma("once")
  #else
  # define PRAGMA_ONCE
  #endif

Specifically, MSVC and HP cc/aCC do not support it. Oracle cc/CC does the #ifndef/#define detection and internally does something similar to what the pragma enables in other compilers.


Yep, which states

>   The result is often thousands of needless lines of code passing through the lexical analyzer, which is (in good compilers) the most expensive phase.

If only...


Is he saying that the ifndef to protect against double/multiple includes should be in the file you don't want to include multiple times?

The way that part is worded seems a bit confusing.


I think he's saying this:

  /* in the including file; some_header.h itself defines SOME_HEADER_H */
  #ifndef SOME_HEADER_H
  #include "some_header.h"
  #endif


These style guidelines correspond exactly to idiomatic Go. I'm impressed how well Pike executed on his philosophy.


> Pointers are sharp tools, and like any such tool, used well they can be delightfully productive, but used badly they can do great damage.

Nice physical metaphor. Pointers are ... pointy.


Even though this is over 20 years old, many comments are still very valid. I especially like the section about short variable names and functions.


I can see how loop indexes (i) etc are nice to be short, but I don't see how the other stuff is valid, as opposed to a personal preference.

Why is maxval a better name than MaximumValueUntilOverflow?

The first lacks some extra information that I need to keep in mind every time I re-read that part of the code.

And while the potential of overflow might be obvious, how about:

minValueForTemperature instead of minval?


My variables are a little more verbose than the anemic style preferred by Rob Pike and used throughout K&R but, in that tradition, I also tend towards shorter, simpler variable names than something like minValueForTemperature, so maybe it would be useful to illuminate my own thinking.

Complex variable names ought to be avoided because, simply, they hammer the programmer with a bunch of information every time they are used. Usually, when reading code, you're trying to wrap your head around how a procedure operates rather than the specifics of what it's operating on. Often, if you need to know more about what a variable represents, it's sufficient to refer to its declaration.

Thusly, I prefer to name my variables so that one can pick up the general idea of what they're for from the name and then I document any additional information at the variable's declaration, either using a comment or via the type system.

So, this is how I'd handle your examples:

    int maxval; /* until overflow */
    Temperature minval;
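
A slightly fuller sketch of the same idea, assuming `Temperature` is just a typedef for `double` (the snippet above leaves the type undefined) and using a made-up function name:

```c
#include <stddef.h>

typedef double Temperature;     /* assumed representation, e.g. degrees Celsius */

/* Smallest reading in t[0..n-1]; n must be at least 1. */
static Temperature coldest(const Temperature *t, size_t n)
{
    Temperature minval = t[0];  /* the type says what kind of minimum this is */

    for (size_t i = 1; i < n; i++)
        if (t[i] < minval)
            minval = t[i];
    return minval;
}
```

Inside a routine this short, `minval` is unambiguous, and the declaration carries the detail that a longer name would otherwise repeat on every use.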


If you're dealing with temperatures, a better name than minval or minValueForTemperature might be mintemp, min_temp, or minTemp depending on your style. One reason to use moderately short variable names is so that reading the name is faster. Include the most important information in the name, infer the rest from context. If you find you still need a lot of context, drop something less important from the name (e.g. minval->mintemp).

I'd also avoid putting prepositions into variable names, but that's just a personal preference for keeping things to one adjective plus one noun where possible (less parsing overhead for my brain).


Keeping variable names short is a natural brake on routine length. Like wrapping at 80 columns and using tabs is a natural brake on deeply nested conditionals.

If I need a longer variable name to clarify the intent of the code, it's a signal from the code that maybe I should break the routine into smaller pieces.

I also find that shorter variable names make code way faster to scan and thus understand.

YMMV ...


Those notes found their way into the book 'The Practice of Programming'.

I highly recommend it.


Some absolutely fascinating points. However, I must admit that the portion on link pointers was, frankly, damned frightening as someone who could possibly have to come back and figure out what's really going on.

I'm reminded of Perl when I look at that link pointer code. It's succinct, easy to write, works great most of the time, and nearly impossible to go back and decode later. You simply require too much context to find the one place where you stepped beyond the array and into no-man's land.

I love pointers; I think that like other sharp tools, they have their uses. However, like other sharp tools, they require training and attention to use properly. Lots of people lose fingers to saws every year; how many brain cells have you lost to debugging pointer mistakes?


That's idiomatic C code. You're right that it's not particularly readable if you aren't a C programmer, but if you came to a job interview and couldn't grok it immediately, I'd write "facility with C: marginal" in my notes.
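
For readers without the article open, the style under discussion looks roughly like the following. This is not the article's listing, only a sketch with hypothetical names (`Node`, `lookup`): a pointer `np` names the current node directly instead of indexing into an array.

```c
#include <stddef.h>
#include <string.h>

typedef struct Node Node;
struct Node {
    const char *key;
    Node       *left;
    Node       *right;
};

/* Find key in a binary search tree rooted at np; NULL if absent. */
static Node *lookup(Node *np, const char *key)
{
    while (np != NULL) {
        int cmp = strcmp(key, np->key);

        if (cmp == 0)
            return np;
        np = cmp < 0 ? np->left : np->right;   /* step to a child */
    }
    return NULL;
}
```

The loop never computes an index, so there is no array bound to overrun; the failure mode to watch for is instead a dangling or uninitialized child pointer.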


Missed this back closer to when you posted it - but it's less that I don't understand what's going on, and more that this code is more likely to go wrong & be harder to debug than more descriptive code.

Just the other day, I was trying to identify what was being done by someone's "idiomatic" code, and had to keep three vertical screens' worth of data up just to follow what was happening. All that code was wrong, but because they were using pointers "idiomatically", it took a long time to find their off-by-one error.

Granted, they had gone about 4 levels further with pointers than the given example in the article, but the premise is similar.



