Deconstructing "K&R C" (learncodethehardway.org)
118 points by gnufs on Jan 10, 2012 | 129 comments



I'm confused. What is the "defect" in K&R's "copy(char to[], char from[])" function?

The author notes that "the second this function is called...without a trailing '\0' character, then you'll hit difficult to debug errors", but no function with that signature could possibly work in this case.

The built-in "strcpy" function has the exact same limitation. Does the author have a problem with it as well? Null-termination is a fundamental concept of C strings; there's no reason to shield C students from it.

The other example of "bugs and bad style" in this "destruction" of K&R C is a minor complaint about not using an optional set of braces.

I hope the remainder of the [incomplete] chapter demonstrates some actual bugs in the book's code, because it currently doesn't live up to the first paragraph's bluster.


It's a nice strawman, right? Especially when he points out that the original code, in context, is perfectly fine. His later complaint about the assignment-in-if statement is certainly something shared by modern C programmers (see compiler warnings about same), but it perfectly fits the original style and accomplishes its task.

His criticisms seem to be rooted so far in stylistic issues and in taking the code out of context (design context, usage guarantees, etc.). Then again, how are you to nerdbait while still being fair to original sources?


My criticisms are only partially stylistic; the bigger issue is that people copy this code into other situations and it breaks. So I'm showing them why it only works in that one specific context and how to break the code.


If I can make a suggestion, then. I apologize if you've already covered this elsewhere in your book, but:

Please encourage the use of the "restrict" keyword to enable proper aliasing optimization by compilers.

Also, would it be worth considering, perhaps in a chapter on how-to-deal-with-C-style-strings-if-you-must, including a section on writing compiler-independent macros to emulate safer calls like the str*_s variants in VS? These functions usually differ slightly in incantation between VS and GCC, so it might be helpful to abstract them. It's also a relatively simple exercise in preprocessor macros with a clear benefit.
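To illustrate both suggestions, a minimal sketch (the copy_line and SAFE_STRCPY names are made up here; MSVC ships strcpy_s, and snprintf is a common portable stand-in since it always NUL-terminates when given a nonzero size):

  #include <stdio.h>
  #include <string.h>

  /* restrict promises the compiler that to and from never alias,
     which enables better optimization */
  void copy_line(char * restrict to, size_t size, const char * restrict from)
  {
      snprintf(to, size, "%s", from);   /* truncates and always terminates */
  }

  /* hypothetical portability macro over the str*_s-style calls */
  #ifdef _MSC_VER
  #  define SAFE_STRCPY(dst, dstsize, src) strcpy_s((dst), (dstsize), (src))
  #else
  #  define SAFE_STRCPY(dst, dstsize, src) snprintf((dst), (dstsize), "%s", (src))
  #endif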


The built-in "strcpy" function has the exact same limitation. Does the author have a problem with it as well?

Yes. From the linked chapter:

we avoided classic style C strings in this book

From an earlier chapter on strings:

The source of almost all bugs in C come from forgetting to have enough space, or forgetting to put a '\0' at the end of a string. In fact it's so common and hard to get right that the majority of good C code just doesn't use C style strings. In later exercises we'll actually learn how to avoid C strings completely.

This is the author's opinion, of course – it's from a book, that should go without saying – but it's not as if the idea of avoiding C strings in general, and "strcpy" in particular, is an oddball or unique point of view. See e.g.:

http://stackoverflow.com/questions/610238/c-strcpy-evil


"the majority of good C code just doesn't use C style strings..."

Nice. In the 23 years I have worked on C language products, I've never worked on "good C code" by this definition.

The cool thing about this guy's book, I guess, is that by avoiding all the things about the language he doesn't like, any reader will be wholly unprepared for C in the Real World after this book.


This chapter is explicitly teaching people what the author's idea of bad C code is: It has them read some, then tells them specifically what the gotchas are, then asks them to code up test cases for the flaws and run them through Valgrind.

Where's the "avoiding" here?

And which of the skills being exercised - imagining what kinds of bad things could happen, writing executable test cases, detecting segfaults - are not useful in the real world?


That's basically the point of this chapter. It's getting people to think like a hacker and try to break the code in unintended ways. That makes them better programmers and helps when avoiding common mistakes in C code.

Using K&R to do this is to give people a set of known good C code samples and show how even those can be broken and misused.


Noble goal, but the way the chapter is written doesn't really come across that way, in my opinion.


Speaking as someone who doesn't know C, hasn't read K&R, but who (usually!) has decent reading comprehension, it _does_ come across this way.

He's completely explicit about what he's doing and why. I don't understand how people can read that and still feel that he's being unfair.

Maybe I'm just too literal-minded.


Calling string manipulation functions buggy and broken because they fail when you don't pass them a string is both wrong and silly.


How do you test that something in C is a string?


You don't, because you can't. That's basically the same as the beginner question, "How can you test if a pointer has been freed?"


It was a rhetorical question. What's the sense in having functions that operate on "strings" if you can't figure out what a "string" is at runtime? It's much saner to have functions operate on "strings that are 80 characters or less" or "a structure containing an integer `length` and an array of `length` chars."
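For concreteness, a minimal sketch of that second option (the names are illustrative, not from any particular library):

  #include <stdlib.h>
  #include <string.h>

  struct lstring {
      size_t len;     /* number of valid bytes in data */
      char   data[];  /* C99 flexible array member */
  };

  struct lstring *lstring_from(const char *bytes, size_t len)
  {
      struct lstring *s = malloc(sizeof *s + len);
      if (s == NULL)
          return NULL;
      s->len = len;
      memcpy(s->data, bytes, len);  /* len is authoritative; no '\0' needed */
      return s;
  }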


What's the sense in having functions that operate on "pointers to valid memory" if you can't figure out if it's "valid memory" at runtime?

The point is that the function is not buggy. You may not like the specification for it. You may think it should be designed differently. This is not the same as the code being buggy.


The same way you test that a pointer is initialized to a valid value. You don't.


How about being able to work on a team of other programmers without insisting that everything ought to be done your own, completely unique, way?

If he truly had a better, more correct, way to write C code I would join him in trying to change the world. But the first example he gives is just factually wrong.


"we avoided classic style C strings in this book"

There's the avoiding. The skills being exercised are not a problem, it's the skill not being exercised that is a problem. You cannot be a competent C programmer without understanding strings and their relevant library functions.


Yes, they'll be pretty good at C unlike you because they'll read the whole book instead of just this one half-finished chapter.


"they'll be pretty good at C unlike you"

Thanks for the code review of my last 23 years of work on C projects. Nice ego that your book must be the only way to learn this.

I mostly disagree with your statement about "most good C code" not using stdlib strings/apis.

To be fair, from your comments here, it looks like you aren't really saying that -- you are saying we have to be careful when using them. This I know, and carefully guard against, and I do try to break my code regularly. But to assert that "most good C code" doesn't use them fails to meet with my experience.

That said, _all_ bad C code (ab)uses them, probably due to carelessness and/or misunderstanding of how things work in C.


The approach looks similar to the one in "JavaScript: The Good Parts". I worried about the same thing while reading it: OK, here's the right way to do e.g. inheritance, but how do they do it in the real world?


> unprepared for C in the Real World

Oh please. Most C in the "Real World" is utter crap. It doesn't mean there is no place for books teaching "alternative" approaches.


I didn't see it yet from a quick skim of the available chapters, but I'm going to guess that his preferred alternative is 'bstr - The Better String Library' - http://bstring.sourceforge.net/

I've used it in the past, and I try to include it in any project that isn't immediately trivial, although the to/from integration with just about all library code ever is a bit of a hassle.


It's not just that the to/from is a hassle, it's also that you have to deeply understand the gotchas of ASCIIZ strings to safely do that bridging at all --- so pretending it doesn't exist is probably not a good teaching strategy. (I have no opinion about the article and am only commenting on this thread because I like C).


I do not pretend anything about null-terminated strings in the book. In fact, I have many exercises and assignments where they use Valgrind to buffer overflow C strings repeatedly and don't introduce bstring until much later.


Sorry, I haven't read your book; I meant to be clear that I was talking in the abstract, but I can see after rereading that I wasn't clear enough.


I use bstring but there's a few others. Generally it's the easiest way to avoid some common buffer overflows and other problems.


> but no function with that signature could possibly work in this case.

This is the source of the bugs in C. People write functions that only work as long as every call to them is never changed, which is absurd. Good modern C code involves trying to protect against bad usage and adding defensive checks.

So yes, the built-in strcpy is crap which is why most competent C doesn't use it except in a few rare cases where it's required.

And this does demonstrate actual bugs in the code. I wrote a test case that causes one, which incidentally is a common bug in C code called a buffer overflow. It's because code examples like this get copied into other situations that we have these defects.
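For anyone who hasn't seen the chapter, a sketch in the spirit of that test case (copy() reproduced as printed in K&R's longest-line example; the harness around it is hypothetical). Run it under Valgrind and watch it read and write past both heap blocks:

  #include <stdlib.h>
  #include <string.h>

  /* copy: copy 'from' into 'to'; assume to is big enough */
  void copy(char to[], char from[])
  {
      int i;

      i = 0;
      while ((to[i] = from[i]) != '\0')
          ++i;
  }

  int main(void)
  {
      char *from = malloc(4);
      char *to = malloc(4);
      if (from == NULL || to == NULL)
          return 1;
      memset(from, 'A', 4);  /* deliberately no trailing '\0' */
      copy(to, from);        /* walks off the end of both blocks */
      free(from);
      free(to);
      return 0;
  }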


High level string libraries are a win.

But you may be overstating your case a bit.

From my codebase/third-party directory on my laptop (a bit random, I admit), from those projects I'd consider "competent C" (ie, not OpenSSL or MRI ruby):

* dovecot uses ASCIIZ strings and libc string functions

* redis uses ASCIIZ strings and libc string functions

* nginx uses a high-level buffered string library

* lcamtuf's skipfish scanner uses ASCIIZ strings and libc string functions

* libevent uses ASCIIZ strings and libc string functions

* qmail uses djb's string library

* memcached uses ASCIIZ strings and libc string functions

It's probably good to be comfortable with both approaches.

I don't know that you actually made this claim, but you seem to have given people here the impression that you believe functions that work with ASCIIZ strings should be bulletproofed to handle non-ASCIIZ inputs. I couldn't agree with that argument, especially as an argument about K&R's code being rusty.

People here are jumpy though (they're commenting, like me, mostly because they're bored).

Looking forward to more examples from the book.


Redis strings are actually length-prefixed, null-terminated strings.


The string data type in the db? Because Redis itself uses char-stars all over the place, with the libc string functions.


Hmm, reading the source it looks like he is using his sds string library, which has len and size fields and an ASCIIZ char* member. When last I checked he does pass the char* around (because it is null terminated), but he also sometimes does pointer math to get back to the sds.
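Roughly, the layout being described looks like this (a sketch from memory; the field names and the recovery macro are illustrative, see Redis's sds.h for the real definition):

  /* Callers hold a char* to buf; the header sits immediately before it
     and is recovered with pointer arithmetic. */
  struct sdshdr {
      int  len;    /* bytes in use */
      int  free;   /* spare bytes beyond len */
      char buf[];  /* NUL-terminated data; this is the pointer passed around */
  };

  #define SDS_HEADER(s) ((struct sdshdr *)((s) - sizeof(struct sdshdr)))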


You're right; I was reacting to the count of char\s+star and snprintf calls, but only fair to chalk Redis up among the packages I have that rely on a high-level string library.


this from the man who won't describe xargs to newbies because he doesn't understand it.

blargh


Huh? My C programming is in the distant past, so I might be forgetting. But strcpy does assume a terminal zero, doesn't it? E.g., http://linux.die.net/man/3/strcpy. It sounds like you are talking about strncpy or memcpy.


Yes, strcpy is almost universally understood as a security problem. In fact, here's a nice regex to apply to some C code to find buffer overflows:

LC_ALL=C egrep '[^_.>a-zA-Z0-9](str(n?cpy|n?cat|xfrm|n?dup|str|pbrk|tok|_)|stpn?cpy|r?index[^.]|a?sn?printf|byte_)' src/*.c

Taken from the really well researched and secure andhttpd:

http://www.and.org/and-httpd/#secure

Run that regex on some C code, then go look at how the inputs to those functions are used, and then you can probably create some of your own buffer overflows. It's like magic.


I'm confused by your comment. strcpy assumes that the string to be copied ends with a NUL. The case described in the link violated that assumption and caused a segfault.


To be pedantic, it's not that strcpy is making assumptions.

Its interface is defined such that it is explicitly invalid to pass it some other garbage.


To be both pedantic and correct, there is no way to define a C function to restrict a string input so that it is correctly terminated. So no, it's at best documented that you shouldn't do that and definitely doesn't prevent you from doing this.


To be still more pedantic, there are two kinds of definition of an interface: definition-within-the-type-system and definition-within-the-documentation. The string library is specified in the ISO C standard (I use C99, but you can be all hip and C11 if you want), and passing an unterminated string to strcpy is a constraint violation.

7.1.1 A string is a contiguous sequence of characters terminated by and including the first null character.

7.21.2.3.2 The strcpy function copies the string pointed to by s2 (including the terminating null character) into the array pointed to by s1.

Therefore, code which passes an unterminated string to strcpy is not conforming code (because s2 does not point to a "string" as C99 defines it).

Of course, you should use strncpy anyway. But that's not the point.
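One related caveat worth a sketch: strncpy has its own gotcha in that it does not NUL-terminate the destination when the source fills it, so the usual idiom terminates explicitly (a minimal sketch assuming dstsize > 0; the function name is made up):

  #include <string.h>

  void copy_bounded(char *dst, size_t dstsize, const char *src)
  {
      strncpy(dst, src, dstsize - 1);  /* writes no '\0' if src fills dst */
      dst[dstsize - 1] = '\0';         /* so force termination ourselves */
  }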

/me spends too much time in comp.lang.c :S


Right, the C language type system makes no attempt at describing that kind of constraint, yet it is still clearly defined within the language.

And it's not just the standard C library that defines it...this definition of string is also supported by the literal string data type.


The other time that there will be an issue is when your to is shorter than your from...


I don't see what the big deal is here. He throws inputs at a function not meant to handle them and gets a segfault. Isn't this more-or-less expected from all C functions?

On careful examination of his example code, you can see that there's no attempt to terminate the buffer (though he misidentifies it as an "off by one" error). Throwing arbitrary buffers into functions meant for null-terminated strings will cause errors everywhere, not just in K&R example code. In the next chapter, does he use strcpy in the same example and use that error to say that we should deprecate the entire standard library?

On a more nit-picky note, the 'triple-equality trick' the author derides is a common C idiom and one every C programmer should be familiar with. Perhaps it doesn't belong in the first chapter, but it definitely belongs in any C manual.


The inputs I'm throwing at it are a common mistake in C programming that demonstrates a buffer overflow. K&R's code assumes that it will only be used in the programs they've written, but it usually gets copied out into other software and that causes the bugs. As I said, the programs are correct, the functions are not.

And no, most of the clever convoluted code in K&R is antiquated and the only reason it's a common C idiom is because nobody questions whether it's good style. It's more of a habit than an idiom.


Agreed, the example used seems very strange. Every function makes assumptions about its parameters, and in C null-terminated strings are to be assumed unless otherwise noted. The std library strcpy would just behave in an identical way.

Yeah, I'm all about checking your assumptions and defensive programming, but K&R is hardly at fault because you throw garbage at a random function and it segfaults.

I find the idea of teaching modern C by pointing out problems in K&R awesome, but the examples given are not really good.


Yeah, I'm all about checking your assumptions and defensive programming, but K&R is hardly at fault because you throw garbage at a random function and it segfaults.

Maybe he should write a series on "Learn Ada the Hard Way"?


(apologies for hijacking an unrelated thread, but I have no way of contacting you ...)

During our discussion of the redis/windows release, you mentioned your team is about to release some library code around new-years day. Has that release happened? If so, where can I find it?

Thanks in advance

--beagle3


Yeah, sure!

Code's up here: www.762studios.com

It's been up about a week or so. (EDIT: we had it up on the first, technically.)

We're pushing out a significant update to that code in the next few days, as well as polishing the site with the bugtracker and things.

One of the big problems we ran into was function-level specialization for certain container classes--storing smart pointers in certain cases would screw up the reference counting. The code update will be fixing that.

If you want to get in touch, shoot me an email at cre1@762studios.com

Happy hacking! :)


Thanks!

Looks like a solid, high quality code base from a quick look. Keep up the good work!


You should go read about fuzzing. It's amazing what throwing "random garbage" at programs can break.


Passing random garbage to a random function is not the same as passing random input to a program.

Take, for example, one of the most common bugs in C (even more common than not null-terminating strings, I'd say): passing already-freed memory. Are you going to argue that a function has a bug because it'll randomly give root access to outsiders because someone took it out of context and passed a stray pointer to it?

Calling a function that expects a null-terminated string buggy because you passed it a Pascal string and it crashed seems weird, at the least. I'll call it potentially dangerous when misused (like most C code anyway), but not buggy, since it behaved as designed.

C is an unsafe language, and yes, you can validate a lot of things, but at the end of the day it's still an unsafe language. The solution, in my opinion, is to advocate the use of safer languages and only use C when it's strictly necessary.


I don't see what the big deal is here. He throws inputs at a function not meant to handle them and gets a segfault. Isn't this more-or-less expected from all C functions?

We are getting separated from that kind of behavior. Nowadays the language is generally expected to protect you from yourself.


No, C is specifically for situations where you don't want the language to "protect you from yourself". E.g. systems programming where little or no runtime performance will be sacrificed to enable imperfect code to be safe.


Understood, it just catches people off guard and they get mad at C.


> Braces Are Free, Use Them

I disagree. Braces have a cognitive load, particularly if they're given whole lines to themselves. Which is easier to read,

  while ((len = getline(line, MAXLINE)) > 0)
      if (len > max) {
          max = len;
          copy(longest, line);
      }
  if (max > 0) /* there was a line */
      printf("%s", longest);
or

  while ((len = getline(line, MAXLINE)) > 0)
  {
      if (len > max)
      {
          max = len;
          copy(longest, line);
      }
  }
  if (max > 0) /* there was a line */
  {
      printf("%s", longest);
  }
? Personally, I'd probably write it

  while ((len = getline(line, MAXLINE)) > 0)
      if (len > max) {
          max = len;
          copy(longest, line);
      }
  if (max > 0) printf("%s", longest);
. Consistent structural indenting makes it easy to see the boundaries of the control structures.


Because it's the style I use, and because I don't need to bug-check the indented code to make sure it's braced correctly, I find this easiest to read:

  while ((len = getline(line, MAXLINE)) > 0) {
      if (len > max) {
          max = len;
          copy(longest, line);
      }
  }
  if (max > 0) { /* there was a line */
      printf("%s", longest);
  }


By far.


We are talking about K&R here. Please use the less spacious True Brace Style instead of the heretic Allman style. :)

Joking aside, the problem I see in your argument is that the lack of braces is also a cognitive load. If braces are always there I can safely ignore them, but if they can be omitted I need to worry about whether a certain piece of code is braced or not when I read and edit it.

The only exception I currently live with is single-line if statements, as long as the statement is on the same line as the `if`.


I'm probably the last person in the world to prefer:

      while ((len = getline(line, MAXLINE)) > 0)
          if (len > max)
              {
              max = len;
              copy(longest, line);
              }
    
      if (max > 0) /* there was a line */
          printf("%s", longest);
          
I fear this may be literally true: this brace style is called Whitesmith's, and I was reporting bugs to the cc-mode indenter for emacs a while ago. BSD/Allman and K&R styles never made sense to me.


Add a new statement to the while-loop. Now you've figured out why that style isn't so hot. You now have to remember to go add the new braces, and typically people are mentally putting them in so they forget. Same with the last if-statement. You'll go, add the new line, indent and go "cool done". Then scratch your head for hours wondering why it's not working.


Oh, well there are two issues here: whether to always include braces, and what brace style to use. Whitesmiths is a brace style, that's what I was pointing out. The decision to include braces or not is a different thing. I tend to omit braces in obvious situations, but am not beholden to it.


Exactly. Fully bracketed syntax prevents this kind of maintenance breakage.

Weird contrast: Perl, for all its klugy syntax, requires fully bracketed syntax on the statement bodies of "if", "while" & the like, so this never happens. Of course, you get other readability issues there...


Steve McConnell makes a pretty good case for Whitesmith's formatting in his book (at least in the first edition), in the code formatting chapter:

http://books.google.com/books/about/Code_complete.html?id=Qn...

Of course, he presents most of the major styles, pretending to be "fair and balanced", but if you read between the lines of the "abstract block" argument, it's clear that the other styles which align keywords or grouping symbols at a different level other than the compound statements they delimit are brain damaged, or at least visually misleading.


SeanLuke, that is exactly how I'd write it too! ✼Whitesmith fistbump✼ — Related: I love the Wikipedia talk section on it: http://en.wikipedia.org/wiki/Talk:Indent_style#Whitesmiths_s...


I'm currently experimenting with a hybrid of this and K&R style, that would look like this:

  while ((len = getline(line, MAXLINE)) > 0)
      if (len > max) {
          max = len;
          copy(longest, line);
          }
    
      if (max > 0) /* there was a line */
          printf("%s", longest);
So far I'm finding that it gives good readability, doesn't waste too many lines on braces, and doesn't run into many issues of accidentally leaving braces out.

I started it after having started writing lisp and python, and I'm getting used to just ignoring the braces and going by indentation, which this style emphasises.


Easiest to read? The middle. The way I would write it for my own use is closest to the top, but the easiest to read is the middle.

Always using braces, and giving them their own lines, makes nesting as unambiguous as possible, and the extra white space reduces density, making foreign code easier to digest.


> Braces have a cognitive load, particularly if they're given whole lines to themselves. Which is easier to read,

Checking braces has a higher load when they aren't there and when they aren't consistently used. But without evidence both of our points are pointless. So, in my book I'm pushing the standard of always using braces except in a few small cases.


I'll take the 2nd one, please. And twice on Sunday.


All this talk about where the right place to put braces is, makes me pine for Python.


When it comes to assignment and comparison in constructs such as while loops, I think this variation adds readability:

while (len = getline(line, MAXLINE), len > 0) ...

I've never seen anyone else use it, though.


The middle version uses braces nicely to add whitespace to the function to help the reader flow through the code, instead of chasing matched-parens and wondering if each line is a continuation of the previous.


I don't see his point, really. Yes, C doesn't have a statically testable string type. And yes, the convention is that a C "string" is just an array of character data with a trailing NUL byte. He constructs an array without a trailing NUL byte - so that's not a string, but an array of characters.

The fact that copy() now happily runs through memory is the expected result of the bug in the calling code. No, it's not useful. Yes, there are problems with the whole approach of using a terminating value - but this doesn't seem to be his point (otherwise he would also have mentioned the linear runtime complexity of strlen() and the problems that arise when a string itself contains a NUL-byte, I suppose).

Now, what is his point? This is chapter 55 [!] in a book called "Learn C The Hard Way" and the author complains about well-known problems with the standard-lib string convention, optional braces, and the common C idiom of doing assignment and value-testing at the same time, _and_ calls all of this 'Deconstructing "K&R C"'?

Maybe a "What I personally don't like about C" would have been a better title. The K&R examples are flawless. The language and stdlib are not. That's well-known. What is new?


"Yes, there are problems with the whole approach of using a terminating value" And, of course, there are problems with the other approaches too. Using a single length byte at the beginning limits you to 255 characters and makes pointer manipulation less straightforward (since the actual data starts one byte further on). Using more than one byte for the string length wastes space (a serious issue at the time C was developed, and even today in the microcontroller world). The author's assumption that he's smarter than Kernighan, Ritchie, and Thompson probably isn't the best approach.

By the way, the old style of "leaving the cleanup to the OS", which the author claims "doesn't work in the modern world the way it did back in the day", works just fine on iOS and many other modern platforms.


It's way worse than an extra byte, or the offset of 1 byte for pointers; it also means you need a whole copy of every substring with its own length delimiter, and can't tokenize in place.

C code gets into just as much trouble with length-delimited data structures as it does with ASCIIZ; ASCIIZ is a red herring. People have declared over and over again that it's the single worst decision in C and the cause of every buffer overflow. But if you look over the past 10 years or so, memcpy() has caused just as much chaos, and we're just as likely to find an overflow or integer mistake in binary protocols (where NUL-termination is nonsensical) as we are in ASCII protocols.

"Leaving the cleanup to the OS" works everywhere, on every modern system, and lots of programs would benefit from shedding a lot of useless bookkeeping and just treating the process address space as an arena to be torn down all at once. But I think the point the author was trying to make is, when you code that way, you make it impossible to hoist your code into another program as a module. Which is true; if it's likely you're writing a library as well as a program, you don't get to take the easy way out.

You can still write a 100 line arena allocator to pretend like you can, though. :)
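For the curious, a sketch (well under 100 lines) of what such an arena might look like; the names and the 16-byte alignment choice are illustrative:

  #include <stdlib.h>

  typedef struct {
      char  *base;
      size_t used;
      size_t cap;
  } arena;

  int arena_init(arena *a, size_t cap)
  {
      a->base = malloc(cap);
      a->used = 0;
      a->cap  = cap;
      return a->base ? 0 : -1;
  }

  void *arena_alloc(arena *a, size_t n)
  {
      n = (n + 15) & ~(size_t)15;   /* keep allocations 16-byte aligned */
      if (a->used + n > a->cap)
          return NULL;              /* out of arena space */
      void *p = a->base + a->used;
      a->used += n;
      return p;
  }

  void arena_destroy(arena *a)      /* one teardown frees everything */
  {
      free(a->base);
      a->base = NULL;
      a->used = a->cap = 0;
  }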


I partially agree with you, but in a different way. I feel that the real problem is the OS doesn't give code access to its own internal accounting of allocated memory. It already knows the size of any heap chunk you make, so why can't we ask it? In most C code we're carrying around either a null terminator (which can get clobbered) or a whole integer for the size.

Instead, there should be a way to ask the OS "how big is the crap this pointer is pointed at" and get a valid answer. Other useful things would be "how far inside the chunk pointed at by X is the pointer Y?" Or, "will pointing Y at J inside X cause an error?"

And it wouldn't even need to be the OS, just the allocator, probably a few macros, etc. But for now I have to show people how to write bug-resistant C code, so this is the best way so far.
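Something close to this already exists as a nonstandard extension. A minimal sketch using glibc's malloc_usable_size() (macOS has malloc_size(), MSVC has _msize()), with the caveat raised in the reply below that it reports the chunk the allocator reserved, which may be larger than what was asked for:

  #include <malloc.h>   /* glibc-specific; not portable */
  #include <stdio.h>
  #include <stdlib.h>

  int main(void)
  {
      char *p = malloc(100);
      if (p == NULL)
          return 1;
      /* prints at least 100; the allocator may have rounded the chunk up */
      printf("requested 100, usable %zu\n", malloc_usable_size(p));
      free(p);
      return 0;
  }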


Part of the problem here is that the allocator doesn't need to know how big the crap the pointer is pointing at is; it only needs to know that the crap is smaller than the chunk it allocated.

If you're going to teach people something unorthodox about C programming, writing custom allocators would probably be a great one. In more than one job I've crushed optimization problems on code written by people way smarter than me simply by knowing to profile allocation and replace malloc.


Hmm. The question "what is the size of memory that x points to?" is cheap to figure out, because free needs to do it anyway. You couldn't use a macro to do it - it would need to access the internal data structures of the allocator - but it's easy to do. The other questions could be macros that called the first function.

What are the use cases for these functions? What bugs would they prevent?


Worth pointing out again: the size of the chunk allocated for a particular data structure does not give you the precise bounds of the data structure; odds are, the chunk is slightly larger than the structure.


I wrote a response explaining why, if you know x then you must know y, but then I realized you were talking about knowing y and learning x. Yes, I agree. I'm not sure which context (knowing the actual size of the memory chunk, which is easy, or knowing the used size of the memory chunk, which is not easy) Zed was talking about.


That's a hell of a good idea. As somebody pointed out, it might not stop you from corrupting neighboring items in an array or structure, but it would let you find the size of an array allocated by itself, AND, it would stop you from corrupting the heap itself!


If you're concerned about corrupting the heap, use an allocator hardened against heap corruption. The default WinAPI allocator, even for optimized production code, is hardened that way. Userland code doesn't need to do anything to get the feature, which is as it should be, because people who write userland code don't know enough to defend the heap against memory corruption.


I would happily trade C ASCIIZ strings for Pascal/Perl/Java out of band length indicated strings, even at the cost of those edge cases. Especially if there were a way to internalize immutable string data, and share the bytes of common fragments. (this of course doesn't work well if you plan on modifying the string data)


So make the trade. I'm sorry, I can see I'm communicating some kind of disdain for alternate string representations, but every C programmer I know --- every single one of them --- has used some form of counted string at some point.

I'm just saying there's a reason the default in C is ASCIIZ. Most of what you do with strings is lightweight; compare 'em, search 'em, tokenize 'em, copy 'em. For that 80% of use cases, ASCIIZ is superior.

Should ANSI C libc provide a heavyweight counted string alternative? Sure, I think so; in fact, it's possible that the only reason it doesn't is that it would take 300 years to resolve all the disputes about exactly what such a library should look like, since every professional C programmer has their own now.


Fair enough. Started something like that, not sure I'll ever finish it, though :-(

https://github.com/roboprog/buzzard/blob/master/bzrt/src/bzr...


> I don't see his point, really.

The point is three fold:

1. K&R has defects in it when the functions in it are used out of context because they didn't include defensive programming practices considered standard today.

2. People can learn a lot about writing good code by critiquing other code, even from masters, so I'm taking them through doing that.

3. There should be no sacred cows, and people hold K&R on a pedestal without questioning what's in it. This is really what causes #1, so I have them do #2.

That's it really.


Noble goals. And now that you said that I don't understand how I could have overlooked that in the first place.

Maybe it would be a good idea to put a bit more stress on the "no" in "no sacred cows", that's an important point beyond just K&R. Nothing should be sacred, including "Learning C The Hard Way" - and a "question everything and everyone" mindset is generally a good thing to have as a programmer [I think I read that in "The Pragmatic Programmer" ;-)]. But other than that I now think that chapter is fine. I do apologize for the somewhat passive-aggressive form of my question.

Thanks a lot!


I'd rather see a book on writing C code the Zed way.


It's worth noting that this isn't a standalone article; it's lesson 55 in "Learn C the Hard Way". In other words, this isn't meant to be a thorough critique. It's just one lesson of many.


Guys, no need to jump all over this; clean code and defensive programming are solid practices to advocate. "That's just the way we do C" isn't really a good defense to someone trying to point out the warts in the way we do C, even if it did come from K&R.

It's easy to point at this for not sufficiently proving its argument, but it's a work in progress, you're not the target audience, and finally, some of the shit in K&R seriously would not pass a code review here, and probably wouldn't where you work either.


I think people are misunderstanding this; it's pedagogical. From the link:

'I want to use it as an exercise for you in finding hacks, attacks, defects, and bugs by going through "K&R C" to break all the code...When you are done doing this, you will have a finely honed eye for defect.'

He's trying to train his readers to think about places code can break.


This is just the first section of the chapter, right? It seems a little premature to be distributing or discussing it here.


Good point. But that's the most inspiring approach I can remember ever seeing in such a book. If only all of my technical books were written with such a mindset.


Thanks, I do like breaking things.


"Before the switch to heap memory, this program probably ran just fine because the stack allocated memory will probably have a '\0' character at the end on accident."

Stack memory is not preinitialized to '0' either so using stack memory would show this 'defect' as well (as others have pointed out this is a defect in the calling code, not in copy).


Even if the stack was 0 filled at startup, it won't be after you call something else, and that other code returns. The stack will simply be loaded with the final values of auto vars from the called routines, as well as return addresses and other such stack frame tracking.


I think that depends on compiler, although it's been a while since I've done C, so I might be wrong.


It's fairly vague in the standard, but most of the time this is what happens, so nobody hits the bug. Also, many OSes are notorious for being loose with the stack and strict on the heap for various reasons.


To be honest, I was never that impressed with the K&R book as an introduction. Coming from a higher level language (Pascal) in college after several years back down to C was painful. K&R (borrowed from a classmate) presumed too much obviousness in what assembly language constructs were being abbreviated by a given piece of C code, and what the other pieces of C code around the example to be explained did.

Later in the semester I picked up a copy of the first edition of this book:

http://books.google.com/books/about/C_an_advanced_introducti...

This book did a much better job of explaining what the hell was going on, particularly in regards to managing memory / strings / arrays / (C) pointers and understanding their placement.

Sure, Pascal had pointers, when you asked for them :-)

Yeah, throw the K&R book at newbs you hate.


Isn't "assert(line != NULL && longest != NULL && "memory error");" a bug in Zed's example code?


No? It ensures that malloc didn't return a NULL pointer and the '&& "memory error"' is a common pattern to add a comment describing why an assert() statement failed.


One problem with it is that on most compilers if you compile with optimizations it will remove the assertion. Now the code is no longer guarded against malloc failures and will just segfault. If the intent is to teach people how to handle malloc failures gracefully, it's not that great of an example.


Since virtually no production C code† can actually handle the case of a random malloc() call failing (just like with exceptions in C++, the code to reliably unwind an allocation failure depends on exactly where you're at in your allocation pattern), the simplest and most reliable way of handling malloc() returning NULL is to rig the program to abort.

You're right that assert() isn't the most reliable way to do this (programs have to work whether or not assert() is a no-op; that's the point of assert).

On most platforms, you can just rig malloc to do the abort itself instead of failing --- either with configuration or by preloading a wrapper malloc. Some very, very large shops do exactly this.

Another very common idiom is "xmalloc", which does the malloc/if/abort dance. But xmalloc() misses every place where libraries call malloc(); the most obvious example is strdup(), but the more pernicious issue is 3rd party code that can't know to use x-whatever().
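For reference, a minimal sketch of that xmalloc idiom (which, as just noted, still misses allocations made inside libraries):

  #include <stdio.h>
  #include <stdlib.h>

  void *xmalloc(size_t size)
  {
      void *p = malloc(size);
      if (p == NULL) {
          fprintf(stderr, "xmalloc: out of memory (%zu bytes)\n", size);
          abort();   /* fail loudly instead of returning NULL */
      }
      return p;
  }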

Calling malloc and then immediately assert()'ing success is a reasonable shorthand. That exact same code can be made safe on any mainstream platform just by changing malloc's configuration.

† On general-purpose platforms; I know things are more complicated on embedded platforms.


Good point about the 3rd party libraries there, it's true that even if you do your own cleanup after manual mallocs, you can't do much about theirs.

Do you have an example of documentation that shows how to configure malloc to do some cleanup in the case of failure? Based on the Linux manpages I can't really see any obvious way to do this short of preloading some kind of malloc wrapper that's custom-written for the application.


I think it may depend on the distro (on Ubuntu, for instance, you can set MALLOC_CHECK_ to 2 to force aborts on errors --- but this also forces the use of a debugging malloc, and you should care about malloc performance).

So what I'd recommend is, grab Jason Evans' jemalloc package and use that (it's good enough for Facebook!).


Why do you think the compiler with optimizations will remove the assertion?

The compiler will realise that:

    assert(line != NULL && longest != NULL && "memory error");
Is equivalent to:

    assert(line != NULL && longest != NULL);
But there doesn't seem anything wrong with that.


Production code is supposed to define away the assertions. It is an actual error to manage important program state with asserts.

(That doesn't make these particular asserts an error).


In the earlier chapters they set up a Makefile with full debugging on so these don't get removed. It's mostly just for this one exercise, and in the rest of the book they use a set of debug macros I wrote:

http://c.learncodethehardway.org/book/learn-c-the-hard-waych...


In support of tptacek's comment, an abbreviated version of what's in assert.h on my OS X system:

  #ifdef NDEBUG
  #define assert(e)   ((void)0)
  #else

  #define assert(e)  \
    ((void) ((e) ? 0 : __assert (#e, __FILE__, __LINE__)))
  #define __assert(e, file, line) \
    ((void)printf ("%s:%u: failed assertion `%s'\n", file, line, e), abort())
So, if NDEBUG ("no debug") is defined, as it typically is in optimized release builds, asserts become a no-op.


I think you're mentally substituting "&&" with "||", in which case the assert would never fail because the string "memory error" is never 0. But with &&, the assert will fail when it should and the "memory error" is just a superfluous comment, not a bug.


That's absolutely correct, thanks for pointing that out.


Myself, I learned from K&R, and I thought it was great at the time; now I'm less sure of this.

I've heard that "A Book on C" is supposed to be a much better book for learning C.


It's not only the book that "needs to be taken down from its pedestal"; it's time for the C language itself to go. It was a fine language at the time (better than assembly, I guess). No more. Not in the 21st century. The most common excuse for sticking with C is efficiency - I pity those who still believe so. Another excuse is the extended set of libraries - but look at http://go-lang.cat-v.org/pure-go-libs and consider how recently Go was publicly released.

That is, stop writing books about C and start creating new languages.


A lot of C lovers I see.


The greatness of the book is its terseness. Some of the code examples may be dated and buggy, but that does not diminish it at all.


Dated, buggy, and un-stylish examples are exactly what diminishes it.


And those dated, buggy, and unstylish examples would be?


I dug out my 29th printing copy for this :)

Dated: the "Hello world" on page 6 produces warnings when compiled "> gcc -ansi -o hello hello.c -Wall" c.f. http://c-faq.com/ansi/maindecl.html

Buggy but only to a pedant with infinite time: the word count on page 18 that uses a `double` which will run out of accuracy once you count more words than there are atoms in the universe. ( `double toot = 1000000000000000000; toot++;` what is the approximate value of toot?)

Buggy by omission: Section 5.5 (on page 104) doesn't really go into the hazards of C strings, and the text's implementation of strcpy when using strings from unfriendly places (public wifi, quicktime files) can turn the interesting kind of nasty if you hammer on it enough.

Unstylish: brace-less if statements in many examples; a two-line one on page 110, for example.

I worked as a contractor in a C shop for several months a few years back, and I kind of see what happens when you code K&R-style instead of extremely defensively and conservatively. For example, the strcpy on page 106 is cute, but is it going to be as maintainable as the naïve version on the previous page, or the hypothetical (unless I just failed to find it) LCTHW known-length string copy function?


1. You're singling out the first "hello world" program in the book which is making an effort to show C using the fewest concepts necessary; flip elsewhere in the book and notice the other main() functions return int values. (The default "int" is C89; the "warning" is for C99; the second edition of K&R is ANSI C89, not C99).

2. First the nit misses the spirit of the example which is simply to demonstrate that "double" has higher precision than the fixnum "long" type; second, calling it a bug misses the fact that text explicitly does not say you can use it to count arbitrary strings; as the text says, it depends.

3. You're calling "buggy by omission" an implementation of strcpy not substantially different from BSD's libc. K&R's is, in fact, how you would write strcpy(). That strcpy() doesn't handle hostile strings is irrelevant; you're not intended to use strcpy() on hostile strings.

4. Hard to fathom how one criticizes a book on C for braceless "if"s. We could go toe to toe on high-quality C programs and whether they ever use them; nginx, for example, avoids them; dovecot doesn't. Both are exceptionally high quality C codebases.

Not that you have anything to prove to me, but I reject all your examples, and think you were too cavalier about judging K&R.

But I appreciate the response (and thus the opportunity to piously respond to it). :)


> Not that you have anything to prove to me, but I reject all your examples, and think you were too cavalier about judging K&R.

I don't think it's necessarily cavalier. If I was asked to review code that aligned with the K&R examples today, I'd kick it back.

I'd much prefer a codebase that compiles with -Wall -Werror today than one that doesn't, one that doesn't use floats for integers, one that doesn't use strcpy even from a literal string to the heap, and one that doesn't put a braceless `if` inside a braceless `for`, and I suspect you would too.

It's a good book to learn 1988 C, and if you're just going to learn it for your classes and go off into the exciting world of Java and C# it's probably fine, but it's not the right book to teach someone to write production-quality C.


If you reject code that, for instance, strcpy's string literals (because they could have used strncpy to be extra safe), you're rejecting most professional code today.

I wish it was a best practice that strcpy() was never used, because it would make static source code analysis a lot easier: see strcpy()? Flag it! But no: lots of excellent C code properly relies on the assumption that string literals don't change their size at runtime.

Similarly, yours is a stylistic standard for braces that rejects OpenBSD KNF. Good luck with that. You're entitled to an opinion and, on your own dev teams, it's perfectly reasonable to demand consistency with an "always use braces" style. But it's not reasonable to call style fouls on other people's code that adheres to style(9).

Really strong disagree that K&R isn't a good first book for writing production C. I could go on & on, but since I'm echoing the commanding majority of all C programmers in sharing that sentiment, there's probably no need.


Well, when my book is finished you can come back and read all of the ones I find, but demanding that someone go through the book and write down every example they found to answer your one-liner is a bit unfair.


I wasn't replying to you, and wouldn't have written that reply to you. You wrote a whole chapter of a book about your thoughts on K&R. You obviously read it.

I'm guessing the person I replied to hasn't read it, because people who cavalierly call K&R "dated" and "buggy" probably haven't. But Bryce has an interesting background. I could be wrong. Calculated risk. Would love an answer from him.


The function copy expects a C-style string but he passes it a buffer of some data. If you pass it the proper argument you'll get a proper result; it's that simple.

I see the complaint as being more about C-style strings in general, because it's easy to corrupt them by overwriting the NUL at the end, rather than a complaint about the "K&R C" book. I don't get why he decided to put it under this title.


The problem is in the word "expects". This function expects certain things but doesn't assert that they're happening. Because of this you get bugs when you try to use that function in other code, which is what beginners do.

And yes, the solution is to require a size with every string buffer, which is what I do nearly all the time now. But the best way to learn why that's the solution is to try to hack something with this kind of defect.
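A minimal sketch of that "require a size with every string buffer" approach; the function name and the particular checks are illustrative, not from the book:

  #include <stddef.h>

  int copy_n(char *to, size_t to_size, const char *from, size_t from_len)
  {
      if (to == NULL || from == NULL || to_size == 0)
          return -1;                /* defensive checks up front */
      if (from_len >= to_size)
          from_len = to_size - 1;   /* truncate rather than overflow */
      for (size_t i = 0; i < from_len; i++)
          to[i] = from[i];
      to[from_len] = '\0';          /* always terminate */
      return 0;
  }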


I think the core issue here is how C-style strings are implemented. Personally, I'd much prefer it if C strings were <size><data>, but what I was going after is that it's not K&R-specific, it's a C design "mistake".


I'm sure I'll be downvoted, but C is not Java.

From what I can tell, the "it's time for C to go" comes from a generation raised on Java.

C still has its place.


Kind-of a long-winded way to point out the (very well known) fact that passing non-terminated strings to functions that expect terminated strings is fraught with danger, isn't it?


Nope, and what I'm actually doing is fuzzing the input to this function, which is a common breaking technique.

You see, programmers tend to go around only using functions exactly as they were intended. Hackers come along and then use them in unintended ways, which causes bugs. The way to prevent this is to think like a hacker and try to use your code in unintended ways and then try to prevent them.


You say 'hackers' like it's a bad thing. As a reminder, this is "Hacker News".



