Hacker News
A C Heisenbug in the wild (jacquesmattheij.com)
26 points by Swannie on June 12, 2011 | hide | past | favorite | 29 comments



I don't mean to be that guy, but this is the sort of bug that makes me wonder about the state of professional programming. Even if you're not fluent in C, as a programmer it should be intuitively obvious what's going on here.

Of course we all have brain farts now and then, but there's no way this code should survive a second, let alone a third reading.

Or am I just being conceited?


It's non-obvious that asserts would just be #define'd out of existence when NDEBUG is defined if you haven't been told that specifically.


As you might be told specifically by, for example... the documentation?!

The biggest surprise for me is that this error survived more than one attempt to remove it...


Doesn't surprise me if it's in a 20k loc program. Very easy to overlook the obvious.

That said, like the previous commenters, if someone had seriously looked for this issue you'd have hoped they'd have found it pretty quickly.


No, that's a fair comment. I have spent a surprising proportion of the past 10 years moving initialization calls from <random problem-causing place> into `main', so whenever I see a call to something like `someinitialization' I just assume it's in `main' along with all of the rest of them.

But it could easily be somewhere creepy and random, and the side-effects a bit hard to spot, making it harder to find.

(Which, of course, is why you should always put all of this stuff in `main' ;)


Why bother having assert statements at all in a language if they're going to stick around in production code? If you're worried about something actually going wrong, you shouldn't be looking to an assert statement (but, instead, probably a straight conditional check).


Asserts are useful because they blow up when the impossible happens, which greatly simplifies finding the problem later. Why is this a bad idea when the impossible happens in production code? In fact, many projects (such as Google Chrome/V8, in src/v8/src/checks.h and src/v8/src/checks.cc) deliberately provide a persistent alternative to assert so that it can be used in production code.

I'm not arguing that asserts should stick around forever, but it's obvious that there's at least an occasional need for something like assert in production code, and so it's not as straightforward as you suggest for someone who has never been told to jump to the correct conclusion.


I was under the impression that asserts are different from run-of-the-mill error checking and handling. In particular, they don't usually let you do anything if they fail, and simply cause the program to (informatively) terminate. But that said I don't have tons of experience with large projects.


I thought the same thing, as bugs go this one seemed rather trivial. Having said that, it is tremendously easy to overlook even the most obvious of errors in code you yourself wrote, even and especially when going over it multiple times.


I don't think so. At least 4 companies' codebases have their own assert/contract macros specifically for this reason. Such code would never pass a cursory review at these companies. This is a C-noob bug, IMO.


I get that assert() is a macro, but how does that affect the scope of char *p that it "works" in one case and segfaults (as I'd expect) in the other?


OK -- I wasn't looking at this clearly. In case there are others like me:

  * the scope of *p is only global. My comment re: scope is just Wrong.
  * there is never a call to someinitialization() *except* via the assert. If no assert() is called, the Bad Value prevails; if the assert() is called, it resets *p.
I feel a bit foolish, but found it easy to misread this at first glance. If anybody else didn't get it initially, hopefully this helps :P

I thought there was deeper magic, but it's really just programming flow, and knowing good practices re: assert() (or anything that has side effects).


A C compiler could warn on large classes of side-effecting code inside assert(), as he suggests, but it'd be tricky to decide exactly which kinds of side effects to warn on to avoid too much noise. At the very least, I/O stuff would probably have to be excluded, or else you'd spuriously warn on the common pattern of dumping some debugging info to stderr within a call like assert(everything_ok()).


I think keeping assert() side-effect free is the biggest takeaway from this.


This is why we should read man pages:

man assert: "This may create Heisenbugs which go away when debugging is turned on."


This is why the Delphi compiler codebase (written in C) uses two functions, "invariant()" and "invariant_if_debug()", instead of library-supplied asserts. The second gets compiled out in release; the first triggers internal compiler error diagnostic even in production.
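In C, that two-tier scheme might look roughly like this (the Delphi compiler's sources aren't public, so this is only a guess at the shape, using the names from the comment):

```c
#include <stdio.h>
#include <stdlib.h>

/* Always-on check: fires even in release builds. */
#define invariant(cond)                                           \
    do {                                                          \
        if (!(cond)) {                                            \
            fprintf(stderr, "internal error: %s (%s:%d)\n",       \
                    #cond, __FILE__, __LINE__);                   \
            abort();                                              \
        }                                                         \
    } while (0)

/* Debug-only check: compiled out when NDEBUG is defined. */
#ifdef NDEBUG
#define invariant_if_debug(cond) ((void)0)
#else
#define invariant_if_debug(cond) invariant(cond)
#endif
```

Side-effecting expressions still don't belong inside invariant_if_debug(), but at least the distinction between "must hold in production" and "checked while developing" is explicit at every call site.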


I was under the impression that Delphi was written in Delphi.


The Delphi IDE is written in Delphi, but the Delphi compiler is written in C.


Ah, interesting. I'll have to tell that to my dad, who always told me that Delphi was one of the earliest self hosting languages.


Delphi is an object-oriented variant of Pascal. Pascal has been self-hosted since the late seventies.


Avoiding this kind of bug is one of the reasons why Go doesn't have assert().


That's not a Heisenbug. That's just a classic newbie mistake. If it's made by a non-newbie, then I highly question their competence.


The Python analog would be something like this:

  assert (object() < object()) == False

(try running this in the interpreter a few times)


Not really.

The example given is equivalent in badness to:

  g = Timer()
  assert g.startTimer() == 0
... and then wondering why the timer doesn't start when you run your program in "optimized" mode.


Woah... so this is coercing it to a string and the address in hex is being compared. Am I off on this?


Yes. Python is not that mysterious.

For generic objects, the __lt__ and __gt__ methods are simply defined the only way that's sensible, which is effectively as a pointer location comparison.


This is the reason why I think it would be preferable for assert() to use eager evaluation. Any optimizing compiler will optimize away the useless check if debug mode is off, even if function calls obscure it (as long as the compiler can reason about purity).


Trivial stuff, really. Useful to know - sure. Worth several pages of the blog post - hardly.

Now let's see the HN fanboyism in action. It is Jacques', after all.


You're not necessarily wrong, but try being less nasty about it.



