‘Abusing’ the C switch statement – beauty is in the eye of the beholder

saurik · on Feb 2, 2017

I think this looks a lot better if you use "if (false);" at the top to normalize the "else if" in the rest of the chain.

    switch (x) default: if (false);
        else if (valid_command_message(x))
        case CMD1: case CMD2: case CMD3: case CMD4:
            process_command_msg(x);
        
        else if (valid_status_message(x))
        case STATUS1: case STATUS2: case STATUS3:
            process_status_msg(x);
        
        else
            report_error(x);
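
For anyone who wants to trace the control flow, here is a minimal self-contained sketch of the same shape. The enum values, the `validator_calls` counter, the `dispatch` wrapper and the bodies of the `valid_*` stubs are all made up for illustration; they are not taken from the article.

```c
#include <stdbool.h>

/* Hypothetical ids: CMD3 is in the enum but deliberately missing from the switch. */
enum { CMD1, CMD2, CMD3, END_CMD, STATUS1 = 10, STATUS2, END_STATUS };

int validator_calls = 0;  /* counts how often a valid_* check actually runs */

bool valid_command_message(int x) {
    validator_calls++;
    return x >= CMD1 && x < END_CMD;
}

bool valid_status_message(int x) {
    validator_calls++;
    return x >= STATUS1 && x < END_STATUS;
}

/* Returns 'C' (command), 'S' (status) or 'E' (error) instead of calling handlers. */
int dispatch(int x) {
    int kind = 0;
    switch (x) default: if (false);
        else if (valid_command_message(x))
        case CMD1: case CMD2:          /* known commands jump straight here */
            kind = 'C';
        else if (valid_status_message(x))
        case STATUS1: case STATUS2:    /* known statuses jump straight here */
            kind = 'S';
        else
            kind = 'E';
    return kind;
}
```

Under these assumptions, `dispatch(CMD1)` jumps directly to its case label and never calls a `valid_*` function, while `dispatch(CMD3)` lands on `default:` and is still classified as a command by the else-if chain.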

That said, I think the entire concept of this code having both the enum switch and the valid_* functions is just horrible :/.

To be clear, it isn't that I find the code difficult to grok: if you don't understand how switch works, please stop using C.

The issue is that this particular use case for the switch is a lazy performance optimization, as it should just totally replace the valid_* calls if we do that; it is like the author doesn't want to update the switch cases, but wants the code to keep working? Did they just forget the switch exists, or is it that they can't edit it? I just don't get it, particularly when any modern C compiler supports "give me a warning if I don't consume all the possible switch values in this enum", which is the feature that this code should be using, not if < END_*.

BTW: it isn't clear to me that this would even be a performance optimization; in an ironic twist, many compilers are going to choose to compile that switch statement into the moral equivalent of two if statements with range checks, as default has to be implemented as a range check anyway, and if you work out how many range checks you are doing combined with the code cache benefits of having explicit branches instead of implicit jump tables, this switch statement is going to feel extra repetitive when you look at the machine code.
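
To make the "two range checks" point concrete, here is a rough sketch of the two forms side by side. It assumes the ids in each group are contiguous; the enum and both function names are hypothetical, and what a given compiler actually emits for the switch will vary.

```c
/* Hypothetical message ids, assumed contiguous within each group. */
enum msg { CMD1, CMD2, CMD3, CMD4, STATUS1, STATUS2, STATUS3, MSG_COUNT };

/* Classification written as a switch that lists every known value. */
int classify_switch(int x) {
    switch (x) {
    case CMD1: case CMD2: case CMD3: case CMD4:
        return 'C';
    case STATUS1: case STATUS2: case STATUS3:
        return 'S';
    default:
        return 'E';
    }
}

/* The same classification written as two explicit range checks. */
int classify_ranges(int x) {
    if (x >= CMD1 && x <= CMD4)
        return 'C';
    if (x >= STATUS1 && x <= STATUS3)
        return 'S';
    return 'E';
}
```

Both functions return the same answer for every input, so whether the switch buys anything over the plain checks is something to measure rather than assume.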

yokohummer7 · on Feb 2, 2017

> To be clear, it isn't that I find the code difficult to grok: if you don't understand how switch works, please stop using C.

Oh please. C is perfectly usable without knowing switch is actually a bunch of gotos. The idea that one has to know all the hidden edges of a language to use it is pointless, because every language has something like that. Even Python can be a minefield if one tries hard abusing metaclasses, and I'm pretty sure 95% of the Python users don't know how to use them.

If you pursue the idea of knowing every deep place to the extreme, the only language left for general developers to use is Go. A language so concerned with removing hidden features that almost no features are left.

saurik · on Feb 2, 2017

Ok, I don't mean "stop writing C", but "stop reading C": people too often look at not-even-difficult things and say "I happen to not have bothered to learn that, so you shouldn't use it in writing, as I couldn't read it", which to me sounds exactly like someone saying "I happen to have a first grade reading level, so you shouldn't use big words or long sentences or prepositions, because I didn't understand them". There are times and places where that is appropriate, but those are not all times and places, nor, I would argue, the majority of times and places.

GFK_of_xmaspast · on Feb 3, 2017

I've been doing C for almost 25 years now, and for many of those years it was my primary language, and not only would I refuse to write that kind of crazy switch construct, I would refuse to allow it to pass code review.

digler999 · on Feb 2, 2017

> The idea that one has to know all the hidden edges of a language

switch is hardly a hidden edge, it's an important control structure in the language. You want a "hidden edge", look at designated initializers [1] which I just heard of yesterday (thanks to HN), despite having 17 years' experience using C. Or bitfields, or function pointers.

[1] http://www.drdobbs.com/the-new-c-declarations-initialization...

AnimalMuppet · on Feb 2, 2017

Switch is hardly a hidden edge, but using it like that is.

tomjakubowski · on Feb 3, 2017

Huh, I wouldn't think of function pointers as a hidden edge in the language, and I've only been seriously using C for about 4 years.

Is there some crazy consequence of using function pointers that I've been missing? Callbacks are pretty ubiquitous in C code bases, and the "hand coded vtable" pattern isn't too uncommon.

Maybe the lesson is that which language features are the "hidden edges" depends on one's domain.

digler999 · on Feb 3, 2017

I was being figurative. Nothing's hidden in an ANSI standardized language. My point was you can go a long way in C without ever needing function pointers. Their syntax isn't exactly welcoming, either. And switch, which OP had compared to a hidden corner, is something you need a lot more frequently than function pointers.

minitech · on Feb 3, 2017

Designated initializers aren’t “hidden” at all, just new (and not that new).

ConceptJunkie · on Feb 2, 2017

I've always understood switch statements just fine.

However, after this article, I now understand them a little better.

sh_tinh_hair · on Feb 3, 2017

Exactly.

mikeash · on Feb 2, 2017

I find it pretty hard to understand. I think they ultimately boil down to a logical OR of the `valid` and equality with the various constants, but it's pretty painful.

I'd never write something like this in code that might make it to production. The odds of the next guy misunderstanding it, or of me misunderstanding it in six months, are way too high. I'd only consider using it as part of an IOCCC entry.

(And to preempt any cries of "well, you just don't understand C very well," I do have a winning entry in the IOCCC.)

ggggtez · on Feb 2, 2017

Agreed. Without timing, the author just assumes it must be faster. However the valid_ function isn't even optimized. He seriously proposes

    if (x) return true; else return false;

Why not just

    return x;

It's clear the author is trying to optimize something, but they don't even start with the obvious places first.

pyre · on Feb 2, 2017

That really seems like the sort of thing that the compiler should be optimizing, the rest just being a case of coding style. It really seems (to me) like saying that `i += 1` isn't as "optimized" as `i++`.

mikeash · on Feb 2, 2017

Yeah, if your compiler can't optimize the if statement to the same code as the direct return, throw it away and get a new one.

And there is pretty much never an "obvious" place to optimize. Profile first. If you want to call it "obvious" if something takes up 90% of the runtime in the profiler when it shouldn't, then that's acceptable, but don't just look at the code and say "obviously this part is slow."

krylon · on Feb 2, 2017

> if your compiler can't optimize the if statement to the same code as the direct return, throw it away and get a new one

To be fair, in embedded programming, you are sometimes stuck with the compiler a particular vendor hands you.

> And there is pretty much never an "obvious" place to optimize. Profile first.

Yes, a thousand times yes! ;-) I remember writing a rather convoluted piece of C code a couple of years back that involved stuffing data into a data structure, then looking up data in that data structure. My first thought was "I'm gonna need a hash table", but I used an array and qsort/bsearch, so I could get the rest working; when the rest was working (as in giving me correct results, but at glacial speed), I ran a profiler, fully expecting it to tell me the array/qsort/bsearch-thing was wasting huge amounts of time. I was rather surprised that it amounted to ~2% of the total running time. I've had people tell me before to profile, then optimize the parts that matter, and not to make any assumptions about what parts of my code are going to need optimization. And I did believe the people telling me this, but only at that moment did I understand how right those people were.

> "obviously this part is slow."

Telling whether a given piece of code is slow (in the sense of "this part could be optimized to run 10x as fast") or not is not too difficult, IMHO. Telling whether it makes up a significant percentage of the total running time is. Very hard.

mikeash · on Feb 2, 2017

Good point with embedded systems. I must have momentarily forgotten that there are targets beyond PC-like devices.

dllthomas · on Feb 3, 2017

> And there is pretty much never an "obvious" place to optimize.

Never an obvious place to microoptimize. If you know you're doing a huge amount of unnecessary work, cutting that out can often be obvious.

ue_ · on Feb 2, 2017

>To be clear, it isn't that I find the code difficult to grok: if you don't understand how switch works, please stop using C.

As an aspiring C programmer, is there anywhere I can read about interesting applications of the switch statement? Pretty much everything I've learned has come with neat little break;s and default at the end of each case.

lentil_soup · on Feb 2, 2017

If you're an aspiring C programmer don't worry about these weird quirks. Don't believe that you need to know that to use C. A switch statement with "neat little breaks and default" is more than what you'll need 99% of the time.

iainmerrick · on Feb 2, 2017

The remaining 1% is a big comment reading "/* FALL THROUGH */" instead of a break. Once you know what break does, you should know what happens when you leave it out (and why you might want to).

The remaining 0% is stuff like Duff's Device.

geofft · on Feb 2, 2017

A switch statement is just a bunch of gotos who have hurriedly put on an ill-fitting suit and tie to look respectable.

Any context where you would want "if x == this, goto here; if x == that, goto there" is a potential place to use the switch statement. Duff's device is a good example, but there are many things that can be built this way.

dllthomas · on Feb 3, 2017

> Any context where you would want "if x == this, goto here; if x == that, goto there" is a potential place to use the switch statement.

Only if "here" and "there" are later on in the same block.

pkkim · on Feb 2, 2017

Probably not for production use (as most "interesting" code isn't): https://en.m.wikipedia.org/wiki/Duff's_device

ConceptJunkie · on Feb 2, 2017

You say that, but I pulled out several instances of Duff's Device from production code a few years ago, and replaced them with memcpy( ) and memcmp( ) as appropriate. This isn't 1986 any more.

paulrpotts · on Feb 3, 2017

If you want to read about horrible but entertaining C hacks, check out the entries in the International Obfuscated C Code Contest, the book that came out of the contests a few years back, or books like Deep C Secrets.

http://www.ioccc.org/

As a developer who has written a lot of shipping code in C and C++ for almost 30 years, I would strongly advise you to write your switch statements in as straightforward a way as possible. If you use fall through, place very noticeable /* FALL THROUGH */ comments to indicate that you meant to do that. If you wind up doing some kind of oddball thing because you _have confirmed_ that it results in significantly better optimization for a specific hardware target and compiler, comment the crap out of that, maybe make it conditional with an #if block, and (what I prefer to do is) write the simple version too, test both versions, and keep the simple version as a comment, so a future maintainer can verify that the complicated code is doing what you intended.

AnimalMuppet · on Feb 2, 2017

(Looks around...)

(Kisses karma goodbye...)

"Oh, give me a break!"

(Runs...)

anoxic · on Feb 2, 2017

I think the idea is that when the interface changes, even if someone else comes in and makes some naive updates later, it should still continue running.

pdpi · on Feb 2, 2017

If this sort of thing tickles your fancy, you might also want to read about Duff's Device.

https://en.m.wikipedia.org/wiki/Duff's_device

buserror · on Feb 2, 2017

I use duff's all the time in micro-embedded; best way to have 'coroutines' without all the necessary stack switching interface. It has its limitations of course, but it's a terrific tool in certain circumstances.

phkahler · on Feb 2, 2017

>> I use duff's all the time in micro-embedded; best way to have 'coroutines' without all the necessary stack switching interface.

Can you provide an example? We use state machines all the time, but I have not seen a good example of mixing other flow control statements into a switch/case outside of Duff's Device.

buserror · on Feb 2, 2017

I posted a gist [0]. Sorry, it's not really made for public scrutiny; it's an old pet project clock. A video of most of that code is still there: https://plus.google.com/+MichelPollet/posts/ARDMrY4jSxp

That code does timer, radio clock decoding, scrolling, blinking and all that using multiple 'protothreads'. As you can see, the main loop calls sleep() and waits for a timer interrupt (or anything) to fire to wake up, so it's not even running most of the time; only when the 'tick' timer fires.

[0] https://gist.github.com/buserror/9407adb6d52153e16caad5e8a08...

buserror · on Feb 2, 2017

The trick is on the PT_ macros really, they hide the switch, case and gotos. I'll post the header later, but you probably can find it by googling for protothreads

josephg · on Feb 3, 2017

Whoa, PT is fantastic.

A full explanation of how it works is here: http://dunkels.com/adam/pt/expansion.html

It's actually really simple and beautiful. I worry that we give up a lot of this sort of flow expressiveness in more modern languages like Rust.
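
For a feel of the trick without reading the real header, here is a minimal sketch of a switch-based coroutine. The `CO_*` macros, `struct counter` and `counter_next` are hypothetical names written in the spirit of protothreads and of the coroutine write-ups linked in this thread; they are not the actual `PT_` macros.

```c
/* Hypothetical macros, not the real protothreads API.
 * The resume point is stored as a line number and re-entered via a case label. */
#define CO_BEGIN(state)    switch (*(state)) { case 0:
#define CO_YIELD(state, v) do { *(state) = __LINE__; return (v); case __LINE__:; } while (0)
#define CO_END(state)      } *(state) = -1; return -1

/* All data that must survive a yield lives here, not in local variables. */
struct counter {
    int state;  /* 0 = not started, -1 = finished, otherwise a resume line */
    int i;
};

/* Yields 1, 2, 3 on successive calls, then -1 forever. */
int counter_next(struct counter *c) {
    CO_BEGIN(&c->state);
    for (c->i = 1; c->i <= 3; c->i++) {
        CO_YIELD(&c->state, c->i);  /* returns here, resumes here on the next call */
    }
    CO_END(&c->state);
}
```

Starting from a zeroed `struct counter`, each call to `counter_next` jumps back into the middle of the `for` loop where the previous call left off. The obvious limitation is that ordinary locals do not survive a yield, which is why the loop index is kept in the struct.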

pcmonk · on Feb 2, 2017

Ah, I see it now. Thanks!

i336_ · on Feb 4, 2017

The one thing I really really like: you can see the hour/minute colon separate flashing as it scrolls.

This is really cool.

pcmonk · on Feb 2, 2017

Can you point to the usage of Duff's device there? I'm not really seeing it.

iainmerrick · on Feb 2, 2017

Following footnotes on Wikipedia led me to this nice clear explanation: http://www.chiark.greenend.org.uk/~sgtatham/coroutines.html

mzs · on Feb 2, 2017

Thank you, when I saw this blog post it reminded me that years ago I had seen something similar but even more brain melting - it was PuTTY!

phaedrus · on Feb 2, 2017

Check out the Castor C++ logic programming library - it makes extensive use of coroutines based on the principles in common with Duff's device. The Castor macros co_begin, co_end, co_yield, and co_return contain the implementation code.

doppioandante · on Feb 2, 2017

Can you make an example?

arianvanp · on Feb 2, 2017

https://github.com/arianvp/MidiProject/blob/master/MidiProje...

I use Duff's device to concurrently read and debounce multiple inputs in my old DJ MIDI controller.

http://dunkels.com/adam/pt/expansion.html

otikik · on Feb 2, 2017

It tickles something inside me, but it definitely is not my fancy. And "tickle" is not the right word to describe what it does to me, either.

merraksh · on Feb 2, 2017

Good point. I'm not sure why all snippets have

  *to = *from++;

while it should be

  *to++ = *from++;

pdpi · on Feb 2, 2017

Actually, the examples are correct. The context for this particular snippet is memory-mapped IO, so the `to` pointer is actually the memory address where some external device is mapped to and accepting input. Repeatedly writing to that location causes the values to be sent to the device.
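
As a sketch of what that looks like in full, here is the usual shape of the device with a fixed output address. The name `duff_send` and the use of `volatile short *` for the device register are assumptions for illustration, and `count` must be greater than zero.

```c
/* Writes count values to one fixed output location; count must be > 0.
 * 'to' is deliberately not incremented: it stands for a device register. */
void duff_send(volatile short *to, const short *from, int count) {
    int n = (count + 7) / 8;   /* number of passes through the loop body */
    switch (count % 8) {       /* jump into the middle to handle the remainder */
    case 0: do { *to = *from++;
    case 7:      *to = *from++;
    case 6:      *to = *from++;
    case 5:      *to = *from++;
    case 4:      *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
            } while (--n > 0);
    }
}
```

With an ordinary variable standing in for the register, the only visible effect is that it ends up holding the last value sent, which is exactly why the snippet looks wrong out of context.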

merraksh · on Feb 2, 2017

OK, thanks for the explanation.

cytzol · on Feb 2, 2017

This is a perfect example of why comments that explain why something is going on, rather than just what is going on, are so useful!

aninhumer · on Feb 2, 2017

I think in this case you could just put the write into a function/macro that documents its intent.

mannykannot · on Feb 3, 2017

There are many good reasons for making functions, and occasionally even a good reason to write a macro, but avoiding writing a comment is not one of them.

aninhumer · on Feb 4, 2017

I disagree. Writing a comment to express something that can adequately be expressed by code adds overhead to reading it.

mannykannot · on Feb 4, 2017

That looks to me like a made-up problem. I could be persuaded otherwise, though, so post some code that shows how it works for this particular case.

to3m · on Feb 2, 2017

It already is - the function is called "send", and rather than wasting its time just doing one write, it does a whole pile of them at once ;)

YSFEJ4SWJUVU6 · on Feb 2, 2017

Neat `abuse'.

Personally I don't like what I consider an anti-pattern that he used in his refactored functions. I'm talking about doing this:

    if (condition) return true;
    return false;

instead of simply

    return condition;

unwind · on Feb 2, 2017

Yeah, I hate that with a burning passion, too.

The only justification I've seen/heard is that it makes it more "debuggable", you can single-step through and see that the expected path is taken, i.e. that the condition is properly evaluated.

Still, I hate it and would much rather check that some other way (by inspecting the return value before the function exits, for instance).

pdpi · on Feb 2, 2017

If you're going for debuggable, though, you still don't want the if and multiple returns. You'd be better off with something like:

    bool result = (condition);
    return result;

to3m · on Feb 2, 2017

This is good, but you do still want the multiple returns as well, because then you can breakpoint one case or the other without any performance degradation when the breakpoint is on the rarer case.

(Also, at least for C/C++, you generally want condition and each case on separate lines, so that you can actually put breakpoints on each thing separately - tool support for multiple independent breakpoints on a single line is still an utter shambles, and you're best off just assuming nothing supports it.)

Unconditional breakpoints that aren't being hit are a lot cheaper (if they even cost anything at all - which usually they don't) than conditional breakpoints that are always being hit...

richardwhiuk · on Feb 2, 2017

There will be no difference in this case - the compiler will optimize the boolean foo = {complicated expression}; if (foo){ return true; } else { return false; } into return {complicated expression}, and the debugger will be forced to add a conditional breakpoint.

to3m · on Feb 2, 2017

I've never used a debugger that's that clever (and I'm not sure debug info gives it enough to work with to deduce this anyway, not with DWARF at least) - the ones I've used just put an unconditional breakpoint on the instruction the compiler says that line corresponds to, and then show it as attached to whichever line it chooses. (DWARF decrees that each instruction is from one source line, but one line may map to many instructions - so there's no perfect way of doing this backwards.)

This is why it's usual to debug the unoptimized build... and why this situation is hypothetical ;)

There's nothing wrong with running an optimized build inside a debugger; in fact, I recommend it, since when it crashes you've then got at least some chance of figuring out what's going on. But for the sort of work where you're putting breakpoints on one clause rather than another, or stepping through code line by line trying to figure out what's happening, it's not very good.

chillingeffect · on Feb 2, 2017

yes, I wish everyone did this all the time. even to the point of:

    bool result = (condition);
    if( result )
    {
      do_some_shit();
    } else
    { 
      do_some_different_shit();
    }

p.s. how did you format the code section of your post?

pdpi · on Feb 2, 2017

Prefix with 4 spaces. :)

pyre · on Feb 2, 2017

4 spaces is Markdown, HN only requires 2 spaces.

https://news.ycombinator.com/formatdoc

unwind · on Feb 3, 2017

True, except of course the 'result' should be 'const'. Always 'const' all the things that never change after being assigned. Another pet peeve, obviously. :)

DSMan195276 · on Feb 2, 2017

I agree 100%.

I would add that when using the standard `_Bool` type (i.e. `bool` if you include `<stdbool.h>`) all assignments to it are converted to their 'logical value' first. So even if condition is an integer or pointer, if you did what you suggested the `condition` will still be converted into `true` or `false` on the return exactly the same as if you did the `if`. It's exactly the same as doing `return !!condition`.

Now if your function is not returning a `bool`, then that conversion won't happen - but of course that doesn't matter if `condition` is already a logical value of 0 or 1 (Which it is in this case). And IMO, if your value isn't a logical value I'd much rather see `return !!condition` instead of the `if` anyway (Though I concede that those not familiar with C may not immediately recognize what that syntax does).
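
A small sketch of that difference, with function names invented purely for the demonstration:

```c
#include <stdbool.h>

/* Returning through bool converts any nonzero value to 1. */
bool as_bool(int condition) { return condition; }

/* Returning through int passes the value along unchanged. */
int as_int(int condition) { return condition; }

/* !! normalizes to 0 or 1 even when the return type is int. */
int as_logical(int condition) { return !!condition; }

/* Pointers convert the same way: non-null becomes true. */
bool is_set(const char *p) { return p; }
```

So `as_bool(256)` is 1, `as_int(256)` is still 256, and `as_logical(256)` is 1 again.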

vram22 · on Feb 2, 2017

Common beginner mistake. (Not saying the OP is a beginner, may have had some specific reason, since it says "in his refactored functions".)

Other similar ones are things like:

    if ((count > 10) == true)

# in any language, not just C, so I'm using a generic placeholder for boolean true

instead of:

    if (count > 10)

and inability to grok stuff like:

    while not done
    (or while not found)
        ....
        # set done or found here based on some condition

0xfeba · on Feb 2, 2017

The other 2 devs on my team (both senior to me) routinely check in code like this:

  function(filename) {
    if (!filename) {
      this.setState( {x : null});
    }

    var s = filename;
    if (s) {
      this.setState({y :''});
    }
    else {
      this.setState({z: false});
    }
  }

Which to me is unfinished code (I truncated variable names). I always try to clean up my code before checking in, probably to my detriment. They both are much faster at pushing features out. But god, some of the code is just unreadable. Also--no comments. For some reason everyone at my current company refuses to use comments.

And I've seen plenty of:

  x == true ? true : false

ternaries, though in JavaScript, these are not necessarily redundant and are easy to mistake. A couple other favorites I've seen:

  if (x == 'true') ...

and

  var data = {option: 1}
  var context = this;
  this.someMethod.bind(this, context, data, data.option)

vram22 · on Feb 2, 2017

Wow. Or rather, !wow.

Such practices are all too common ... should be the rare exception, really.

catnaroek · on Feb 2, 2017

That's not even an anti-pattern. It's just being ignorant of the basic properties of the Boolean type.

peapicker · on Feb 2, 2017

My favorite abuse of switch statements in C is to use them for co-routines in C...

http://www.chiark.greenend.org.uk/~sgtatham/coroutines.html

http://blog.robertelder.org/switch-statements-statement-expr...

huntergoldstein · on Feb 2, 2017

To add onto this: - http://dunkels.com/adam/pt/

Even though it is an abuse, these sorts of libraries are fascinating to me: making use of language features in unexpected (most likely unintended) ways.

peapicker · on Feb 2, 2017

Agreed! (I had forgotten about protothreads, thanks)

rootbear · on Feb 2, 2017

While we're discussing abuse of switch, here's my personal best effort at misusing this feature, from over thirty years ago.

  /*
   * A program to print The Twelve Days of Christmas
   * using the C fall through case statement.
   *
   * Jim Williams, jim@maryland, 2 May 1986 (but first written ca. 1981)
   */

  /*
   * If you have an ANSI compatible terminal then
   * #define ANSITTY.  It makes the five Golden rings
   * especially tacky.
   */

  #include <stdio.h>

  char *day_name[] = {
      "",
      "first",
      "second",
      "third",
      "fourth",
      "fifth",
      "sixth",
      "seventh",
      "eighth",
      "ninth",
      "tenth",
      "eleventh",
      "twelfth"
  };

  int
  main()
  {
      int day;

      printf("The Twelve Days of Christmas.\n\n");

      for (day = 1; day <= 12; day++) {
          printf("On the %s day of Christmas, my true love gave to me\n",
               day_name[day]);

          switch (day) {
          case 12:
              printf("\tTwelve drummers drumming,\n");
          case 11:
              printf("\tEleven lords a leaping,\n");
          case 10:
              printf("\tTen ladies dancing,\n");
          case 9:
              printf("\tNine pipers piping,\n");
          case 8:
              printf("\tEight maids a milking,\n");
          case 7:
              printf("\tSeven swans a swimming,\n");
          case 6:
              printf("\tSix geese a laying,\n");
          case 5:
  #ifdef ANSITTY
              printf("\tFive \033[1;5;7mGolden\033[0m rings,\n");
  #else
              printf("\tFive Golden rings,\n");
  #endif
          case 4:
              printf("\tFour calling birds,\n");
          case 3:
              printf("\tThree French hens,\n");
          case 2:
              printf("\tTwo turtle doves, and\n");
          case 1:
              printf("\tA partridge in a pear tree.\n\n");
          }
      }
      return 0;
  }

seishun · on Feb 2, 2017

This is hardly abuse. This is exactly how fallthrough was intended to be used.

Tarean · on Feb 2, 2017

I mean, an array could have worked but I think this is more readable than a clever version would have been.

    main = mapM_ putStrLn $ reverse verses
      where 
        verses = combine <$> enumeratedSlices songLines
        combine (i, s) = "On the %s day of Christmas, my true love gave to me\n%s" % dayNames !! i $ concat s
    enumeratedSlices = zip [0..] . init . tails

jcoffland · on Feb 3, 2017

That is abusive. You started counting from one not zero.

rootbear · on Feb 3, 2017

Yeah, well, I was young and inexperienced... my first programming languages were BASIC and FORTRAN...

chiph · on Feb 2, 2017

Abuse of the switch statement? The firm bought a company and we found in their source code that they had created a 50,000 case-statement monster.

We asked that they refactor it, and they agreed because the sheer size of it was giving the compiler problems. So they divided it into two separate 25,000 case-statement monsters.

noobermin · on Feb 2, 2017

That moment when your jump causes a cache miss.

jmts · on Feb 3, 2017

This is ridiculous. Sure, I'll agree, it's a neat trick, but what mindset do you need to be in to come up with such a convoluted solution to such a problem, and the wrong problem at that?

What problem does this hack solve? Single point of edit for a new feature? Well, odds are you've added/edited code elsewhere to implement the feature. Saving one edit is pointless. Just do it properly.

What problem is this code trying to solve? Message dispatch. And it does it. Twice. Once to differentiate whether something is a command or a status message, and then once again to actually execute the thing. Silly. Just build a table and be done with it. If you're concerned about performance after you've benchmarked it, write some code to generate a perfect hash and generate your table.

    static const struct message_handler {
        int id;
        void (*process)(int x);
    } tab[] = {
        { .id = CMD1, .process = process_cmd1, },
        { .id = CMD2, .process = process_cmd2, },
        { .id = CMD3, .process = process_cmd3, },
        // ...
        { .id = STATUS1, .process = process_status1, },
        { .id = STATUS2, .process = process_status2, },
        { .id = STATUS3, .process = process_status3, },
    };

    const struct message_handler* find_message_handler(int x);

    void process_message(int x) {
        const struct message_handler *handler;
        handler = find_message_handler(x);
        if (NULL != handler) {
            handler->process(x);
        }
        else {
            report_error(x);
        }
    }

Now tell me which one your code reviewers and juniors are going to prefer.
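
The lookup above is only declared, so here is one minimal way it could be filled in: a plain linear search over the table. The ids, the two stub handlers and the `last_processed` variable are placeholders for illustration; a sorted table with `bsearch` or a generated perfect hash would slot in behind the same function signature.

```c
#include <stddef.h>

enum { CMD1 = 1, CMD2, STATUS1 = 10, STATUS2 };  /* hypothetical ids */

struct message_handler {
    int id;
    void (*process)(int x);
};

int last_processed = 0;  /* stand-in side effect so the stubs do something */
void process_cmd(int x)    { last_processed = x; }
void process_status(int x) { last_processed = -x; }

const struct message_handler tab[] = {
    { .id = CMD1,    .process = process_cmd },
    { .id = CMD2,    .process = process_cmd },
    { .id = STATUS1, .process = process_status },
    { .id = STATUS2, .process = process_status },
};

/* Linear search; returns NULL when no entry matches x. */
const struct message_handler *find_message_handler(int x) {
    for (size_t i = 0; i < sizeof tab / sizeof tab[0]; i++) {
        if (tab[i].id == x)
            return &tab[i];
    }
    return NULL;
}
```

Adding a message is then a one-line change to the table, and an unknown id falls out naturally as a `NULL` result.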

zik · on Feb 3, 2017

> What problem is this code trying to solve?

I think you missed the point of this hack - the problem he's trying to solve is described in the article:

> The common issue associated with switch statement is typically maintenance; especially where the set of ‘valid’ values needs extending.

He wants the efficiency of the switch statement for all the cases he knows about and a fallback to the less efficient if clauses if that fails due to new cases being added. It's a pretty silly example but let's face it - it's more for fun than elegance.

jmts · on Feb 3, 2017

I see your point, but I'd argue that even then it's still the wrong problem.

1. If you don't enjoy maintaining your switch statement, it's probably huge anyway and you need to refactor.

2. If you don't enjoy maintaining this switch statement, you probably also don't enjoy maintaining the switch statement inside the next function that gets called that gives you your full set of valid values.

3. The number of types of messages used is probably much fewer than unique messages, so you are optimising on a small N, which will give minimal returns.

4. You never need to look at this code again and it will just sit there adding overhead. Other switch statements will be maintained because they're the first place you'll notice missing entries.

Fun, sure. Like setting a fistful of matches on fire. It takes a bit of effort to set up, and it looks interesting. I'll agree to that.

JoachimSchipper · on Feb 2, 2017

Where available, my preferred solution is to compile with gcc -Wall -Werror (or equivalent); this will enable -Wswitch, causing a compile error when you fail to handle an 'enum' case. Needing to fix a few obvious compile errors after a change isn't much of a burden; un-maintainable code is code that silently breaks on changes. (The same attitude leads to a lot of static_assert().)

(You may also be interested in -Wswitch-enum.)
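
A short sketch of the shape that lets the warning do its job; the enum and the function name are made up for the example.

```c
enum msg { CMD1, CMD2, STATUS1, STATUS2 };  /* hypothetical message ids */

const char *msg_kind(enum msg m) {
    /* No default label: with -Wswitch (part of -Wall) gcc can warn when an
     * enumerator of 'enum msg' has no case here. */
    switch (m) {
    case CMD1: case CMD2:
        return "command";
    case STATUS1: case STATUS2:
        return "status";
    }
    return "unknown";  /* reached only for values outside the enum */
}
```

If someone later adds a `CMD3` enumerator and forgets this switch, the build complains instead of the new message quietly landing in the "unknown" path. A `default:` label would silence -Wswitch, which is where -Wswitch-enum comes in.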

nwmcsween · on Feb 3, 2017

No, do not use -Wall and -Werror, as -Wall is arbitrary between compilers and versions.

JoachimSchipper · on Feb 3, 2017

You're absolutely right that -Werror should not be the default in release tarballs, and I should have included that.

LeifCarrotson · on Feb 2, 2017

> If the implementation uses a form of jump-table then this [switch statement] has the benefit of giving you an O(1) performance based on the messages, whereas with the if-else-if chain, the commands will be checked and processed before status and errors.

The reason for using a switch statement in the first place was so that you'd have constant performance. The argument against the simple implementation was:

> Let’s assume in v2.0 of the system we want to extend the message set to include two new commands and one new status message ... the if-else-if version will handle the change without modification, whereas the existing switch statement treats the new commands still as errors.

Treating the new commands as errors is a feature of the switch statement style, not a problem. They need to be added to the jump table, not implemented with reduced performance.

derriz · on Feb 2, 2017

Also using big O notation here makes no sense surely? The claim that the switch version is (in contrast) O(1) is not even wrong.

clusmore · on Feb 2, 2017

Yeah I'm not quite sure I understand this statement either. Does he mean that in the case of a command the first `if` passes and the rest are ignored, but in the case of status and error you need to perform the previous `if`s as well? So then the case for status is what, O(2)? That makes no sense.

If you're going to write something this obscure for the sake of "performance", you want to be damn sure it's worth it -- that performance is even an issue, and that this is a large enough improvement to justify not doing the simple thing.

Honestly it feels like the author wants to do it this way because it's clever, and the "performance" reason is just rationalising it.

DannyB2 · on Feb 2, 2017

No need to abuse the C switch statement.

#define while if // make code faster

#define struct union // use less memory

elmigranto · on Feb 2, 2017

Just define all the keywords to be `exit(0);` #performance :)

geofft · on Feb 2, 2017

I once got in trouble in high school for encouraging a classmate who'd misread an assignment to do `#define int double` to fix it.

rootbear · on Feb 2, 2017

But don't forget that defining define is undefined!

mcintyre1994 · on Feb 2, 2017

Of all the things I learned in kindergarten this was my favourite :)

In all seriousness, this is.. interesting! I had no idea you could have a case inside default like that.

to3m · on Feb 2, 2017

But there is no "inside": case and default are like (slightly restricted) goto labels. They mark a place in the code, nothing more, and, as with goto labels, they can be inside nested scopes. That's why you need the "break" to explicitly leave the switch - because when execution reaches a label, it just continues. The label is a marker, nothing more.

(In terms of restrictions: case/default have to be inside the switch statement somewhere, and not inside some additional switch statement nested within it. But aside from that, as demonstrated by the article, you have a good deal of freedom about where they go.)
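A minimal sketch of the "labels are just markers" point, assuming a made-up function `classify` whose return value only exists to make the control flow visible. One label sits inside an `if` body, and one case has no `break`:

```c
/* Hypothetical helper: the return value records which statements ran. */
int classify(int x)
{
    int result = 0;
    switch (x) {
    case 1:
        result += 1;      /* no break: execution simply continues past the next label */
    case 2:
        result += 10;
        break;
    default:
        if (x < 0) {
    case 3:               /* a case label nested inside the if body is legal */
            result += 100;
        }
        break;
    }
    return result;
}
```

With this sketch, `classify(3)` jumps straight into the `if` body without ever evaluating `x < 0`, which is the same mechanism the article relies on.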

mcintyre1994 · on Feb 3, 2017

This was a really helpful explanation, thankyou :)

pdpi · on Feb 2, 2017

It's not "inside" the default, as such. I'm not a C expert but, near as I can tell, the labels for a switch statement don't all need to be at the same level within the control flow inside the switch body. See my other post here about Duff's Device as another neat example of exploiting this.
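For readers who haven't seen it, here is a rough memory-to-memory variant of Duff's Device (the original wrote to a single output register). The name `duff_copy` and the unroll factor of four are choices made for this sketch only:

```c
#include <stddef.h>

/* Copies count bytes from src to dst with the loop body unrolled four times.
   The switch jumps into the middle of the do/while to handle the remainder. */
void duff_copy(char *dst, const char *src, size_t count)
{
    if (count == 0)
        return;
    size_t passes = (count + 3) / 4;   /* number of times the loop body is entered */
    switch (count % 4) {
    case 0: do { *dst++ = *src++;
    case 3:      *dst++ = *src++;
    case 2:      *dst++ = *src++;
    case 1:      *dst++ = *src++;
            } while (--passes > 0);
    }
}
```

The case labels sit inside the loop body, so the first pass copies only the leftover bytes and every later pass copies four.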

lisper · on Feb 2, 2017

Of course if you were programming in a language with proper macros you could just automatically generate the switch statement from the enums and have the best of all possible worlds.

greyman · on Feb 2, 2017

For me, the beauty is when it is apparent at first sight what the code does. This is just the opposite of beauty (in my eyes :-)).

marcosdumay · on Feb 2, 2017

There is more to code than legibility. In general, the priorities are correctness, legibility, legibility, a few hundred more times legibility, then stuff like performance, memory usage, binary size, predictability, etc.

But this is C. C is used in a wide variety of contexts, and in some of them the order above changes.

gpderetta · on Feb 2, 2017

... except when performance, memory usage, binary size, predictability are part of the correctness requirement (which is most of the time in the areas C is used).

Sanddancer · on Feb 2, 2017

When dealing with embedded stuff, you don't always have the luxury of code at first sight. Sometimes you have to write ugly code like this code does, sometimes you have to roll your own print statements because using printf pulls in a few k of code, sometimes you have to use a uart running at 600 baud to get a 60 hz timer. Of course, when you start going down the path of interesting, it's always best to write lots of good documentation to go with the cleverness, so people understand what's going on.

MichaelMoser123 · on Feb 3, 2017

While dealing with unhandled enumeration constants you can get help from most compilers; for gcc:

  #include <stdio.h>
 
  typedef enum tagCmd { cmd1, cmd2, cmd3 } Cmd;

  void doCmd(Cmd cmd )
  {
        switch(cmd)
        {
                case cmd1:
                        printf("Cmd1\n");
                        break;
                case cmd2:
                        printf("Cmd2\n");
                        break;
                //case cmd3:
                //      printf("Cmd3\n");
                //      break;
        }
  }

  int main()
  {
        doCmd(cmd1);
        return 0;
  }

Now

  gcc -Wall  t.c
  t.c: In function ‘doCmd’:
  t.c:7:2: warning: enumeration value ‘cmd3’ not handled in switch [-Wswitch]
    switch(cmd)
    ^

Now

  gcc -Wall -Werror  t.c
  t.c: In function ‘doCmd’:
  t.c:7:2: error: enumeration value ‘cmd3’ not handled in switch [-Werror=switch]
    switch(cmd)
    ^

Or rather the following: (if you don't want to deal with a lot of other errors/warnings)

  gcc -Wswitch -Werror  t.c
  t.c: In function ‘doCmd’:
  t.c:7:2: error: enumeration value ‘cmd3’ not handled in switch [-Werror=switch]
    switch(cmd)
    ^

If you put in a default statement then the compiler will not be able to help you here.

janekm · on Feb 2, 2017

This is just terrible. The fact that the compiler complains if an enum is added without updating the switch statement is a feature. The enum and switch version is clear and explicit. To defeat it, in the name of... what, code obfuscation? beggars belief.

iainmerrick · on Feb 2, 2017

I guess the reminder that C's switch syntax is bizarre and flexible is amusing, but this is still really wrong-headed. Friends don't let friends write code like this!

We're told we have "process_command_msg" and "process_status_msg" functions. Each of those functions is already doing a switch or if/else to determine the exact message type. If you care about the cost of those lookups, the correct thing to do is have a single switch statement that handles all the messages at once.

If you don't care enough about the cost of lookups to split up those functions, you should stick with the obvious if/else test.

Another thing you could do to improve the design is to have a "get_message_type" function returning a "COMMAND_TYPE" or "STATUS_TYPE" enum, then switch on that. That can be made efficient if you really care about execution cost (for example, check a single bit, and inline it so the switch can be optimized).

Abusing the switch statement not only gives you unreadable and bug-prone code, it simply isn't any better than a sensible approach.
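A sketch of the "get_message_type" idea, under a loudly hypothetical encoding: message codes fit in five bits and bit 4 distinguishes status messages from commands. The article does not specify any encoding, and the integer return values stand in for the real process_* calls:

```c
/* Assumed encoding (not from the article): codes 0x00-0x0F are commands,
   0x10-0x1F are status messages, anything larger is invalid. */
enum message_type { COMMAND_TYPE, STATUS_TYPE, UNKNOWN_TYPE };

static inline enum message_type get_message_type(unsigned x)
{
    if (x > 0x1Fu)
        return UNKNOWN_TYPE;
    return (x & 0x10u) ? STATUS_TYPE : COMMAND_TYPE;   /* single-bit test */
}

/* Returns 1 for a command, 2 for a status message, -1 for an error.
   Real code would call process_command_msg(x) and friends here. */
int process_message(unsigned x)
{
    switch (get_message_type(x)) {
    case COMMAND_TYPE: return 1;
    case STATUS_TYPE:  return 2;
    case UNKNOWN_TYPE: break;
    }
    return -1;
}
```

Since the switch has no default and covers every enumerator, -Wswitch can still flag a message type that gets added later and forgotten here.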

jadengeller · on Feb 2, 2017

Macro lists are a rather elegant solution IMO. You cannot add cases to the enum without also adding cases to the switch.

  #define CMD_LIST CMD(1) CMD(2) CMD(3) CMD(4)
  #define STATUS_LIST STATUS(1) STATUS(2) STATUS(3)

  enum {
    #define CMD(x) CMD ## x = x - 1,
    CMD_LIST
    #undef CMD
    END_CMD
  };

  enum {
    #define STATUS(x) STATUS ## x = x + 9,
    STATUS_LIST
    #undef STATUS
    END_STATUS
  };

  void process_message(int x)
  {
    switch (x) {
      #define CMD(x) case CMD ## x:
      CMD_LIST
      #undef CMD
        process_command_msg(x);
        break;
      #define STATUS(x) case STATUS ## x:
      STATUS_LIST
      #undef STATUS
        process_status_msg(x);
        break;
      default:
        report_error(x);
        break;
    }
  }

It's unfortunate that it isn't possible to write a macro macro that expands to that gross def/undef boilerplate we use in the switch.

evincarofautumn · on Feb 3, 2017

You can use an x-macro:

  // cmds.def
  X(1)
  X(2)
  X(3)
  X(4)
  #undef X

  // elsewhere.h
  enum {
  #define X(x) CMD##x = x - 1,
  #include "cmds.def"
    END_CMD
  };

Or a higher-order macro:

  #define CMD_LIST(X) X(1) X(2) X(3) X(4)

  enum {
  #define CMD(x) CMD##x = x - 1,
    CMD_LIST(CMD)
  #undef CMD
    END_CMD
  };
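The higher-order form can also generate the case labels, which removes most of the def/undef boilerplate from the switch. A sketch only: the `*_ENUM` and `*_CASE` helper names are invented here, and the integer returns stand in for process_command_msg and process_status_msg:

```c
#define CMD_LIST(X)    X(1) X(2) X(3) X(4)
#define STATUS_LIST(X) X(1) X(2) X(3)

/* Hypothetical helper macros: one pair builds enumerators, the other case labels. */
#define CMD_ENUM(n)    CMD##n = n - 1,
#define STATUS_ENUM(n) STATUS##n = n + 9,
#define CMD_CASE(n)    case CMD##n:
#define STATUS_CASE(n) case STATUS##n:

enum { CMD_LIST(CMD_ENUM) END_CMD };
enum { STATUS_LIST(STATUS_ENUM) END_STATUS };

/* Returns 1 for commands, 2 for status messages, -1 otherwise. */
int process_message(int x)
{
    switch (x) {
    CMD_LIST(CMD_CASE)
        return 1;
    STATUS_LIST(STATUS_CASE)
        return 2;
    default:
        return -1;
    }
}
```

Adding `X(5)` to `CMD_LIST` then extends both the enum and the switch in one edit.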

technofiend · on Feb 2, 2017

People may disagree but if the function valid_command_message already bounds checks x, isn't the case statement redundant? Do you really want the same bounds check in two different places? Maybe in embedded programming you do for safety but it seems like a simple if test is sufficient.

ifdefdebug · on Feb 2, 2017

The switch jumps directly to the case labels, skipping the valid_command_message call. At least that's the concept, but you better know what your compiler makes out of it.

So this might be useful if you want the jump table performance now on the known values, but you suspect that in the future some maintainer might add more values to the enums and forget about the switch, because maybe the enums are hidden in some header file far away.

gorb314 · on Feb 2, 2017

I'm the author of libb64[1], which (ab)uses the switch statement in a similar way to implement a base64 encoder and decoder. This was inspired in turn by "Coroutines in C"[2], written by Simon Tatham.

[1] http://libb64.sourceforge.net

[2] http://www.chiark.greenend.org.uk/~sgtatham/coroutines.html
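To give a flavour of the coroutine trick, here is a small sketch of the general idea (not code taken from libb64 or from the article): a function that resumes in the middle of its own loop by switching on a saved state. The name `next_value` is made up for this example.

```c
/* Yields 1, 2, 3 on successive calls, then -1 on every call after that. */
int next_value(void)
{
    static int state = 0;   /* which label to resume at */
    static int i;           /* loop variable must survive between calls */

    switch (state) {
    case 0:
        for (i = 1; i <= 3; i++) {
            state = 1;
            return i;       /* "yield" */
    case 1:;                /* the next call resumes here, inside the loop */
        }
    }
    state = 2;              /* no case matches 2, so the switch is skipped from now on */
    return -1;
}
```

The static variables make it non-reentrant, which is one of the trade-offs of doing this in plain C.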

detrino · on Feb 3, 2017

It's impressive that clang generates the same assembly for both versions: https://godbolt.org/g/yLh1EF

Note that the assembly for process_message2 calls each function only once even though some appear twice in the body.

gp7 · on Feb 2, 2017

1) All four versions are O(1)

2) Textually interleaving two implementations is bad no matter what language you're in

3) You could achieve the same effect with gotos without giving anyone a migraine... but it still wouldn't be worth it

noobermin · on Feb 2, 2017

So the main point here is we save some time with a direct jump at least for a few values of x? It's cool I guess but the extra confusion might not be worth it.

Sanddancer · on Feb 2, 2017

The author is coming from the embedded world, where saving time in an important loop can determine whether or not your code can operate in the required time windows. Keep in mind that there are a lot of micros out there that are being used for new projects that have memory amounts in the hundreds or even tens of bytes. When you start getting into those areas, sometimes you have to cheat to get things working.

noobermin · on Feb 2, 2017

Sure, in that case, then I agree with saurik. The solution is you just put in more labels in your switch statement. And, if you aren't so keen on that sort of discipline, use a macro ALL_CMDS so you don't have to change nine places in the code.

Go "O(1)" on all the CMDs or don't, for the sake of ease of comprehension. In the hypothetical, or maybe practical, world[0] where size and computation constraints require that you goto some of them and if-else the rest, I suppose the "1/2goto,1/2if-else" optimization would be one of the later considerations, after a host of other things you could try.

[0] I admit I don't hack in that world, but I do computational physics, which is more about vectorizing things, not optimization of branching typically.

julian_1 · on Feb 2, 2017

A compiler like gcc will (or at least used to) translate small switch statements into compare/jmp assembler, and larger switch statements into a jump table using indexes/modulos on the test expression.

The tipping point at which the compiler switches strategies is obviously architecture dependent, but it can also be influenced by which flags are set and whether you favor small code size or highest speed.

I suspect, the code could have been written using two switch statements - and probably achieved the same effect in terms of the generated assembler, as well as be more idiomatic.

DannyBee · on Feb 2, 2017

Meh, this is pointless, the compiler chooses if ranges or jump table regardless of which form you use here. But it probably makes you feel better :)

loup-vaillant · on Feb 3, 2017

"Fixing" the switch statement.

  #define case    break; case
  #define default break; default

There, no more fall through.

vram22 · on Feb 2, 2017

Partly related:

Simulating the C switch statement in Python:

https://jugad2.blogspot.in/2016/12/simulating-c-switch-state...

Not a proper simulation of C's switch, has limitations mentioned in the post, just something I whipped up for fun.

lend000 · on Feb 2, 2017

Interesting ideas, but this bothers me: "Then refactored to:"

Refactored implies performance improvement to me, and inlining is almost always faster than the more modular representation that the author ended up with after "refactoring" (putting the comparisons in functions).

kiliancs · on Feb 2, 2017

Readability improvements and complexity reductions are often more important than performance improvements and very good reasons to refactor.

lsiebert · on Feb 3, 2017

I think you can probably get by with an array of function pointers, and bittwiddling to get values.

For example, if i is the ENUM code that was passed, then 1 ^ ((unsigned int)i >> (sizeof(int) * CHAR_BIT - 1)) will be 0 if i is negative and 1 if it is positive or zero.
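A rough sketch of how that could look, assuming a two-entry table and two invented handlers. It takes the bit width from CHAR_BIT in <limits.h> and assumes unsigned int has no padding bits:

```c
#include <limits.h>

typedef int (*handler_fn)(int);

/* Hypothetical handlers: the return values just identify which one ran. */
static int handle_negative(int x)     { (void)x; return -1; }
static int handle_non_negative(int x) { (void)x; return 1; }

static handler_fn const handlers[2] = { handle_negative, handle_non_negative };

int dispatch(int i)
{
    /* Sign bit moved to bit 0, then flipped: 0 for negative, 1 otherwise. */
    unsigned idx = 1u ^ ((unsigned)i >> (sizeof(int) * CHAR_BIT - 1));
    return handlers[idx](i);
}
```

Whether this beats a plain comparison is something you'd have to measure; the compiler may well generate the same thing for `i < 0`.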

qwertyuiop924 · on Feb 2, 2017

I didn't learn this in kindergarten, but I did learn about Duff's Device in the 7th grade (as it turns out, the Jargon File makes for pretty good reading during boring classes).

Never had cause to implement it, though, and never saw this particular hack.

peter303 · on Feb 2, 2017

Modal behavior should be implemented through subclassing, not switches or if tests. This has the property of consolidating all code for a class in the same code file and not scattered among larger pieces of code. Much easier to maintain and extend.

kwhitefoot · on Feb 2, 2017

What classes? This is C not C++ or C#.

I don't dispute that one can use C for class based coding but it isn't common.

kayamon · on Feb 2, 2017

Many compilers now support a range selection extension on case labels, so you can just do:

    case CMD ... CMD_END: /* whatever */ break;
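A fuller sketch of the range form. In GCC and Clang the extension is spelled with three dots, and the spaces around them matter when the bounds are integer literals. The enumerator values and the function name below are placeholders, not the article's:

```c
/* Hypothetical codes, contiguous within each group. */
enum { CMD = 0, CMD_END = 3, STATUS = 10, STATUS_END = 12 };

/* Returns 1 for commands, 2 for status messages, 0 for anything else. */
int classify_message(int x)
{
    switch (x) {
    case CMD ... CMD_END:       return 1;   /* GNU case-range extension */
    case STATUS ... STATUS_END: return 2;
    default:                    return 0;
    }
}
```

This is an extension in the C standards current at the time of this thread, so a strictly conforming compiler may reject it.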

delinka · on Feb 2, 2017

Did I miss an explanation somewhere? Why are the `process_` functions referenced again in the default case? That has a code smell.

spott · on Feb 2, 2017

The first version references them to allow for a single point of refactoring to increase the number of command messages or status messages (the valid_xxxx_message() function call). He recognizes this as not ideal and then refactors the code to have both the benefit of the jump table created by the switch statement and the benefit of the if statement (the single point of refactoring).

delinka · on Feb 2, 2017

I see now. If we add to the `enum`s and forget to update our `switch`, we still get the chance to process new commands, albeit with a minor performance hit.

Personal opinion: Unit tests should be used to let you know when you forgot to add in new `case`s.

gpderetta · on Feb 2, 2017

So, what lets you know when you forgot to update your unit tests for the new cases?

Checking for switch/case completeness is the job of the compiler and -Wall -Werror.

delinka · on Feb 2, 2017

The fact that you got output from `report_error()`

pdpi · on Feb 2, 2017

Did you reach the end of the post? In the final version of the code, each of those functions is used exactly once.

delinka · on Feb 2, 2017

I did. But not requiring the `if` at all obviates the need for this `switch` abuse. Hence my question about its necessity.

drivers99 · on Feb 2, 2017

If the number of options (CMD5, CMD6, etc) changes, he needs the code to not break if he adds them to the enum but doesn't extend the switch statement.

pdpi · on Feb 2, 2017

Not quite the same as Rust, but at least both GCC and Clang offer `-Wswitch-enum`, which will cause non-exhaustive switches on enums to raise warnings.
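The practical difference from plain `-Wswitch` is that `-Wswitch-enum` still warns when a `default` label is present. A small sketch, reusing the enum from MichaelMoser123's example upthread with an invented function name:

```c
typedef enum { cmd1, cmd2, cmd3 } Cmd;

/* Try: gcc -Wswitch-enum -c t.c
   -Wswitch alone stays quiet here because of the default label;
   -Wswitch-enum still reports that cmd3 has no case of its own. */
int cmd_code(Cmd cmd)
{
    switch (cmd) {
    case cmd1: return 1;
    case cmd2: return 2;
    default:   return -1;   /* cmd3 silently lands here at run time */
    }
}
```

So you can keep a defensive `default` for out-of-range values and still be told about enumerators you forgot.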

pdpi · on Feb 2, 2017

the ifs allow the code to function correctly even if you forget to update this function with newly-added cases in the enums. The switch statement makes the known-cases behave O(1), while also bypassing unneeded validation.

akkartik · on Feb 2, 2017

The known cases are O(1) even with an `if` -- all it's doing is a range check. Any performance benefit from this hack is minuscule at best. It may even end up confusing a compiler and making performance worse.

pdpi · on Feb 2, 2017

Sure. I'm not arguing it's a good idea. It just serves as (somewhat forced) motivation for showing how to exploit this behaviour with switch labels, which is the real point of the post.

nialv7 · on Feb 3, 2017

It always amazes me how many people are 'discovering' the C switch statement for the first time.

posedge · on Feb 2, 2017

well, I don't see the beauty :)

adekok · on Feb 2, 2017

There's just no reason to have a web site that goes down when HN viewers go to it.

I run wiki.luajit.org on a VM, using Gollum. We got hit by HN a while back. CPU went from 1% to 2%.

People saying "CPU is commodity" tend to write software which uses all available CPU... for little purpose.

Heck, I worked at a company which did just that. "CPU / memory / disk is commodity, use it." And they did! Soon enough, all CPU / memory / disk was in use, and they had to go back and re-architect their software so that it wasn't crap.

It would have been cheaper to do it right in the first place. But religious beliefs about engineering over-rode actual engineering.

GFK_of_xmaspast · on Feb 2, 2017

This is a whole stack of horrible code under the banner of "might be more efficient" with no actual examination of what the code actually gets compiled into, much less any profiling.

anjc · on Feb 2, 2017

Cool trick, but difficult to grasp at first glance. I'd personally rather see pages of 'if' statements if it makes it easier to reason about.

neoeldex · on Feb 2, 2017

He's got a good point: Why is C still the staple it is? But I believe that Rust is going to take its place. Not sure whether it's a good thing, but this trick isn't possible in Rust ;)

Rust would give an error when an enum isn't exhausted in a match clause. So the issue being described doesn't exist :D

pdpi · on Feb 2, 2017

It'll be a while still before Rust makes its way to all the platforms with a C compiler (if it ever even happens).

As to why it's still a popular language... The C99 spec document is shorter than the Ecma-262 (Javascript) spec by around 50 pages. The parts dedicated to the _languages_ themselves (as opposed to the builtins/standard libraries) weigh in at 130 pages for C, or around 300 pages for JavaScript. C, despite its warts, is a small, simple language.

catnaroek · on Feb 2, 2017

How much longer would the C specification be if it were fully formalized? And even then, a formalization isn't enough: it has to be useful for actually proving theorems, e.g. “such and such pattern is portable across conforming implementations”, and the size of the proof better not be much larger than the size of the program itself.

If the C standard is so useful, try programming using the C standard as your sole debugging tool.

drivers99 · on Feb 2, 2017

> Rust would give an error when an enum isn't exhausted in a match clause. So the issue being described doesn't exist :D

I was going to say something like that, but the equivalent code would have a "_" case in the match statement. I don't know rust yet, but would a rust programmer typically make a point to not use a "_" (default) statement so that you can catch these at compile time and expect you'll never have anything else?

geofft · on Feb 2, 2017

It's not just typical, it's required. If you add a default match when you've already specified all the possible cases, the compiler gives you an error. Here's an example (https://is.gd/KjSc8B):

    enum Foo {
        Bar(i32),
        Baz(i32),
    }

    fn matchfoo(f: Foo) -> i32 {
        match f {
            Foo::Bar(x) => x,
            Foo::Baz(x) => x,
            _ => 42
        }
    }

gives "unreachable pattern", a hard error, for the "_" case.

ordu · on Feb 2, 2017

You can also avoid using a `default' statement in C and compile with gcc -Wall. You'll get a warning for enum values missing from the `case' statements. You can even add -Werror and get errors when some enum values are left unhandled.

But when you have a really big enum and need to handle only a small subset of its values, that doesn't work.

When _this_ problem can be avoided in Rust, it can be avoided in C also. The only Rust bonus here is compiler default behaviour.

johnbellone · on Feb 2, 2017