A primer on some C obfuscation tricks

nneonneo · on April 23, 2020

I publicly reverse-engineered a cute bit of C code many years ago that employs some of these tricks:

https://stackoverflow.com/questions/15393441/obfuscated-c-co...

It's a digital clock (which has to be compiled and run once per second to be accurate).

Of course, if you're interested in obfuscated C, you can't miss the International Obfuscated C Code Contest, which is where most of these evil tricks show up. Submissions for this year's IOCCC are still open: https://www.ioccc.org/

The IOCCC has been running since 1984, and there are some absolutely marvelous gems: https://www.ioccc.org/years.html. A great rabbit-hole to dive down if you're stuck at home ;)

moonchild · on April 23, 2020

  ({_:&&_;});

  ({});

  ({;});

All of these rely on a gcc extension ('statement expressions').

________________________________

  a = '-'-'-'

You can also do:

  a = '/'/'/'

To generate a 1 instead of a 0.

________________________________

  printf("%d %d\n", 0 == sizeof(count = 2, count++), count);

This works because:

1. There are two forms of sizeof; sizeof(T) where T is a type, and sizeof x where x is an expression.

2. There are 'comma expressions'; if you have (x, y) where x and y are expressions, then x is executed and the expression evaluates to y.

3. Parameters to sizeof are not evaluated (which is important because otherwise the value of the 3rd argument would be undefined, since there's no sequence point between evaluation of function parameters).
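
A minimal sketch of point 3, assuming any C99 compiler; the helper name count_after_sizeof is made up for illustration. The operand of sizeof is only inspected for its type, so neither the assignment nor the increment takes place:

```c
#include <stddef.h>

/* Hypothetical helper: reports what `count` holds after it has
   appeared inside a sizeof operand full of side effects. */
int count_after_sizeof(void)
{
    int count = 0;
    /* The comma expression has type int, so this is just sizeof(int).
       Neither `count = 2` nor `count++` is executed. */
    size_t n = sizeof(count = 2, count++);
    (void)n; /* only the (missing) side effects matter here */
    return count; /* still 0 */
}
```

Some compilers warn about side effects inside sizeof, which is a fair hint that the construct is meant as a puzzle rather than as working code.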

saagarjha · on April 23, 2020

> Parameters to sizeof are not evaluated

Oh, but they are ;) Try running sizeof on a VLA sometime.

_kst_ · on April 23, 2020

The argument to sizeof is evaluated if and only if it's of a variable length array type.

(If it's a parenthesized type name, it's unclear what it means to "evaluate" it.)
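
A small sketch of the VLA exception, assuming a compiler with variable length array support (C99, optional since C11); the helper name is hypothetical. Because the array bound is only known at run time, the size expression really is evaluated:

```c
#include <stddef.h>

/* Hypothetical helper: returns n after it was used as a VLA bound
   inside sizeof. */
int n_after_vla_sizeof(void)
{
    int n = 3;
    /* int[n++] is a variable length array type, so the size
       expression is evaluated at run time and n is incremented. */
    size_t bytes = sizeof(int[n++]);
    (void)bytes; /* 3 * sizeof(int) */
    return n;
}
```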

userbinator · on April 23, 2020

Number 24 on the list shows exactly this.

sigjuice · on April 23, 2020

  $ cat sizeof.c

  #include <stdio.h>
   
  int main(int argc, char *argv[])
  {
      printf("%zu\n", sizeof(int[argc - 2]));
  }

  $ make -B sizeof
  cc     sizeof.c   -o sizeof

  $ ./sizeof foo
  0

  $ ./sizeof
  17179869180

saagarjha · on April 23, 2020

This can yield surprising results even without undefined behavior. For example: https://godbolt.org/z/fwhLH4

sigjuice · on April 23, 2020

This is what I see when I visit your link. I think this is easily explained. What might I be missing?

  ASM generation compiler returned: 0
  Execution build compiler returned: 0
  Program returned: 44
  ./output.s

PS: I did not examine the assembly code

saagarjha · on April 23, 2020

It calls printf. Perhaps this is a better example: https://godbolt.org/z/fwhLH4. (I updated the link above, too.)

saagarjha · on April 23, 2020

Is it bad that I've written some of these in my normal code? In particular, I use the concepts in these when they're relevant:

  *(c ? &x : &y) = v;

  return (char *[]){"No", "Yes"}[!!x];

(Note: for the second one, I use the "selecting from a temporary array" and !! separately; if I only had two options then I'd use x ? "Yes" : "No" of course.)

userbinator · on April 23, 2020

I don't find those obfuscatory at all either, and instantly understood them upon first reading. I think they are a form of "data-oriented" style of programming, where array indexing and selection via ?: are preferred over "code-oriented" control flow statements. In my experience, the former actually tends to create more concise and maintainable code, since it often means a lot of the logic becomes table-driven and easily modified. The latter tends to result in very long and "branchy" code with lots of if/else statements that ultimately contain tiny bodies that don't do much.

Combining multiple booleans into an integer and using that as a switch or index is another related technique.

To modify an old saying slightly, perhaps "one person's obfuscation is another person's simplicity and elegance."
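
As a rough illustration of the "combine booleans into an index" technique mentioned above: the function name and the table contents here are invented for the example, not taken from any real code base.

```c
/* Hypothetical example: describe two access flags with a table
   instead of nested if/else. */
const char *access_name(int readable, int writable)
{
    static const char *const names[] = {
        "none",       /* readable = 0, writable = 0 */
        "write-only", /* readable = 0, writable = 1 */
        "read-only",  /* readable = 1, writable = 0 */
        "read-write"  /* readable = 1, writable = 1 */
    };
    /* !! collapses each flag to 0 or 1 before the two are packed
       into a 2-bit index. */
    return names[(!!readable << 1) | !!writable];
}
```

Adding a case means adding a table row rather than another branch, which is the maintainability argument being made.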

jackewiehose · on April 24, 2020

> I don't find those obfuscatory at all either, and instantly understood them upon first reading

I understood the whole list of these obfuscation tricks upon first reading but it's still obfuscation.

Creating an array instead of an if-else-switch like the second line is fine but for just two elements and in combination with !!x (and !!x in general) it's just nonsense and doesn't help at all.

I'm all for code-density and avoiding repetition so I too use ?: whenever I can but not on the left side of an assignment. That's just an obfuscated if-else-statement. If normal code results in very long and "branchy" code you can rearrange your code in a better way.

gpderetta · on April 24, 2020

FWIW, !! is a fairly idiomatic way to cast a value to bool.

jackewiehose · on April 24, 2020

Ints are automatically converted to bool, there is no need to cast. What this does is it restricts the int to 1 or 0. The clean way to write this is x ? 1 : 0. This is understandable even if you don't know the Boolean inversion of 0 is 1 (which isn't self-evident; for example, in Basic it's -1).

gpderetta · on April 24, 2020

They are converted to bool in bool context. Outside of that you have to force it.

   int countNonZero = 0;
   for (int i = 0; i < values.size(); ++i) {
       countNonZero += !!values[i];
   }

Also c++ has explicit operator bool();

edit: not a great example as values[i] > 0 would be better.

jackewiehose · on April 24, 2020

You don't have to enforce a conversion from int to bool if you just want a bool. In this example you don't want to increment by 'bool', you want to increment by 1 or 0 (which are ints). So you use !!x to convert to bool and back to int.

The reason I'm pointing this out is that often enough I had to work with code where this distinction was unknown to the author and you come across code like if (!!x) which is, again, just nonsense.

userbinator · on April 25, 2020

> edit: not a great example as values[i] > 0 would be better.

Only if that's an array of unsigned ints, as otherwise the code would count only positive nonzeros.

gallier2 · on April 24, 2020

You can also typecast to bool since C99. (bool)x if you include <stdbool.h> or (_Bool) if not.

kabdib · on April 24, 2020

I try to write C/C++ so that the state and flow of control is fairly obvious in a debugger. Complicated and "electric" statements like the ones above are succinct and maybe even obvious, but they're difficult to step through, and they don't really offer any advantage in code quality.

Screen space is cheap. Say what you mean. Clever today is usually a headache tomorrow.

clarry · on April 24, 2020

I can fit over 500 lines of code on my screen but I still prefer dense code. There's just something nice about not needing to jump around so much (whether with the eyes or the cursor or file), and being able to see more of the forest at once.

jackewiehose · on April 23, 2020

> Is it bad that I've written some of these in my normal code?

yes

Nevermark · on April 24, 2020

Well, that depends on context. Always ensure your code is good in the context it appears.

    yourCode = bad;

    temp = bad;
    bad = good;
    good = temp;

    conclusion = (yourCode == bad)

As we can see from evaluating this code, in this context, it isn’t bad.

In general, if something is bad you can change that thing or change the context. Your job, your living arrangements, etc. People often are slow to change the context, even when a small change can make all the difference.

You might want to create an include file if you have a lot of code to check.

mycodeisalwaysgood.h seems self-explanatory.

m463 · on April 23, 2020

I wonder if even replying is encouraging this behavior.

FpUser · on April 24, 2020

I can understand the code, and if all your function does is these 2 statements I guess it is ok. Myself, however, I would not write any code of meaningful size in this style.

I switch between many languages often enough that anything even remotely esoteric gets forgotten instantly, causing my brain to pause at the suspicious line and making me think too much about the trees instead of the forest.

If however I was only using C and nothing else maybe I'd be catching this bug myself, one never knows ;)

bear8642 · on April 24, 2020

>*(c ? &x : &y) = v;

Do you know why you need to use pointer magic here?

Tried (c ? x : y) = v; and wondering why that doesn't work.

gallier2 · on April 24, 2020

Because the ternary operation yields only an rvalue in C. In C++ and as a gcc extension it can be used as an lvalue, but not in regular plain C.
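
A short sketch of the workaround in plain C; store_selected is a made-up name. Selecting between two pointers gives a pointer value, and dereferencing that value produces something assignable:

```c
/* Hypothetical helper: store v into *x if c is nonzero, else into *y. */
void store_selected(int c, int *x, int *y, int v)
{
    /* (c ? *x : *y) = v; is rejected by a C compiler because the
       conditional expression is not an lvalue. Choosing between the
       two pointers and dereferencing the result is fine. */
    *(c ? x : y) = v;
}
```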

bear8642 · on April 23, 2020

>return (char *[]){"No", "Yes"}[!!x];

Is !!x equivalent to x here?

saagarjha · on April 23, 2020

No, it "collapses" an integer to 0 or 1. (Essentially, it's x != 0.)
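
To make the difference concrete, here is a minimal sketch wrapped in a hypothetical function, yes_or_no, around the quoted one-liner:

```c
/* Hypothetical helper mirroring the one-liner quoted above. */
const char *yes_or_no(int x)
{
    /* !!x is 0 when x == 0 and 1 for every other value, so it is
       always a valid index into the two-element array. Using x
       directly would read out of bounds for x == 2. */
    return (const char *[]){"No", "Yes"}[!!x];
}
```

With a bare x as the index, only 0 and 1 would be safe inputs; with !!x, any int is.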

asveikau · on April 23, 2020

Personally I prefer (x ? 1 : 0) over (x != 0) or !!x for this sort of thing.

It fits semantically with "I'm using x as a boolean, but it may not be 1".

uryga · on April 24, 2020

i'm on the fence about this... i don't love type coercions, but (!!x) is nice because you don't have to stop and check - was it (x ? 1 : 0) or (x ? 0 : 1), (x !=0) or (x==0)? i think arthur whitney said something in the vein of "less characters - less room for typos" :)

gallier2 · on April 24, 2020

!!x is idiomatic. (bool) cast is modern.

I use !=0 in only one context, if(strcmp(a,b) != 0) to check if two strings differ. Normally in an if() expression I don't check explicitly for 0 or not 0, except for the strcmp class of functions: because of their ordered comparison (<0, 0, >0), their "boolean" value is inverted, so it is somewhat counterintuitive when used with implicit boolean values.

asveikau · on April 25, 2020

I don't think !! can be too common an idiom when the one time I recall using it in a corporate setting it was pointed out as absurd. And I agree it's not explicit and clear. But this seems a matter of style.

Cast to bool has me extremely nervous because ... It works in C99's _Bool type, but I have lots of bad memories of security bugs when using this approach with pre-standard custom bool types, which are still in common use including but not limited to Win32 and COM, or Objective-C.

The bug I am thinking of looks like this:

    long long ll = 1LL << 32;
    bool b = ll;

    puts(b ? "yes" : "no");

I just tested to confirm. This prints "yes" when using C99 stdbool, presumably because they standardized it to work. If you change bool to a common choice for pre-standard bool typedefs (int or char, say), it prints "no" because the large nonzero value doesn't fit in that type. "Just use the standard type" you might say. But if we were working somewhere that required legacy nonstandard bools that is not our choice, and further, a later refactor may start using those other types some day even if we make the correct choice today. So there is an argument to be consistent about avoiding this type of bug.
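
A sketch of the two behaviours side by side. The typedef legacy_bool is a stand-in invented for this example; unsigned char is used rather than int or char so that the truncation is fully defined by the standard.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a pre-C99 project-specific bool. */
typedef unsigned char legacy_bool;

/* Conversion to _Bool compares against zero, so any nonzero value
   becomes 1. */
int to_std_bool(long long v)
{
    bool b = v;
    return b;
}

/* Conversion to an ordinary unsigned integer type keeps only the low
   bits, so a value whose low bits are all zero becomes 0. */
int to_legacy_bool(long long v)
{
    legacy_bool b = (legacy_bool)v;
    return b;
}
```

Normalizing first, for example with !!v or v != 0, gives the same answer for both types.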

asveikau · on April 23, 2020

If x is any nonzero value [example: 2], !!x is 1.

Any nonzero expression is true, and 0 is false. But the result of !, or ==, ||, or any of those other boolean operators is 1 for true.

Recall relatedly that C didn't have a bool type until the 1999 standard, which introduced <stdbool.h>. C++ had it earlier, but for a long time the C way to store a boolean expression into a variable might have been int, or some nonstandard typedef. This is why so many libraries and frameworks in the C world define their own boolean type.

ramshorns · on April 24, 2020

Hey, I've got one! Use the exponent operator.

  if (2^3 == 8)
      puts("two cubed is eight");
  if (5^2 == 25)
      puts("five squared is twenty-five");

bear8642 · on April 24, 2020

That's Bitwise Xor not exponentiation.

2^3 => 1

5^2 => 7

ramshorns · on April 24, 2020

It works though! And if you do

  if (2^3 != 1)
      puts("two cubed is not one");
  if (5^2 != 7)
      puts("five squared is not seven");

it prints those messages too.

The trick is operator precedence.
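
Spelled out as a sketch (the function names are invented): == and != bind tighter than ^, so every condition above is really "left operand XOR a 0-or-1 comparison result", and all four happen to come out nonzero.

```c
/* Each function returns the value the corresponding if () tested. */
int cube_check(void)      { return 2^3 == 8;  } /* 2 ^ (3 == 8)  = 2 ^ 0 */
int square_check(void)    { return 5^2 == 25; } /* 5 ^ (2 == 25) = 5 ^ 0 */
int not_one_check(void)   { return 2^3 != 1;  } /* 2 ^ (3 != 1)  = 2 ^ 1 */
int not_seven_check(void) { return 5^2 != 7;  } /* 5 ^ (2 != 7)  = 5 ^ 1 */
```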

bear8642 · on April 24, 2020

Cool - I thought you had the operators confused

faehnrich · on April 23, 2020

That index[x] was used in one of my favorite IOCCC one-liners.

http://faehnri.ch/have-fun/

xnhbx · on April 24, 2020

From bulletin #4: "-2147483648 is positive. This is because 2147483648 cannot fit in the type int, so (following the ISO C rules) its data type is unsigned long int. Negating this value yields 2147483648 again."

This is not true. If a decimal integer constant value cannot be represented in type "int", the next candidate type is "long int". If the value cannot fit in "long int" either, the next type to try is "long long int" in C99 and "unsigned long int" only in C89.

inetknght · on April 23, 2020

Today I learned... _magic_

    #include <cstdio>
    
    void sw(int s)
    {
        switch (s) while (0) {
            case 0:
                printf("zero\n"); continue;
            case 1:
                printf("one\n"); continue;
            case 2:
                printf("two\n"); continue;
        }
    }

[0] https://gcc.godbolt.org/z/Q26LWG

MauranKilom · on April 23, 2020

Hey wait, that's just Duff's... Oh.

lostmsu · on April 23, 2020

Huh, what does it do?

strbean · on April 23, 2020

Looking at the link, it generates the exact same assembly (and behavior of course) as

    void sw(int s) noexcept
    {
        switch (s) {
            case 0:
                printf("zero\n"); break;
            case 1:
                printf("one\n"); break;
            case 2:
                printf("two\n"); break;
        }
    }

lostmsu · on April 24, 2020

Is noexcept here the consequence of while? Or noexcept simply does nothing at all?

Not a C expert, so just curious.

eMSF · on April 24, 2020

AFAIK noexcept is a C++ specifier that isn't valid C. Even so, it has nothing to do with the loop, but in C++ it would cause the program to terminate if an exception occurred in the sw function.

saagarjha · on April 23, 2020

thatsthejoke.jpg

strbean · on April 23, 2020

I mean, parent seemed an awful lot like someone asking what it did. Damn me for missing the invisible /s. I'm sure no other reader would ever be curious!

saagarjha · on April 23, 2020

It's not the comment that is the joke, but the example in the context of the "C obfuscation tricks" article where it's just a convoluted way to write the same thing as a simple C construct.

inetknght · on April 25, 2020

Yup. Of all of the examples on that page, this is the only one new to me. I thought it pretty clever and after toying with it, it does seem pretty cool with functions as the conditionals.

Sharlin · on April 24, 2020

    > return (char *[]){"No", "Yes"}[!!x];

I prefer

    return (!x<<2)+"Yes\0No";

gallier2 · on April 24, 2020

clang will warn as it doesn't like int+string constructs.

&"Yes\0No"[!x<<2]; // same number of characters

to shut it up.
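
For anyone decoding the packed-string variant, a sketch with the shift written as a left shift; yes_no_packed is a hypothetical wrapper name.

```c
/* Hypothetical helper: "Yes" starts at offset 0 of the literal and
   "No" at offset 4, just past the embedded '\0' at offset 3. */
const char *yes_no_packed(int x)
{
    /* !x is 1 only for x == 0, so the offset is 4 ("No") in that
       case and 0 ("Yes") otherwise. */
    return &"Yes\0No"[!x << 2];
}
```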

bear8642 · on April 24, 2020

Clever! Though feel having "No" relate to x = 0 is slightly clearer

Sharlin · on April 24, 2020

But surely the goal here is the opposite of clarity? :)

jackhalford · on April 23, 2020

> x[index] is *(x+index)

> index[x] is legal C and equivalent too

Can't be unseen, I can't believe I never thought of that.

_kst_ · on April 23, 2020

C could have permitted pointer+integer and banned integer+pointer, which would have made the indexing operator non-commutative. Nothing important (IMHO) would have been lost. But the equivalence goes back to the days when the distinction between integer and pointers was not always clear.

guitmz · on April 23, 2020

I find that after learning assembly language, things like this become very obvious when seen in different languages, especially C

roywiggins · on April 23, 2020

The main reason that works is historic, I think.

It could just as easily throw a compilation error to index a constant with an array rather than the other way around. I don't think this works in Rust even if the resulting machine code for array indexing is the same.

eMSF · on April 23, 2020

The main reason is that in C, "indexing" an array is purely syntactic sugar for pointer arithmetic, which itself is commutative; that is, ((A)[B]) is equivalent to (*((A)+(B))), which itself is equivalent to (*((B)+(A))) (assuming one of them has an integer type and the other a pointer to complete object type).

Now, of course an array type isn't a pointer type, but as "indexing" isn't one of the very few cases where an expression that has an array type isn't converted to an expression with a pointer type, you aren't really indexing an array, but a pointer to its first element.
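
A tiny sketch of that equivalence; index_forms_agree is a name made up for the example.

```c
/* Hypothetical helper: reads element i three equivalent ways and
   reports whether they all agree. */
int index_forms_agree(const int *a, int i)
{
    int subscript = a[i];     /* the usual spelling */
    int swapped   = i[a];     /* same expression, operands swapped */
    int explicit_ = *(a + i); /* what both are defined to mean */
    return subscript == swapped && swapped == explicit_;
}
```

Passing an actual array works the same way, since the array name is converted to a pointer to its first element before the addition happens.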

wahern · on April 23, 2020

Another way to look at it is that C syntax was designed to be extremely simple to parse, and C semantics to simplify code generation. Early C compilers immediately generated code as they parsed each expression, keeping minimal state. (No AST!) Also consider that in B the only data type was the machine word, so the types of the operands were irrelevant to the code you generated. In early C the biggest difference was structures, which required some minimal bookkeeping (very minimal when all members were in the same namespace), but a struct dereference is just syntactic sugar for an (address + offset) expression, so underneath the covers the compiler was still just chewing through identifiers, left to right, and emitting simple assembly for addition and multiplication, because each identifier was just a symbol for an integer.

So index[array] isn't an historical accident. It might not have been deliberate, but it follows naturally from the nature of the language.

Go very much follows the same discipline. Speed and simplicity of compilation constrain the syntax, most notably the lack of generics. Goroutines, channels, etc, only require minimal syntactic and compiler support. Contrast that with Rust--Rust front loads everything into the parsing phase--lifetimes, async, etc. Deep AST analysis and transformation is everything for Rust. Of course these days people abhor even the possibility of allowing something like index[array], so even a compiler like Go goes out of its way to disallow it.

saagarjha · on April 23, 2020

I see no reason why this should be obvious; it's just a historical quirk of array indexing in C being a literal translation to pointer arithmetic. In C++, for example, there is no way to support this with the subscript operator.

_kst_ · on April 23, 2020

> or since int is default type

Not since the 1999 ISO standard.

> int main(){ return linux > unix; }

A conforming compiler must diagnose "linux" and "unix" as undeclared identifiers. Many C compilers are not conforming by default.

ben0x539 · on April 24, 2020

I suspect the post is about GNU C rather than ISO C.

fortran77 · on April 23, 2020

I wonder when there'll be the Obfuscated Rust Programming Contest.

gallier2 · on April 24, 2020

No need for that, any Rust program is obfuscated.

steveklabnik · on April 24, 2020

We held an “Underhanded Rust” contest, which is similar but different, a few years back.

BubRoss · on April 23, 2020

They might need an obfuscated compilation contest.

dvfjsdhgfv · on April 24, 2020

I like if (val && ~val) as an alternative to if(val) based on the fact that for non-zero ints ~val will also be non-zero.
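
One caveat worth checking, sketched below under the assumption of a two's-complement int (true of essentially every current platform): -1 has all bits set, so ~(-1) is 0 and the trick disagrees with a plain if (val) for that one value. The function name is invented for the example.

```c
/* The obfuscated test from the comment above, as a function. */
int obfuscated_truth(int val)
{
    /* Nonzero only if val != 0 AND ~val != 0. On two's complement,
       ~val == 0 exactly when val == -1. */
    return val && ~val;
}
```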

waltpad · on April 24, 2020

That probably shows (once more) that the C language really needs an overhaul, with a stricter grammar disallowing those sorts of tricks.

Or maybe not.

platinumrad · on April 24, 2020

This is a valid Rust program:

  fn main() {
      return return return return return return return
  }

You can do silly things in any language.

waltpad · on April 24, 2020

Indeed, anyone can also write code over-abusing lambdas in a functional language, but lambdas are also quite useful in the majority of cases. On the other hand, can you point out a single situation where swapping int and typedef in a type definition brings anything good?

Likewise, in your example, I doubt that this could bring anything. Is there a practical obfuscation method based on that quirk in Rust, or a reason for keeping it in the syntax? I cannot tell, but maybe you can?

Overall, the problem that I have with this is not that it is silly, it is that it makes it harder to understand and maintain. In some cases, it requires active engineering to fix the issue, but the language should be designed so that most of these problems are taken care of by design.

Also, many people seem to enjoy the fact that C can be bent that way. I don't mean to take that away from them, I just think that for system programming, it should be less permissive. Perhaps a 'strict mode' could be devised, not at the syntax level like in javascript (which I suppose couldn't be avoided), but as a compiler flag (like the c++ people did it).

kccqzy · on April 24, 2020

One guy's useless tricks can be another's essential flexibility. Especially when you consider code generation.

waltpad · on April 24, 2020

> Especially when you consider code generation.

That's a very good point you are making here.

However, as I said elsewhere, I don't see the use of being able to either write

typedef int ...

or

int typedef ...

and I don't think that proper code generation would require it.

anthk · on April 24, 2020

[flagged]

waltpad · on April 24, 2020

I don't know Forth, Perl, or even Rust enough to discuss them on that basis, but having other languages allowing it doesn't mean that C should be lenient about it either, or does it?

Now, don't get me wrong, I do not dislike C, I've practiced it a lot, and I come back to it from time to time when I need to. But I would rather use a language where many of these forms are forbidden for readability and maintainability.

Nevermark · on April 24, 2020

I keep my code interesting, to make work more enjoyable for future maintainers, by keeping all my code ...

inline with_all_ColinIanKing_standards()

freefal · on April 23, 2020

What does the provided example do?

5) Surprising math:

int x = 0xfffe+0x0001;

monocasa · on April 23, 2020

It doesn't compile.

It's trying to parse the e as scientific notation I think.

  surprising_math.c: In function ‘main’:
  surprising_math.c:4:10: error: invalid suffix "+0x0001" on integer constant
    int x = 0xfffe+0x0001;
            ^~~~~~~~~~~~~

inetknght · on April 23, 2020

Indeed, I think you're exactly right. Changing it to `0xffff+0x0001` lets it compile.

shaklee3 · on April 24, 2020

Doesn't that defeat the purpose then? It's no longer using scientific notation.

gallier2 · on April 24, 2020

The initial intent was to add 2 hex values, but using 0xfffe tripped the parser.

moonchild · on April 23, 2020

It looks a bit like a binary exponent, but according to clang and gcc it's erroneous: 'invalid suffix '+0x0001' on integer constant'.

rightbyte · on April 23, 2020

That is really stupid. How is that not a bug? I can't believe GCC won't compile it. What do other compilers do?

Edit: Clang also gives an error. MSVC seems to compile it. I wonder who follows the spec. I assume not MSVC ...

_kst_ · on April 23, 2020

It's not a bug. 0xfffe+0x0001 satisfies the syntax of a "preprocessing number". In a later translation phase, preprocessing numbers are converted to numeric constants. Not all valid preprocessing numbers are valid numeric constants. The syntax of preprocessing numbers is deliberately permissive to avoid complicating the implementation of the preprocessor.

I'm sure it could have been made stricter, allowing 0xfffe+0x0001 to be treated as 0xfffe + 0x0001 -- but the solution is simply to write 0xfffe + 0x0001 in the first place. The language grammar has to be consistently defined even if it leads to surprises now and then.
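
A sketch of the version that does compile; spaced_sum is a hypothetical name. The sign characters + and - only extend a preprocessing number when they directly follow e, E, p, or P, which is also why 0xffff+0x0001 is accepted while 0xfffe+0x0001 is not.

```c
/* Hypothetical helper: with whitespace around '+', the lexer sees
   two separate preprocessing numbers and an operator. */
int spaced_sum(void)
{
    /* Writing 0xfffe+0x0001 here instead would form a single
       preprocessing number, because "e+" continues the token, and
       that token is then rejected as an integer constant. */
    int x = 0xfffe + 0x0001;
    return x; /* 0xffff */
}
```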

inetknght · on April 23, 2020

It produces an error message.

error: invalid suffix '+0x0001' on integer constant

[0] https://gcc.godbolt.org/z/pR5L6v

TimSchumann · on April 23, 2020

Thanks for sharing this!

m463 · on April 23, 2020

how about this one?

  int main() {
    fork();
    printf("choo");
  }

saagarjha · on April 23, 2020

What's the trick (besides calling functions after fork that you shouldn't be calling?)

oddlama · on April 24, 2020

I think parent wanted to write the following:

int main() { printf("choo"); fork(); }

Here, "choo" can be printed twice, even though we fork after printing. This is a result of line buffering: "choo" has no trailing newline, so the flush happens only after the process is forked. Essentially, the output buffer is copied when forking and therefore duplicated.

m463 · on April 25, 2020

oh yes, that's what I meant! your example and mine print the same output.