Hacker News new | past | comments | ask | show | jobs | submit login

Related, 8086tiny, a 2013 IOCCC winner. About twice as much source code, and if not cheating then certainly leaning heavily on the rules, but very impressive as it's a full x86 emulator with graphics and several peripherals.

Original entry: https://www.ioccc.org/years.html#2013_cable3

"Deobfuscated": https://github.com/adriancable/8086tiny

(Not mine, just a fan)




From the description, one small part of the code:

--64[T=1[O=32[L=(X=*Y&7)&1,o=X/2&1,l]=0,t=(c=y)&7,a=c/8&7,Y]>>6,g=~-T?y:(n)y,d=BX=y,l]

The difference between bf and c sometimes is small


It is in fact quite reasonable if you know a bit of C obfuscation and some expectation about 8086 interpreters. First, indentations:

    --64[
        T=1[
            O=32[
                L=(X=*Y&7)&1,
                o=X/2&1,
                l
            ]=0,
            t=(c=y)&7,
            a=c/8&7,
            Y
        ]>>6,
        g=~-T?y:(n)y,
        d=BX=y,
        l
    ]
`a, b` will return `b` but evaluate `a` as a side effect, so anything preceding `,` can be (recursively) pulled out.

    L=(X=*Y&7)&1,
    o=X/2&1,
    O=32[l]=0,
    t=(c=y)&7,
    a=c/8&7,
    T=1[Y]>>6,
    g=~-T?y:(n)y,
    d=BX=y,
    --64[l]
`a[b]` is semantically equivalent to `*(a + b)`, so it is unexpectedly commutative. Let's swap them all and add some spacing. Ah, also some assignment expressions have been reused as a part of other expressions, which I will also pull out:

    X = *Y & 7,
    L = X & 1,
    o = X / 2 & 1,
    O = l[32] = 0,
    c = y,
    t = c & 7,
    a = c / 8 & 7,
    T = Y[1] >> 6,
    g = ~-T ? y : (n) y,
    d = BX = y,
    --l[64]
Now it should be much more clear that a large chunk of this code is instruction decoding. At this point we need to expand macros to make more sense out of it:

    #define Z short
    #define a(_)*(_*)&
    #define y a(Z)Y[++O]
    #define n char
It would need more reading to determine the exact meaning of `Y`, but given that its type is `unsigned char*`, it should be some sort of pointer, most likely the instruction pointer. The macro `y`, which expands to `*(short*)&Y[++O]`, increases `O` so it looks like reading from IP+0, IP+1, ... while not actually advancing the IP.

So the mysterious code can be now mostly decoded:

    X = *Y & 7,              // [IP+0] = ?????XXX
    L = X & 1,               //        = ???????L
    o = X / 2 & 1,           //        = ??????o?
    O = l[32] = 0,           // Reset `O` for further uses of `y`

    c = y,                   // [IP+1] = yyyyyyyy
    t = c & 7,               //        = ?????ttt
    a = c / 8 & 7,           //        = ??aaa???
    T = Y[1] >> 6,           //        = TT??????
    g = ~-T ? y : (char) y,  // Read [IP+2] and do something

    d = BX = Y[++O],         // [IP+3] = dddddddd
    --l[64]
[IP+1] surely looks like the ModR/M byte, so you can guess that `g` is a displacement even if you don't know what the heck is `~-T` (equals to `T - 1` in twos' complement). [IP+0] is too generic to pick the exact instruction, but my guess is that this code is shared for a lot of similar opcodes to save the code footprint.


Thank you. That is interesting!


> if not cheating then certainly leaning heavily on the rules

How so?


The description implies they found some clever way to trick iocccsize at the time. Using the current version of iocccsize, it weighs in at 3842, well over the max of 2053.

https://www.ioccc.org/2013/cable3/hint.html


No, `iocccsize` has been updated throughout the history of the contest. The hints mention this, but the accepted secondary size for `cable3.c` was 1,977 tokens. (Later versions of `iocccsize` don't compute secondary sizes at all.)

As far as I can tell, the actual abuse here was the inclusion of instruction decoding table in the custom BIOS which didn't count towards the total size at all:

> This conversion, along with instruction lengths and how each instruction modifies the flags, is assisted by some lookup tables which form part of the BIOS binary.


The BIOS file doesn't just contain what you'd expect from a normal PC BIOS, but also contains several tables used by the emulator for decoding of x86 instructions. Normally those tables would be in code as they're necessary regardless of BIOS implementation; they're presumably in the BIOS to save space.


Maybe one way to look at it other than "cheating" is that it's simulating a fictional x86-like processor which uses an external ROM for microcode.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: