C72: A port of the earliest C compiler to x86 (2021)

hnlmorg · on April 10, 2022

> compound assignment operators are reversed, they are =+, =*

Personally I always thought that made more sense than the (now) standard way because when you read it aloud it reads like the operation you’re performing:

  N equals N plus five

  N =+ 5

I still find myself occasionally writing those operators that way around even after decades of using C and similar languages.

ghewgill · on April 10, 2022

One of the problems with the old notation is the ambiguity. What does each of the following mean?

    n =- 5
    n = -5
    n=-5
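For comparison, here is a small sketch of how today's C reads those three lines. Modern C has no `=-` token, so each line lexes as `=` followed by a unary minus and all three assign -5. Under the old `=op` spelling, the first line (and, with a greedy lexer, the third) would instead subtract 5 from n. The struct and function names are made up for the example.

```c
/* Modern C has no "=-" token: each assignment below lexes as "="
   followed by a unary minus, so all three store -5. */
struct spellings { int a, b, c; };

struct spellings modern_meaning(void) {
    struct spellings s = { 10, 10, 10 };
    s.a =- 5;   /* today: s.a = -5; old =op spelling: s.a = s.a - 5 */
    s.b = -5;   /* unambiguous in every version */
    s.c=-5;     /* today: s.c = -5; a greedy old lexer would see "=-" */
    return s;
}
```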

adrian_b · on April 10, 2022

Dennis Ritchie, in his paper on the history of C, explains that they became aware of these ambiguities only after the first C implementation, but considered the mistake too big to keep, so they reversed the order in the next C version.

Taniwha · on April 11, 2022

Maybe not quite the next version. From memory (and I ported the compilers, and the kernels, for both), it was between v6 and v7 Unix.

dn3500 · on April 11, 2022

That's correct, it wasn't the next version of the Ritchie compiler, it was a whole new compiler, the Portable C Compiler. It used yacc to generate the parser, and yacc complained about the ambiguity. They left the old syntax in for quite a while.

Taniwha · on April 11, 2022

Yup, almost certainly PCC, which also still has struct fields in a global namespace as a (possibly wonderful, possibly horrible) hack to provide the equivalent of unions.

mhh__ · on April 10, 2022

If I were designing a new language and were dead-set on the =- form, I would make it so:

    n =- 5  // n equals n - 5
    n = -5  // n equals -5
    n=-5    // illegal

This can be done, it just requires getting the lexer right.
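A minimal sketch of such a whitespace-sensitive rule, assuming statements of the simple form `<identifier> <operator> <operand>`. The names `OpKind` and `classify` are hypothetical, and this is only one way to encode the three cases above, not how C72 or any real compiler lexes.

```c
#include <ctype.h>

typedef enum { OP_ASSIGN, OP_SUB_ASSIGN, OP_ILLEGAL } OpKind;

/* Classifies the operator in a statement like "n =- 5".
   Hypothetical rule: "=-" is compound subtraction only when it has
   whitespace on both sides; glued to either operand it is illegal. */
OpKind classify(const char *s) {
    const char *p = s;

    /* Skip the identifier on the left-hand side. */
    while (isalnum((unsigned char)*p) || *p == '_')
        p++;
    int space_before = isspace((unsigned char)*p) != 0;
    while (isspace((unsigned char)*p))
        p++;
    if (*p != '=')
        return OP_ILLEGAL;              /* not an assignment at all */
    p++;
    if (*p == '-') {
        /* "=-" written as one unit: require whitespace after it too. */
        int space_after = isspace((unsigned char)p[1]) != 0;
        return (space_before && space_after) ? OP_SUB_ASSIGN : OP_ILLEGAL;
    }
    /* "= -5", "= 5", "=5": a plain assignment; any minus is unary. */
    return OP_ASSIGN;
}
```

With this rule, `n =- 5` is `OP_SUB_ASSIGN`, `n = -5` is `OP_ASSIGN`, and `n=-5` is `OP_ILLEGAL`, matching the three comments above.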

jandrese · on April 10, 2022

Another problem:

    n =* p;

Is p a pointer or an integer? That is, does this mean n = n * p or n = *p? Maybe you want to do pointer math; it's a very C thing to do.
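A quick sketch of what modern C makes of that line: since `=*` is not a token today, `n =* p` lexes as `n = *p`, a plain assignment of the value p points to. In the old notation the same characters would have been the multiply-assign operator. The function name is made up for the example.

```c
/* Modern C has no "=*" token, so "n =* p" lexes as "n = *p". */
int modern_reading(void) {
    int v = 7;
    int *p = &v;
    int n = 3;
    n =* p;     /* today: n = *p (7); old =op notation: multiply-assign */
    return n;
}
```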

jamesfinlayson · on April 11, 2022

That was my thought too. It seems like it would be ambiguous even with type information, since, as the README says, in this iteration at least any type can be converted to any other type.

moonchild · on April 11, 2022

The modern notation contains a very similar 'ambiguity'. Compare: x---y, x-- - y, and x - --y.

mburee · on April 11, 2022

And sadly x-----y won't parse :(

krylon · on April 11, 2022

It's been a while since I used C, but I for one am kind of glad that doesn't parse. If you need to do this, you can use parentheses:

    x - (--y)

or

    (x--) - y

tom_ · on April 11, 2022

All parse fine, and unambiguously, due to the maximal munch rule.

x---y is (x--)-y, which is semantically valid.

x-----y is ((x--)--)-y, but it's a semantic error, as x-- is not an lvalue.
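A small sketch of maximal munch in action, with arbitrary starting values x = 5 and y = 2. The lexer always takes the longest token it can, so `x---y` becomes `x -- - y`; spacing is what selects the other reading. The struct and function names are illustrative only.

```c
struct munch { int r1, x1, r2, y2; };

struct munch munch_demo(void) {
    struct munch m;
    int x = 5, y = 2;

    m.r1 = x---y;      /* lexed as x -- - y, i.e. (x--) - y: 5 - 2 */
    m.x1 = x;          /* x was post-decremented to 4 */

    x = 5;
    y = 2;
    m.r2 = x - --y;    /* spacing forces x - (--y): 5 - 1 */
    m.y2 = y;          /* y was pre-decremented to 1 */

    /* x-----y would lex as x -- -- - y and be rejected by the compiler,
       because the result of x-- is not an lvalue. */
    return m;
}
```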

hnlmorg · on April 10, 2022

Ahh yes, that’s a very good point.

jamesfinlayson · on April 11, 2022

> no for loops

Oh wow, I've read bits and pieces about early iterations of C but I didn't realise for loops weren't there from the start.

edgyquant · on April 11, 2022

For loops are syntactic sugar for while loops, e.g.

    for(int i = 0; i < 10; i++);

Is just

    int i = 0;
    while(i < 10)
       i++;

iib · on April 12, 2022

I think there is also an extra scope embedded in the `for`. So enclosing it would be closer:

    {
        int i = 0;
        while(i < 10)
            i++;
    }

kbelder · on April 11, 2022

Honestly, I think that's cleaner.

dataflow · on April 11, 2022

Keep in mind they're not always equivalent. If you have 'continue' in the middle, then the for loop will still perform the increment but the while loop won't.
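A sketch of that difference, using a made-up task (summing the odd numbers below 10). In the for loop, `continue` jumps to the `i++` clause; in the while version the increment has to be repeated before `continue`, otherwise the loop would spin forever at i = 0.

```c
int sum_odd_for(void) {
    int sum = 0;
    for (int i = 0; i < 10; i++) {
        if (i % 2 == 0)
            continue;          /* still runs i++ before re-testing i < 10 */
        sum += i;
    }
    return sum;
}

int sum_odd_while(void) {
    int sum = 0;
    int i = 0;
    while (i < 10) {
        if (i % 2 == 0) {
            i++;               /* needed here: continue skips the i++ below */
            continue;
        }
        sum += i;
        i++;
    }
    return sum;
}
```

Both return 25 (1 + 3 + 5 + 7 + 9); deleting the first `i++` in the while version is the bug described above.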

google234123 · on April 11, 2022

This bites me a few times a year.

andai · on April 11, 2022

Here's a thought: some languages have a defer statement that runs when the function is exited. This allows for cleanup code to run even with early return. Perhaps a block-level defer could be useful:

    int i = 0;
    while (i < 10) {
        defer i++;
        // ...
        continue;
        // ...
    }

AlbertoGP · on April 11, 2022

If someone wants to play with this idea, I made a preprocessor that translates that into standard C, inserting the deferred statements at each exit point (break, continue, goto, return, end of block):

https://news.ycombinator.com/item?id=28166125

https://sentido-labs.com/en/library/cedro/202106171400/#defe...
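A hand-written sketch of what "inserting the deferred statement at each exit point" amounts to for a loop like the one above. It assumes a block-level `defer` fires on every exit from the loop body (break, continue, or falling off the end); the loop's logic is made up, and this is not the actual output of the tool linked above.

```c
struct result { int count; int i_after; };

/* Counts i in [0, 10) that are not multiples of 3, stopping at i == 8.
   The original (hypothetical) body started with "defer i++;". */
struct result expanded_defer(void) {
    struct result r = { 0, 0 };
    int i = 0;
    while (i < 10) {
        if (i == 8) {
            i++;               /* deferred statement, copied before break */
            break;
        }
        if (i % 3 == 0) {
            i++;               /* copied before continue */
            continue;
        }
        r.count++;
        i++;                   /* copied at the end of the block */
    }
    r.i_after = i;
    return r;
}
```

Here `count` ends at 5 (i = 1, 2, 4, 5, 7) and `i_after` at 9, because the deferred increment also ran on the `break` path.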

andai · on April 11, 2022

Hahah, what are the odds! Thanks for sharing, I will take a look.

dataflow · on April 11, 2022

IMHO the way to avoid it is to avoid nonlinear [1] control flow. If you can convince yourself that early returns, continue, etc. are harmful, you'll avoid the pitfalls that come with them. It's an unpopular position to take, though.

[1] I might be sloppy with the terminology here, but you catch my drift.

jraph · on April 11, 2022

I do think early returns, continue, etc. are a good source of clarity. Sometimes code that avoids early returns seems convoluted: it can use more variables, be more verbose, and require more effort to understand. In for loops, you need to understand that this complicated condition is there to replace a break. Not my cup of tea.

dataflow · on April 11, 2022

Yeah, I get that; that's always the argument. I'm not an absolutist on this either (I do sprinkle continue/return into my code once in a while when it really seems worth it, and the same goes for the occasional goto), but in general I feel people are way more liberal with these than they should be (except goto, which people avoid at all costs no matter how complicated the code becomes, thus erring toward the other extreme). I've never been able to convince many others, though; the arguments never get anywhere when person A insists X is more readable and person B insists Y is more readable.

avar · on April 11, 2022

The use of "goto" is fine and more legible in some contexts, such as jumping to the "cleanup" part of a function.

You can avoid it, but it'll involve one of repetition, a wrapper function, or "formalizing" the cleanup so that it's your only cleanup call. Depending on the situation those are all going to be way more verbose.
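A minimal sketch of the "jump to cleanup" pattern, using a made-up function that splits `"key=value"` into two heap-allocated strings. Every failure path jumps to one label, and because `free(NULL)` is a no-op, the cleanup code doesn't need to know how far the function got.

```c
#include <stdlib.h>
#include <string.h>

/* Returns 0 and hands both strings to the caller on success,
   or -1 on failure with nothing leaked. */
int split_pair(const char *line, char **key_out, char **val_out) {
    int rc = -1;
    char *key = NULL, *val = NULL;
    const char *eq = strchr(line, '=');

    if (eq == NULL)
        goto cleanup;                      /* malformed: no '=' */
    key = malloc((size_t)(eq - line) + 1);
    if (key == NULL)
        goto cleanup;
    memcpy(key, line, (size_t)(eq - line));
    key[eq - line] = '\0';
    val = malloc(strlen(eq + 1) + 1);
    if (val == NULL)
        goto cleanup;
    strcpy(val, eq + 1);

    *key_out = key;                        /* transfer ownership */
    *val_out = val;
    key = val = NULL;                      /* so cleanup frees nothing */
    rc = 0;

cleanup:
    free(val);                             /* free(NULL) is a no-op */
    free(key);
    return rc;
}
```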

Some advanced use cases include jumping between "case" labels, or jumping into the middle of a loop. Most of the time that's bad practice, but in the cases that benefit from them it's much better than the alternatives.
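As one illustration of jumping into the middle of a loop (not taken from the comment above), here is a sketch of the classic "loop and a half" for printing a separated list: the `goto` enters the body past the separator, so the first item gets no leading `", "`. The function name and buffer handling are assumptions for the example.

```c
#include <stddef.h>
#include <stdio.h>

/* Writes the n values of a into buf as "1, 2, 3".
   cap must be at least 1; output is truncated if buf is too small. */
void join_ints(const int *a, size_t n, char *buf, size_t cap) {
    size_t i = 0, len = 0;

    buf[0] = '\0';
    if (n == 0)
        return;                   /* the jump below assumes one item */
    goto first;                   /* enter the body after the separator */
    while (i < n) {
        if (len < cap)
            len += (size_t)snprintf(buf + len, cap - len, ", ");
first:
        if (len < cap)
            len += (size_t)snprintf(buf + len, cap - len, "%d", a[i]);
        i++;
    }
}
```

The structured alternative is an `if (i > 0)` test inside the loop; the jump simply avoids re-checking it on every iteration.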

jraph · on April 11, 2022

In the end, readability depends heavily on habits.

You may know that in principle, but you really feel it when you work on various codebases with different people or switch teams.

So, yes. Nobody is right or wrong on these topics. That said, there are situations where one is clearly better than the other.

If you need to (formally) verify your code, depending on the method, maybe early returns are to be avoided.

legalcorrection · on April 11, 2022

The problem is that the i++ goes at the end of the loop body. When the loop body is long, the increment can be many many lines away from the initialization and loop condition. The increment won't always be as obvious as "i++" and can involve function calls. With a well-written for loop, you can see all of that at a glance.

zozbot234 · on April 11, 2022

> With a well-written for loop, you can see all of that at a glance.

Yes, this is ultimately about the push for structured programming, which involves creating a pattern language of loops with "clean" semantics. Grouping the loop advancing statement with the other parts of a general loop construct helps wrt. recognition of known looping patterns.

WalterBright · on April 11, 2022

The D compiler simply rewrites while loops into for loops, that way I only had to write one semantic routine for the for loop.
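A sketch of that kind of desugaring on a toy AST: `while (cond) body` becomes `for (; cond; ) body`, so later passes only need to handle the for form. This is not the D compiler's code; the node layout, with expressions kept as plain strings, is invented for the example.

```c
#include <stddef.h>

typedef enum { STMT_WHILE, STMT_FOR } StmtKind;

/* Toy statement node: expressions are plain strings, NULL if absent. */
typedef struct Stmt {
    StmtKind kind;
    const char *init;      /* for only */
    const char *cond;
    const char *step;      /* for only */
    struct Stmt *body;     /* nested statement, or NULL for an empty body */
} Stmt;

/* Rewrites every while loop in the chain into an equivalent for loop. */
void lower_while(Stmt *s) {
    if (s == NULL)
        return;
    if (s->kind == STMT_WHILE) {
        s->kind = STMT_FOR;
        s->init = NULL;    /* no init clause */
        s->step = NULL;    /* no step clause */
    }
    lower_while(s->body);
}
```

Leaving the step clause empty matters: `continue` in `for (; cond; )` goes straight back to the condition, exactly as in the original while loop, so the rewrite doesn't change the continue behavior discussed earlier in the thread.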

edgyquant · on April 11, 2022

In this example maybe, but in practice the for loop looks cleaner (imo obviously.)

shric · on April 11, 2022

As this compiler is written in modern C, I guess the next logical step is to write a version that can compile itself?

gotaquestion · on April 10, 2022

This is a great study: the code isn't that large, and it is rather straightforward compared to looking at today's GCC. Thanks for putting this up!

userbinator · on April 11, 2022

If you want to study a tiny "not quite C" compiler, I recommend C4:

https://news.ycombinator.com/item?id=22353532