C72: A port of the earliest C compiler to x86 (2021) (github.com/vegesm)
127 points by marcodiego on April 10, 2022 | 35 comments



> compound assignment operators are reversed, they are =+, =*

Personally I always thought that made more sense than the (now) standard way because when you read it aloud it reads like the operation you’re performing:

  N equals N plus five

  N =+ 5
I still find myself occasionally writing those operators that way around even after decades of using C and similar languages.


One of the problems with the old notation is the ambiguity. What does each of the following mean?

    n =- 5
    n = -5
    n=-5


Dennis Ritchie, in his paper about the history of C, explains that they became aware of these ambiguities only after the first C implementation, but they considered it too big a mistake to keep, so they reversed the order in the next C version.


maybe not quite the next version, from memory (and I ported compilers, and the kernels, for both) it was between v6 and v7 Unix


That's correct, it wasn't the next version of the Ritchie compiler, it was a whole new compiler, the Portable C Compiler. It used yacc to generate the parser, and yacc complained about the ambiguity. They left the old syntax in for quite a while.


Yup, almost certainly PCC - which also still has struct fields in a global namespace as a (possibly wonderful, possibly horrible) hack to provide the equivalent to unions


If I were designing a new language and were dead-set on the =- form I would make it so:

    n =- 5 // n equals n - 5
    n = -5 // n equals -5
    n=-5   //Illegal
This can be done, it just requires getting the lexer right.
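
A minimal sketch of the lexer rule described above, written as a standalone C function. The enum names and `classify_assignment` are hypothetical, and rejecting every `=-` that is followed directly by a non-space character (which also rejects `n =-5`) is an assumption that goes beyond the three cases listed. It only looks at the first `=` in the input and makes no attempt to handle `==`.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical token kinds, for this sketch only. */
enum assign_kind {
    ASSIGN_NONE,      /* no '=' in the input */
    ASSIGN_PLAIN,     /* "= -5": plain assignment, any '-' is a unary minus */
    ASSIGN_SUBTRACT,  /* "=- 5": compound subtract-and-assign */
    ASSIGN_AMBIGUOUS  /* "=-5": rejected as ambiguous */
};

/* Classify the first '=' in s using only the characters that follow it. */
enum assign_kind classify_assignment(const char *s)
{
    while (*s != '\0' && *s != '=')
        s++;
    if (*s == '\0')
        return ASSIGN_NONE;

    s++;                      /* step past '=' */
    if (*s != '-')
        return ASSIGN_PLAIN;  /* "= -5", "= 5", ... */

    s++;                      /* we have seen "=-"; the next character decides */
    if (*s != '\0' && isspace((unsigned char)*s))
        return ASSIGN_SUBTRACT;
    return ASSIGN_AMBIGUOUS;
}

/* Usage: the three cases from the comment above. */
void check_classify_assignment(void)
{
    assert(classify_assignment("n =- 5") == ASSIGN_SUBTRACT);
    assert(classify_assignment("n = -5") == ASSIGN_PLAIN);
    assert(classify_assignment("n=-5") == ASSIGN_AMBIGUOUS);
}
```

A real lexer would emit a diagnostic for the ambiguous case instead of a return code, but the decision itself needs only one character of lookahead.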


Another problem:

n =* p;

Is p a pointer or an integer? Maybe you want to do pointer math, it's a very C thing to do.


That was my thought too - it seems like it would be ambiguous even with type information, since the readme says that, in this iteration at least, any type can be converted to any other type.


The modern notation contains a very similar 'ambiguity'. Compare: x---y, x-- - y, and x - --y.


And sadly x-----y won't parse :(


It's been a while since I used C, but I for one am kind of glad that doesn't parse. If you need to do this, you can use parentheses:

    x - (--y)
or

    (x--) - y


All parse fine, and unambiguously, due to the maximal munch rule.

x---y is (x--)-y, which is semantically valid.

x-----y is ((x--)--)-y - but it's a semantic error, as x-- is not an lvalue.
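
To see the maximal munch rule in action, here is a small sketch in modern C. The function names are made up for the example; each one returns the value of the expression and reports the values of x and y afterwards through out-pointers.

```c
#include <assert.h>

/* x---y is lexed as x -- - y, i.e. (x--) - y: x is decremented, y is not. */
int three_minuses(int x, int y, int *x_after, int *y_after)
{
    int r = x---y;
    *x_after = x;
    *y_after = y;
    return r;
}

/* Whitespace forces the other reading: x - (--y). */
int spaced_pre_decrement(int x, int y, int *x_after, int *y_after)
{
    int r = x - --y;
    *x_after = x;
    *y_after = y;
    return r;
}

/* x-----y would be lexed as x -- -- - y and rejected by the compiler,
   because the result of x-- is not an lvalue. */

void check_maximal_munch(void)
{
    int xa, ya;

    assert(three_minuses(5, 2, &xa, &ya) == 3);        /* 5 - 2, then x becomes 4 */
    assert(xa == 4 && ya == 2);

    assert(spaced_pre_decrement(5, 2, &xa, &ya) == 4); /* y becomes 1, then 5 - 1 */
    assert(xa == 5 && ya == 1);
}
```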


Ahh yes, that’s a very good point.


> no for loops

Oh wow, I've read bits and pieces about early iterations of C but I didn't realise for loops weren't there from the start.


For loops are syntactic sugar for while loops, e.g.

    for(int i = 0; i < 10; i++);
Is just

    int i = 0;
    while(i < 10)
       i++;


I think there is also an extra scope embedded in the `for`. So enclosing it would be closer:

    {
        int i = 0;
        while(i < 10)
            i++;
    }


Honestly, I think that's cleaner.


Keep in mind they're not always equivalent. If you have 'continue' in the middle then the for loop will still perform the increment but the while loop won't.
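
A short sketch of that difference, with made-up function names. Both functions sum the odd numbers below n. In the while version, a plain `continue` in place of the `goto` would skip `i++` and loop forever at `i == 0`, so a faithful translation has to jump to the increment instead.

```c
#include <assert.h>

/* for loop: `continue` jumps to i++, then the condition is re-tested. */
int sum_odd_for(int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        if (i % 2 == 0)
            continue;
        sum += i;
    }
    return sum;
}

/* while loop: the increment lives in the body, so skipping the rest of
   the body must still reach it. */
int sum_odd_while(int n)
{
    int sum = 0;
    int i = 0;
    while (i < n) {
        if (i % 2 == 0)
            goto next;   /* stands in for the for loop's `continue` */
        sum += i;
next:
        i++;
    }
    return sum;
}

void check_sum_odd(void)
{
    assert(sum_odd_for(10) == 25);    /* 1 + 3 + 5 + 7 + 9 */
    assert(sum_odd_while(10) == 25);
}
```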


This bites me a few times a year.


Here's a thought: some languages have a defer statement that runs when the function is exited. This allows for cleanup code to run even with early return. Perhaps a block-level defer could be useful:

    int i = 0;
    while (i < 10) {
        defer i++;
        // ...
        continue;
        // ...
    }


If someone wants to play with this idea, I made a pre-processor that translates that into standard C, inserting the deferred statements at each exit point (break, continue, goto, return, end of block):

https://news.ycombinator.com/item?id=28166125

https://sentido-labs.com/en/library/cedro/202106171400/#defe...
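
For illustration, this is roughly what a hand expansion of a block-level `defer i++;` could look like in standard C, with the deferred statement copied before each exit point. It is written by hand as a sketch and is not the actual output of the pre-processor linked above; the function name and parameters are hypothetical.

```c
#include <assert.h>

/* Counts non-negative values before the first occurrence of `stop`.
   Conceptually the loop body starts with `defer i++;`. */
int count_until_stop(const int *values, int n, int stop, int *final_i)
{
    int processed = 0;
    int i = 0;
    while (i < n) {
        if (values[i] < 0) {
            i++;          /* deferred statement, copied before continue */
            continue;
        }
        if (values[i] == stop) {
            i++;          /* deferred statement, copied before break */
            break;
        }
        processed++;
        i++;              /* deferred statement, copied at end of block */
    }
    *final_i = i;
    return processed;
}

void check_count_until_stop(void)
{
    const int values[] = { 1, -1, 2, 7, 3 };
    int final_i = 0;

    /* Stops at the 7 (index 3); the increment still runs, so i ends at 4. */
    assert(count_until_stop(values, 5, 7, &final_i) == 2);
    assert(final_i == 4);
}
```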


Hahah, what are the odds! Thanks for sharing, I will take a look.


IMHO the way to avoid it is to avoid nonlinear [1] control flow. If you can convince yourself that early returns, continue, etc. are harmful, you'll avoid the pitfalls that come with them. It's an unpopular position to take, though.

[1] I might be sloppy with the terminology here, but you catch my drift.


I do think early returns, continue, etc are a good source of clarity. Sometimes, code that avoids early returns seems convoluted and can use more variables, be more verbose and require more effort to be understood. In for loops, you need to understand that this complicated condition is there to replace a break. Not my cup of tea.


Yeah I get that, that's always the argument. I'm not an absolutist on this either (I do sprinkle continue/return into my code once in a while when it really seems worth it; even the same with the occasional goto), but in general I feel people are way more liberal with these than they should be (except goto, which people avoid at all costs no matter how complicated the code becomes, thus erring toward the other extreme). Never been able to convince many others though; the arguments never get anywhere when person A insists X is more readable and person B insists Y is more readable.


The use of "goto" is fine and more legible in some contexts, such as jumping to the "cleanup" part of a function.

You can avoid it, but it'll involve one of repetition, a wrapper function, or "formalizing" the cleanup so that it's your only cleanup call. Depending on the situation those are all going to be way more verbose.

Some advanced use cases include jumping between "case" labels, or jumping into the middle of a loop. Most of the time that's bad practice, but in the cases that benefit from them it's much better than the alternatives.
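
A minimal sketch of the "jump to cleanup" pattern mentioned above. The function `duplicate_twice` is invented for the example: it builds a string containing the input twice, and every exit path, success or failure, passes through the single `cleanup` label.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns 0 on success and stores a newly allocated string in *out
   (the caller frees it); returns -1 on allocation failure. */
int duplicate_twice(const char *src, char **out)
{
    int status = -1;
    char *first = NULL;
    char *second = NULL;
    size_t len = strlen(src);

    first = malloc(len + 1);
    if (first == NULL)
        goto cleanup;
    memcpy(first, src, len + 1);

    second = malloc(2 * len + 1);
    if (second == NULL)
        goto cleanup;
    memcpy(second, first, len);
    memcpy(second + len, first, len + 1);  /* includes the terminating NUL */

    *out = second;
    second = NULL;    /* ownership moved to the caller */
    status = 0;

cleanup:
    free(second);     /* free(NULL) is a no-op */
    free(first);
    return status;
}

void check_duplicate_twice(void)
{
    char *out = NULL;
    assert(duplicate_twice("ab", &out) == 0);
    assert(strcmp(out, "abab") == 0);
    free(out);
}
```

Without the `goto`, each failure branch would need its own copy of the `free` calls, which is the repetition the comment refers to.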


In the end, readability depends heavily on habits.

You may know that in the abstract, but you really feel it when you work on various codebases with different people or switch teams.

So, yes, nobody is right or wrong on these topics. That said, there are situations where one is clearly better than the other.

If you need to (formally) verify your code, depending on the method, maybe early returns are to be avoided.


The problem is that the i++ goes at the end of the loop body. When the loop body is long, the increment can be many many lines away from the initialization and loop condition. The increment won't always be as obvious as "i++" and can involve function calls. With a well-written for loop, you can see all of that at a glance.


> With a well-written for loop, you can see all of that at a glance.

Yes, this is ultimately about the push for structured programming, which involves creating a pattern language of loops with "clean" semantics. Grouping the loop advancing statement with the other parts of a general loop construct helps wrt. recognition of known looping patterns.


The D compiler simply rewrites while loops into for loops, that way I only had to write one semantic routine for the for loop.


In this example maybe, but in practice the for loop looks cleaner (imo obviously.)


As this compiler is written in modern C, I guess the next logical step is to write a version that can compile itself?


This is a great study: the code isn't that large, and it is rather straightforward compared to looking at today's GCC. Thanks for putting this up!


If you want to study a tiny "not quite C" compiler, I recommend C4:

https://news.ycombinator.com/item?id=22353532



