My own C compiler on my own compiler infrastructure (github.com/maekawatoshiki)
153 points by uint256_t on Oct 23, 2020 | hide | past | favorite | 19 comments



It says the compiler is "Powered by Cilk". But what does that mean?

https://en.wikipedia.org/wiki/Cilk

Isn't Cilk a C-like language with additions for parallel execution? How is that a compiler infrastructure?


See the parent directory of the repo linked in the OP: https://github.com/maekawatoshiki/cilk


Ah, interesting, thanks.

Well, the example shows convenient syntax... but it also says it's a "toy" project.


I agree that Cilk is not a great name for compiler stuff as there is already a Cilk extension to C.

That said, it’s the author’s prerogative.


Cilk is in the dependencies; searching for "rust cilk" finds it:

https://github.com/maekawatoshiki/cilk


To be precise, this is a C compiler written in Rust --- I thought it would be another self-hosting compiler, of which several have already appeared on HN.

In other words, the author is now encouraged to write a Rust compiler in C.


If the author wrote a C emitter for their compiler backend maybe they could just compile their existing compiler into C.


I'm pretty excited to see the infrastructure (i.e. cilk). But I'm wondering if it "fixes" some of the flaws in LLVM IR? For example, (not really a flaw, but something I'd prefer) using basic block parameters instead of PHI nodes.


There's no theoretical difference between phis and basic block parameters.

When you do SSA conversion, you need a notion that allows you to bind a value to a variable along an edge in the control-flow graph. You can choose to actually have some edge data that indicates these bindings, but this tends to be a poor engineering solution. If it has to go into a basic block, you need to either bind it to the source basic block or the target basic block. Bind it to the target, and you get phi nodes. Bind it to the source, and you get basic block parameters.

The choice of which to use is an engineering tradeoff. Phis have the issue that they are pseudo-instructions existing as the leading instructions in a basic block, and you may well need to special-case them in transformation or analysis passes. Using basic block arguments also provides a more direct analogy to interprocedural optimization, and you could potentially use the exact same code with a touch of specialization to handle that domain.

Since the two forms place the edge bindings in different nodes, they make different questions easier. With a phi node, it's quick to see if all predecessors are providing the same value. With a basic block argument, it's quick to see which variables are live out (assuming you're restricted to using only basic block-local variables [1]). In general, if you need to query all-predecessor analyses, it's easier with phis; if you need all-successor analyses, it's easier with basic block arguments.

[1] The choice of where variables can be used is also orthogonal to phis versus basic block arguments. Loop-closed SSA already exists, which prevents you from using a variable outside of its loop.
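To make the tradeoff concrete, here's a minimal Rust sketch (all type and field names are invented for illustration, not taken from any real IR) of where each encoding stores the edge bindings, and which query each one makes cheap:

```rust
use std::collections::HashMap;

// Phi encoding: the *target* block records, per predecessor, the incoming value.
struct PhiBlock {
    // variable name -> (predecessor block id -> incoming value name)
    phis: HashMap<&'static str, HashMap<usize, &'static str>>,
}

// Block-parameter encoding: the *source* block's terminator passes arguments.
struct ParamBlock {
    params: Vec<&'static str>,
    // (successor block id, arguments passed along that edge)
    jumps: Vec<(usize, Vec<&'static str>)>,
}

fn main() {
    // join: y = phi(B1: y1, B2: y2)
    let join = PhiBlock {
        phis: HashMap::from([("y", HashMap::from([(1, "y1"), (2, "y2")]))]),
    };
    // With phis, "do all predecessors provide the same value?" is a local query:
    let incoming: Vec<_> = join.phis["y"].values().collect();
    let all_same = incoming.windows(2).all(|w| w[0] == w[1]);
    assert!(!all_same); // y1 and y2 differ, so no trivial simplification

    // B1(arg1): ... jump join(y1) -- with block parameters,
    // "which values flow out of B1?" is a local query on its terminator:
    let b1 = ParamBlock { params: vec!["arg1"], jumps: vec![(3, vec!["y1"])] };
    assert_eq!(b1.params, vec!["arg1"]);
    let live_out: Vec<_> = b1.jumps.iter().flat_map(|(_, args)| args.clone()).collect();
    assert_eq!(live_out, vec!["y1"]);
}
```

Either query is answerable in both forms, of course; the sketch only shows which one is a local lookup versus a walk over neighboring blocks.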


> There's no theoretical difference between phis and basic block parameters.

I question this claim. Don't basic block parameters allow you to do interesting stuff also at split points, not only at join points?

For example:

    if (x == 0) {
        y = 2 * x;
    } else {
        y = 3 * x;
    }
In SSA form this is:

        if (x == 0) jump B1 else jump B2
    B1:
        y1 = 2 * x
        jump join
    B2:
        y2 = 3 * x
        jump join
    join:
        y = phi(y1, y2)
You need conditional constant propagation to realize that y1 must be 0.

But with basic block arguments you could do this (improvising notation on the spot):

        if (x == 0) jump B1(0) else jump B2(x)
    B1(arg1):
        y1 = 2 * arg1;
        jump join(y1)
    B2(arg2):
        y2 = 3 * arg2;
        jump join(y2)
    join(y):
        ...
Basically, as soon as information becomes available (x is 0 along some path), that is explicit in the IR. Realizing that y1 is 0 only takes simple, non-conditional sparse constant propagation. You might not call that a "theoretical" difference (what theory are we talking about?), but it does give you a simple way to access information that is not explicit in SSA form.

In effect this latter representation is something like Static Single Information (SSI) form: https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.9...
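A tiny Rust sketch of that difference (the enum and function names are invented for illustration): once the edge binding `jump B1(0)` is explicit, folding `y1 = 2 * arg1` needs only an ordinary constant lattice, with no conditional reasoning about the branch:

```rust
// Two-point constant lattice, as in plain sparse constant propagation.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Val {
    Const(i64),
    Unknown,
}

// Fold a multiply over lattice values.
fn fold_mul(lhs: Val, rhs: Val) -> Val {
    match (lhs, rhs) {
        (Val::Const(a), Val::Const(b)) => Val::Const(a * b),
        // 0 * anything = 0, even if the other side is unknown.
        (Val::Const(0), _) | (_, Val::Const(0)) => Val::Const(0),
        _ => Val::Unknown,
    }
}

fn main() {
    // Phi form: inside B1, y1 = 2 * x, and x is Unknown without
    // path-sensitive (conditional) propagation.
    let arg_in_phi_form = Val::Unknown;
    // Block-parameter form: the edge `jump B1(0)` binds arg1 = Const(0).
    let arg_in_param_form = Val::Const(0);

    assert_eq!(fold_mul(Val::Const(2), arg_in_phi_form), Val::Unknown);
    assert_eq!(fold_mul(Val::Const(2), arg_in_param_form), Val::Const(0));
}
```

Same lattice, same transfer function; the block-parameter IR just hands the analysis a `Const(0)` where the phi form hands it an `Unknown`.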


As I explained in my footnote, you can do the same thing with SSA form:

    if (x == 0) jump B1 else jump B2
  B1:
    x1 = phi(x)
    y1 = 2 * x1
    jump join
  B2:
    x2 = phi(x)
    y2 = 3 * x2
    jump join
  join:
    y = phi(y1, y2)
And then you can do the same propagation of x == 0 to the phi in B1 pretty easily. It is somewhat harder than the basic-block argument case, but that's because basic block arguments rely on the data binding being put in the head of the edge and we're pushing data from the head into the edge. The next step of checking if all incoming edges are identically 0 would be easier in the phi form, since the data binding is in the tail node, and we're pulling data from the edge into the tail.
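A minimal Rust sketch of that "pull" step (the helper name is invented): a phi whose incoming operands all agree collapses to that single value, which is a local scan over the operand list stored at the tail node:

```rust
// If every incoming phi operand is the same constant, the phi is that constant.
// Returns None when the operands disagree (or the phi has no operands).
fn phi_common_value(incoming: &[i64]) -> Option<i64> {
    let first = *incoming.first()?;
    incoming.iter().all(|&v| v == first).then_some(first)
}

fn main() {
    // After propagating x == 0 along both edges, phi(x1, x2) sees [0, 0]:
    assert_eq!(phi_common_value(&[0, 0]), Some(0));
    // But phi(y1, y2) with differing incoming constants doesn't collapse:
    assert_eq!(phi_common_value(&[2, 3]), None);
}
```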


Huh! I stopped learnin' compilery stuff after SSA and was unaware of the approach. If anyone else is wondering, here's a good Reddit thread on bblock parameters:

https://www.reddit.com/r/ProgrammingLanguages/comments/9z8qu...


It's relatively new in mainstream compilers, e.g. MSVC only got SSA in its backend about five years ago.

I think Appel wrote a paper comparing SSA to functional programming way back in the 90s.


It's also included in his books ("Modern Compiler Implementation in <insert language name>", where the language is only used in the first third of the book).


(I'm the owner of cilk.) I didn't know about basic block parameters. They seem better than using phi nodes. Thanks.


Yeah, basic block parameters are especially useful when doing data flow analysis.


This amuses me. Here, I'll explain the joke for you.

There's a certain perversity in writing a C compiler in Rust, the language designed to replace C. Yet, you wrote a compiler for another, smaller language, implying Rust wasn't enough. But, then, you could have written the compiler in C, too, which means C wasn't enough.

I'm just being a turd, I know he really did it for the glory of Satan. :^)


My compiler infrastructure (cilk) has been renamed to sericum :)


And the C compiler that uses it is now at: https://github.com/maekawatoshiki/sericum/tree/master/sericu...

(For historical reasons if anyone is looking at this thread in the future.)



