My own C compiler on my own compiler infrastructure (github.com/maekawatoshiki)
153 points by uint256_t on Oct 23, 2020 | hide | past | favorite | 19 comments



It says the compiler is "Powered by Cilk". But what does that mean?

https://en.wikipedia.org/wiki/Cilk

Isn't Cilk a C-like language with additions for parallel execution? How is that a compiler infrastructure?


See the parent directory of the repo linked in the OP: https://github.com/maekawatoshiki/cilk


Ah, interesting, thanks.

Well, the example shows convenient syntax... but it also says it's a "toy" project.


I agree that Cilk is not a great name for compiler stuff as there is already a Cilk extension to C.

That said, it’s the author’s prerogative.


Cilk is in the dependencies; searching for "rust cilk" finds it:

https://github.com/maekawatoshiki/cilk


To be precise, this is a C compiler written in Rust --- I thought it would be another self-hosting compiler, of which several have already appeared on HN.

In other words, the author is now encouraged to write a Rust compiler in C.


If the author wrote a C emitter for their compiler backend maybe they could just compile their existing compiler into C.


I'm pretty excited to see the infrastructure (i.e. cilk). But I'm wondering if it "fixes" some of the flaws in LLVM IR? For example, (not really a flaw, but something I'd prefer) using basic block parameters instead of PHI nodes.


There's no theoretical difference between phis and basic block parameters.

When you do SSA conversion, you need a notion that allows you to bind a value to a variable along an edge in the control-flow graph. You can choose to actually have some edge data that indicates these bindings, but this tends to be a poor engineering solution. If it has to go into a basic block, you need to either bind it to the source basic block or the target basic block. Bind it to the target, and you get phi nodes. Bind it to the source, and you get basic block parameters.

The choice of which to use is an engineering tradeoff. Phis have the issue that they are pseudo-instructions existing as the leading instructions in a basic block, and you may well need to special-case them in transformation or analysis passes. Using basic block arguments also provides a more direct analogy to interprocedural optimization, and you could potentially use the exact same code with a touch of specialization to handle that domain.

Since the two forms place the edge bindings in different nodes, they make different questions easier. With a phi node, it's quick to see if all predecessors are providing the same value. With a basic block argument, it's quick to see which variables are live out (assuming you're restricted to using only basic block-local variables [1]). In general, if you need to query all-predecessor analyses, it's easier with phis; if you need all-successor analyses, it's easier with basic block arguments.

[1] The choice of where variables can be used is also orthogonal to phis versus basic block arguments. Loop-closed SSA already exists, which prevents you from using a variable outside of its loop.
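To make the tradeoff concrete, here's a minimal Rust sketch (all type and field names are invented for illustration, not taken from any real IR) of where each encoding stores the edge bindings, and which query each one makes cheap:

```rust
use std::collections::HashMap;

// Phi encoding: the *target* block records, per predecessor, the incoming value.
struct PhiBlock {
    // variable name -> (predecessor block id -> incoming value name)
    phis: HashMap<&'static str, HashMap<usize, &'static str>>,
}

// Block-parameter encoding: the *source* block's terminator passes arguments.
struct ParamBlock {
    params: Vec<&'static str>,
    // (successor block id, arguments passed along that edge)
    jumps: Vec<(usize, Vec<&'static str>)>,
}

fn main() {
    // join: y = phi(B1: y1, B2: y2)
    let join = PhiBlock {
        phis: HashMap::from([("y", HashMap::from([(1, "y1"), (2, "y2")]))]),
    };
    // With phis, "do all predecessors provide the same value?" is a local query:
    let incoming: Vec<_> = join.phis["y"].values().collect();
    let all_same = incoming.windows(2).all(|w| w[0] == w[1]);
    assert!(!all_same); // y1 and y2 differ, so no trivial simplification

    // B1(arg1): ... jump join(y1) -- with block parameters,
    // "which values flow out of B1?" is a local query on its terminator:
    let b1 = ParamBlock { params: vec!["arg1"], jumps: vec![(3, vec!["y1"])] };
    assert_eq!(b1.params, vec!["arg1"]);
    let live_out: Vec<_> = b1.jumps.iter().flat_map(|(_, args)| args.clone()).collect();
    assert_eq!(live_out, vec!["y1"]);
}
```

Either query is answerable in both forms, of course; the sketch only shows which one is a local lookup versus a walk over neighboring blocks.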


> There's no theoretical difference between phis and basic block parameters.

I question this claim. Don't basic block parameters allow you to do interesting stuff also at split points, not only at join points?

For example:

    if (x == 0) {
        y = 2 * x;
    } else {
        y = 3 * x;
    }
In SSA form this is:

        if (x == 0) jump B1 else jump B2
    B1:
        y1 = 2 * x
        jump join
    B2:
        y2 = 3 * x
        jump join
    join:
        y = phi(y1, y2)
You need conditional constant propagation to realize that y1 must be 0.

But with basic block arguments you could do this (improvising notation on the spot):

        if (x == 0) jump B1(0) else jump B2(x)
    B1(arg1):
        y1 = 2 * arg1;
        jump join(y1)
    B2(arg2):
        y2 = 3 * arg2;
        jump join(y2)
    join(y):
        ...
Basically, as soon as information becomes available (x is 0 along some path), that is explicit in the IR. Realizing that y1 is 0 only takes simple, non-conditional sparse constant propagation. You might not call that a "theoretical" difference (what theory are we talking about?), but it does give you a simple way to access information that is not explicit in SSA form.

In effect this latter representation is something like Static Single Information (SSI) form: https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.9...
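A tiny Rust sketch of that difference (the enum and function names are invented for illustration): once the edge binding `jump B1(0)` is explicit, folding `y1 = 2 * arg1` needs only an ordinary constant lattice, with no conditional reasoning about the branch:

```rust
// Two-point constant lattice, as in plain sparse constant propagation.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Val {
    Const(i64),
    Unknown,
}

// Fold a multiply over lattice values.
fn fold_mul(lhs: Val, rhs: Val) -> Val {
    match (lhs, rhs) {
        (Val::Const(a), Val::Const(b)) => Val::Const(a * b),
        // 0 * anything = 0, even if the other side is unknown.
        (Val::Const(0), _) | (_, Val::Const(0)) => Val::Const(0),
        _ => Val::Unknown,
    }
}

fn main() {
    // Phi form: inside B1, y1 = 2 * x, and x is Unknown without
    // path-sensitive (conditional) propagation.
    let arg_in_phi_form = Val::Unknown;
    // Block-parameter form: the edge `jump B1(0)` binds arg1 = Const(0).
    let arg_in_param_form = Val::Const(0);

    assert_eq!(fold_mul(Val::Const(2), arg_in_phi_form), Val::Unknown);
    assert_eq!(fold_mul(Val::Const(2), arg_in_param_form), Val::Const(0));
}
```

Same lattice, same transfer function; the block-parameter IR just hands the analysis a `Const(0)` where the phi form hands it an `Unknown`.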


As I explained in my footnote, you can do the same thing with SSA form:

    if (x == 0) jump B1 else jump B2
  B1:
    x1 = phi(x)
    y1 = 2 * x1
    jump join
  B2:
    x2 = phi(x)
    y2 = 3 * x2
    jump join
  join:
    y = phi(y1, y2)
And then you can do the same propagation of x == 0 to the phi in B1 pretty easily. It is somewhat harder than the basic-block argument case, but that's because basic block arguments rely on the data binding being put in the head of the edge and we're pushing data from the head into the edge. The next step of checking if all incoming edges are identically 0 would be easier in the phi form, since the data binding is in the tail node, and we're pulling data from the edge into the tail.
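A minimal Rust sketch of that "pull" step (the helper name is invented): a phi whose incoming operands all agree collapses to that single value, which is a local scan over the operand list stored at the tail node:

```rust
// If every incoming phi operand is the same constant, the phi is that constant.
// Returns None when the operands disagree (or the phi has no operands).
fn phi_common_value(incoming: &[i64]) -> Option<i64> {
    let first = *incoming.first()?;
    incoming.iter().all(|&v| v == first).then_some(first)
}

fn main() {
    // After propagating x == 0 along both edges, phi(x1, x2) sees [0, 0]:
    assert_eq!(phi_common_value(&[0, 0]), Some(0));
    // But phi(y1, y2) with differing incoming constants doesn't collapse:
    assert_eq!(phi_common_value(&[2, 3]), None);
}
```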


Huh! I stopped learnin' compilery stuff after SSA and was unaware of the approach. If anyone else is wondering, here's a good Reddit thread on bblock parameters:

https://www.reddit.com/r/ProgrammingLanguages/comments/9z8qu...


It's relatively new in mainstream compilers, e.g. MSVC only got SSA in its backend about five years ago.

I think Appel wrote a paper comparing SSA to functional programming way back in the 90s.


It's also included in his books ("Modern Compiler Implementation in <insert language name>", where the language is only used in the first third of the book).


(I'm the owner of cilk.) I didn't know about basic block parameters. They seem better than using phi nodes. Thanks.


Yeah, basic block parameters are especially useful when doing data flow analysis.


This amuses me. Here, I'll explain the joke for you.

There's a certain perversity in writing a C compiler in Rust, the language designed to replace C. Yet, you wrote a compiler for another, smaller language, implying Rust wasn't enough. But, then, you could have written the compiler in C, too, which means C wasn't enough.

I'm just being a turd, I know he really did it for the glory of Satan. :^)


My compiler infrastructure (cilk) has been renamed to sericum :)


And the C compiler that uses it is now at: https://github.com/maekawatoshiki/sericum/tree/master/sericu...

(For historical reasons if anyone is looking at this thread in the future.)



