Mapping High-Level Constructs to LLVM IR

DannyBee · on Nov 9, 2014

"does not differentiate between signed and unsigned integer"

This isn't really accurate. It's accurate at some level, in that there are not two different types. But that's because it's not necessary: they have nsw/nuw (no signed wrap, no unsigned wrap) flags on the operations.
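As a rough illustration of signedness living in the operations rather than in the types, here is a small C sketch. The function names are invented for this example. A compiler such as Clang would typically lower the two divisions to `sdiv` and `udiv`, and the two comparisons to `icmp slt` and `icmp ult`, all on the same `i32` type; the exact output depends on the compiler and flags.

```c
#include <stdint.h>

/* The same 32-bit pattern gives different answers depending on which
   operation is applied to it, not on a "signed" or "unsigned" tag
   attached to the value itself. */

/* Signed division (typically lowered to `sdiv`). */
int32_t div_as_signed(int32_t x, int32_t d) {
    return x / d;
}

/* Unsigned division of the same bits (typically lowered to `udiv`). */
uint32_t div_as_unsigned(int32_t x, uint32_t d) {
    return (uint32_t)x / d;
}

/* Signed comparison (typically `icmp slt`). */
int lt_signed(int32_t a, int32_t b) {
    return a < b;
}

/* Unsigned comparison of the same bits (typically `icmp ult`). */
int lt_unsigned(int32_t a, int32_t b) {
    return (uint32_t)a < (uint32_t)b;
}
```

With these definitions, `div_as_signed(-8, 2)` is -4, while `div_as_unsigned(-8, 2)` is 2147483644, even though both start from the same bit pattern.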

sklogic · on Nov 9, 2014

It is necessary - you cannot reconstruct your function prototypes from the IR declarations. That's why, for example, SPIR provides signedness information in the metadata. This ugly hack would not be necessary if IR had explicitly signed types.

DannyBee · on Nov 12, 2014

"It is necessary - you cannot reconstruct your function prototypes from the IR declarations."

Why would you want to do this? You are trying to reconstruct high level, language specific info from low level, language independent info. Of course you can't do this.

Your comment comes out to "lowering loses information", which is, of course, true.

So let's try again: What semantic information do you believe is lost here? IE where does it result in an incorrect translation, or the inability to optimize something?

Because that's what "necessary" would come out to.

sklogic · on Nov 12, 2014

> Why would you want to do this?

Because SPIR requires that you can load and enqueue an OpenCL kernel from an IR, without having a source. And you need this type information in order to be able to format your arguments properly.

Please note that I'm not commenting here on the very choice of LLVM IR as a common medium made by the SPIR committee.

> Your comment comes out to "lowering loses information", which is, of course, true.

Exactly. But the current common uses of LLVM IR do require some of the information which is lost, and, in the case of signedness, it was not even necessary to lower it.

> IE where does it result in an incorrect translation, or the inability to optimize something?

You're limiting IR uses to translation and optimisations. Fine. I would have welcomed this way of thinking. But, unfortunately, this is not the case.

DannyBee · on Nov 12, 2014

"But the current common uses of LLVM IR"

Portability of a high level language is an explicit non-goal of LLVM :)

The fact that someone decided to do it just makes them silly :)

sklogic · on Nov 12, 2014

I know it's silly. I can go on forever on what I think about SPIR and RenderScript. But, unfortunately, now it's a fact, an objective reality we have to deal with.

stygianguest · on Nov 9, 2014

Nice but damn, I'm about to implement closures in llvm, but there's only a TODO at that point ;)

sklogic · on Nov 12, 2014

Closures are trivial: a structure with a function pointer as its first element (i.e., the structure alignment is function pointer-compatible) and the captured environment following it. A pointer to this structure is passed as the first argument.
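A minimal C sketch of that layout, under the assumption of a single closure signature taking one `int`; all names here (`closure`, `adder_closure`, `make_adder`, `call_closure`) are hypothetical and only meant to show the shape of the data:

```c
#include <stdlib.h>

/* Generic closure header: the code pointer is the first member, so a
   pointer to any concrete closure can be treated as a pointer to this. */
typedef struct closure closure;
struct closure {
    int (*code)(closure *self, int arg);
};

/* Concrete closure for "add n": header first, captured environment after. */
typedef struct {
    closure header;
    int n; /* captured value */
} adder_closure;

/* The closure body receives the closure itself as its first argument
   and recovers its environment from it. */
static int adder_code(closure *self, int arg) {
    adder_closure *env = (adder_closure *)self;
    return env->n + arg;
}

/* Allocate the closure on the heap and capture n by copy.
   Returns NULL if allocation fails. */
closure *make_adder(int n) {
    adder_closure *c = malloc(sizeof *c);
    if (c == NULL) {
        return NULL;
    }
    c->header.code = adder_code;
    c->n = n;
    return &c->header;
}

/* Calling a closure: load the code pointer, pass the closure as argument 0. */
int call_closure(closure *c, int arg) {
    return c->code(c, arg);
}
```

A caller would write something like `closure *add5 = make_adder(5);` and then `call_closure(add5, 10)`, which yields 15. The caller does not need to know the concrete environment type, only the header.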

The only potentially funny bit with closures is the construction of a set of potentially mutually recursive closures - in such a case you have to defer filling in the corresponding environment fields until all the closure structures are allocated.
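The two-phase construction can be sketched in C with the classic even/odd pair, where each closure captures the other. The names (`parity_closure`, `make_parity_pair`) are made up for the example, and this is only one way to arrange it:

```c
#include <stdlib.h>

typedef struct parity_closure parity_closure;
struct parity_closure {
    int (*code)(parity_closure *self, unsigned n); /* code pointer first */
    parity_closure *other;                         /* captured sibling closure */
};

/* is_even(n): true for 0, otherwise defer to is_odd(n - 1). */
static int even_code(parity_closure *self, unsigned n) {
    return n == 0 ? 1 : self->other->code(self->other, n - 1);
}

/* is_odd(n): false for 0, otherwise defer to is_even(n - 1). */
static int odd_code(parity_closure *self, unsigned n) {
    return n == 0 ? 0 : self->other->code(self->other, n - 1);
}

/* Phase 1 allocates every closure; phase 2 fills in the environment
   fields, which is only possible once all addresses are known.
   Returns 0 on success, -1 if allocation fails. */
int make_parity_pair(parity_closure **even_out, parity_closure **odd_out) {
    parity_closure *even = malloc(sizeof *even);
    parity_closure *odd = malloc(sizeof *odd);
    if (even == NULL || odd == NULL) {
        free(even);
        free(odd);
        return -1;
    }
    even->code = even_code;
    odd->code = odd_code;
    /* Deferred step: patch the mutual references. */
    even->other = odd;
    odd->other = even;
    *even_out = even;
    *odd_out = odd;
    return 0;
}
```

After `make_parity_pair(&even, &odd)` succeeds, `even->code(even, 10)` returns 1 and `even->code(even, 7)` returns 0.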

chrisseaton · on Nov 9, 2014

What language are you working on?

I guess for closures you simply copy all the locals that they capture into a heap allocated structure that also has the pointer to the closure code.

Someone · on Nov 9, 2014

I have never implemented this, so I'm sure this will be incomplete and possibly slightly inaccurate, but that is not quite true. Some of the issues: copying may not be appropriate if your data is mutable (multiple closures can share a value or it might be modified in the 'regular' code after the closure is created), or if the code does identity checks later on.

Also, for performance, it may be beneficial to skip the 'put a local on the stack' part and create a to-be-captured object directly on the heap.
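To illustrate the sharing issue, here is a hedged C sketch in which the captured variable is created directly in a heap cell and both closures hold a pointer to that cell instead of a copy. The names (`int_box`, `counter_closure`, `make_incrementer`, `make_reader`) are hypothetical, and the sketch ignores who eventually frees the cell, which a real implementation would handle with a GC, reference counting, or similar.

```c
#include <stdlib.h>

/* Heap cell holding one captured mutable variable. */
typedef struct {
    int value;
} int_box;

typedef struct counter_closure counter_closure;
struct counter_closure {
    int (*code)(counter_closure *self); /* code pointer first */
    int_box *count;                     /* captured by reference, not copied */
};

/* Increments the shared variable and returns the new value. */
static int increment_code(counter_closure *self) {
    self->count->value += 1;
    return self->count->value;
}

/* Reads the shared variable. */
static int read_code(counter_closure *self) {
    return self->count->value;
}

/* Create the to-be-captured variable directly on the heap.
   Returns NULL if allocation fails. */
int_box *make_box(int initial) {
    int_box *box = malloc(sizeof *box);
    if (box != NULL) {
        box->value = initial;
    }
    return box;
}

/* Both constructors capture the same cell, so writes are visible to all. */
counter_closure make_incrementer(int_box *box) {
    counter_closure c = { increment_code, box };
    return c;
}

counter_closure make_reader(int_box *box) {
    counter_closure c = { read_code, box };
    return c;
}
```

If the two closures had each copied the integer instead, a call through the incrementer would not be visible through the reader, and neither would see a later assignment made by the enclosing code through `box->value`.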

I see somebody posted https://news.ycombinator.com/item?id=8580501, which points to the Wikipedia entry on 'spaghetti trees', the conceptual view on the needed data structures (which one may recognize from reading SICP, although I do not remember it using the term)

Many implementations, at runtime, will implement the 'main line' of the tree as a 'real stack', but that can be risky, as you will have to make sure that no closures survive the point where any locals they refer to get removed from the stack (what does C++ do here? Declare it undefined behaviour or make it impossible?)

stygianguest · on Nov 10, 2014

It's an in-house shader language that is compiled to different targets, one of which is llvm and another is glsl. In our case you are probably right, but it would have been nice to see a simple example. Of course there are many implementations in other open source languages, but it's much more work to analyse such large code bases.

pjmlp · on Nov 9, 2014

You can always check how Haskell, Rust and Clojure-LLVM do it, just to cite three examples.

TheRubyist · on Nov 9, 2014

Can't wait until Apple discloses more information about the Swift IR that compiles to LLVM IR. Too bad they canceled this year's LLVM Devmeeting keynote.

exDM69 · on Nov 9, 2014

I'm guessing that it's only conventions to work around the hardware-specific parts of LLVM IR. Stuff like pointer and integer size, etc.

If you want to look at something similar, OpenCL has an LLVM IR-based standardized binary IR.

DannyBee · on Nov 9, 2014

Actually, this is very wrong :)

SIL has a very large number of differences from LLVM IR. It's essentially built as an IR that they can do static analysis and high level optimization on.

This means it has a number of higher level constructs that LLVM doesn't, in order to be able to achieve the semantics they want for static analysis, and in order to be able to do things like optimize dispatch.

pjmlp · on Nov 9, 2014

Just check the Haskell LLVM backend instead.