Bitey: Import LLVM bitcode directly into Python (github.com/dabeaz)
177 points by mace on Aug 9, 2012 | 34 comments



I very much like the concept, but I dislike the "magic" feel of overloading "import", especially as it creates namespace collisions, e.g. when you have a Python module and an object file with the same name. Imho something like "bitey.import_obj(name)" would have been nicer and clearer.

I wonder how complicated it would be to parse header files to populate the field names of structs automatically? Maintaining separate .pre.py and .h files seems like a recipe for trouble.


I don't like it either, but to be fair to the author, this particular magic is encouraged in Python (see http://www.python.org/dev/peps/pep-0302).
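For reference, a minimal sketch of the import-hook machinery that PEP describes, written in modern importlib terms (the BitcodeFinder class and demo_bitcode_module name are made up for illustration; PEP 302 originally spelled this find_module/load_module rather than find_spec):

    import sys
    import types
    import importlib.abc
    import importlib.machinery

    class BitcodeFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
        """Toy finder/loader that satisfies `import demo_bitcode_module` itself."""

        def find_spec(self, fullname, path=None, target=None):
            # A real hook would look for <fullname>.o bitcode on sys.path here.
            if fullname != "demo_bitcode_module":
                return None
            return importlib.machinery.ModuleSpec(fullname, self)

        def create_module(self, spec):
            return types.ModuleType(spec.name)

        def exec_module(self, module):
            # A real loader would parse the bitcode and attach ctypes wrappers;
            # a placeholder attribute keeps the mechanism visible.
            module.answer = 42

    sys.meta_path.insert(0, BitcodeFinder())

    import demo_bitcode_module
    print(demo_bitcode_module.answer)  # prints 42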

Parsing headers for field names would require a complete C preprocessor and parser. That wouldn't be a problem for this author (who wrote a very popular parser generator for Python), but it still wouldn't be perfect until it completely replicated the system compiler's behaviour with respect to system headers (consider conditional compilation). It is particularly annoying if the host and target systems are different, i.e. when cross-compiling. I've tried this exact thing (header parsing to get type information) and it is quite a pain to get right.


Maybe you could use LLVM/Clang to parse the headers. That way at least the parsing would be consistent with the compilation. I don't know how difficult it would be to access such type information, but I thought the whole point of LLVM/Clang was to be usable as a library.
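libclang's Python bindings can pull that out fairly directly; a rough sketch, assuming the `clang` package is installed and using a made-up point.h as the header:

    import clang.cindex

    index = clang.cindex.Index.create()
    tu = index.parse("point.h", args=["-std=c99"])  # same flags as the real build

    for node in tu.cursor.walk_preorder():
        if node.kind == clang.cindex.CursorKind.STRUCT_DECL and node.spelling:
            fields = [child.spelling for child in node.get_children()
                      if child.kind == clang.cindex.CursorKind.FIELD_DECL]
            print(node.spelling, fields)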


The collision is already happening between .so and .py files, so it's not a new issue.


I'll be interested to see how this compares to numba and how easy it is to fit into the scientific Python ecosystem. I'm also curious about performance versus numba and my current go-to solution, Cython.


What about performance? Is there any benchmark comparing a function against pure Python and C implementations?


You can find one in the mandelbrot example.

https://github.com/dabeaz/bitey/tree/master/examples/mandel


It would be nice if it were compared to a pure C implementation too, to measure the overhead.


"I call the big one Bitey."


First thing that came to my mind :-)


Very cool! Can't wait to test it! Why exactly is there no C++ support?


It would be hard in the presence of parametric polymorphism (multiple functions having the same name but different numbers and types of parameters), as well as the difference between C++'s and Python's method resolution semantics (in Python, for instance, everything is virtual).

Also, templates.

Also, a bunch of other things.


That's not parametric polymorphism, that's function overloading. The former makes it possible to use the same function implementation on different types, whereas the latter is about using the same name for different functions.


Thanks for the reply. I just spent the last 3-4 days hacking around osgswig; having a good (and simple) binding solution for C++/Python is still an open problem...


And obj-c? Also hard?


You'd have to shoehorn the obj-c runtime in there somehow. Almost definitely not worth the effort. What would you want to run in this way?


While the mapping between C and LLVM IR is rather straightforward, things are more complex with C++. C++ constructs get dismantled to be compiled to IR, and the binding generator would have to restore all the C++-ish information from metadata and type info, which isn't easy.


It seems cool, but why would you use it instead of a C lib (or C object file) interfaced with ctypes or SWIG...? Maybe I'm missing what LLVM brings? Thanks


In addition to cdavid's comment: LLVM IR is system-independent, so if you had a deployment over heterogeneous machines you could write the extension code once and have it run everywhere without recompilation.
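Roughly, going from memory of bitey's README (so the exact commands and the fib example are best-effort, not gospel):

    # Compile once, anywhere, to bitcode instead of native code:
    #     clang -emit-llvm -c fib.c -o fib.o
    # Then each host imports the same fib.o and JIT-compiles it locally:
    import bitey   # installs the bitcode import hook
    import fib     # resolved from fib.o (LLVM bitcode)

    print(fib.fib(10))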



Fair enough, that was too strong a claim. But it is more portable than an extension module compiled with the system compiler.


It's not portable enough to let me take IR generated for my x86 laptop and run it on my arm board, in general, even though both platforms are ILP32, little-endian, and running the same OS.


Yes, but this is a problem with the source language and/or the host system libraries, not with LLVM IR itself. There is a broad domain of applications for which it is portable.


I was confused about this too, but I think the idea is that it's easier to infer type information from the bitcode than from raw object code.


It's more closely related to ctypes than to SWIG.

SWIG requires glue code (.i files) to be written and generated, then compiled. ctypes can take a native system library (.so or .dll) and access it directly.
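For example, calling into libm with ctypes and nothing else (libm is just a stand-in for your own .so):

    import ctypes
    import ctypes.util

    libm = ctypes.CDLL(ctypes.util.find_library("m"))
    libm.cos.restype = ctypes.c_double
    libm.cos.argtypes = [ctypes.c_double]
    print(libm.cos(0.0))  # 1.0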

Bitey, on the other hand, uses platform-neutral LLVM bitcode. Imagine ctypes, but platform-neutral. That's Bitey.

Pretty darned cool.


Once you can import LLVM code at runtime, you are pretty close to injecting LLVM code at runtime: you write some LLVM IR in a string and "llvm_eval" it.
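There's no literal llvm_eval, but the idea is only a few lines with an LLVM binding such as llvmlite (a rough sketch; the IR string and the `add` function are just for illustration):

    import ctypes
    import llvmlite.binding as llvm

    llvm.initialize()
    llvm.initialize_native_target()
    llvm.initialize_native_asmprinter()

    ir = r"""
    define i32 @add(i32 %a, i32 %b) {
    entry:
      %sum = add i32 %a, %b
      ret i32 %sum
    }
    """

    mod = llvm.parse_assembly(ir)
    mod.verify()

    machine = llvm.Target.from_default_triple().create_target_machine()
    engine = llvm.create_mcjit_compiler(mod, machine)
    engine.finalize_object()

    addr = engine.get_function_address("add")
    add = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int, ctypes.c_int)(addr)
    print(add(2, 3))  # 5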


So can you use this with C functions compiled normally with gcc? It seems like that would be even more useful.


What difference would it make? To my understanding, Clang is fully compatible with gcc.


I guess I mean can you use this with C instead of LLVM?


Your terms are off. C is compiled using LLVM (clang) into an intermediate form that is compatible across platforms (LLVM IR).

(AFAIK, please correct me if I'm wrong)


No. You have to use LLVM.


I could be completely wrong here, but doesn't this completely bypass the optimizer that makes C so fast?


LLVM performs its optimization passes on the LLVM bitcode itself (the "middle-end" of the compiler), before finally translating the optimized bitcode into machine-specific binary code.

I'm not an LLVM expert though, so I could be glossing over a few details.
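For a concrete feel of that middle-end, a rough sketch using llvmlite's (legacy) pass-manager bindings; the IR snippet is made up, but it shows that the optimization passes run on IR before any machine code is produced:

    import llvmlite.binding as llvm

    llvm.initialize()
    llvm.initialize_native_target()
    llvm.initialize_native_asmprinter()

    mod = llvm.parse_assembly(r"""
    define i32 @square(i32 %x) {
    entry:
      %r = mul i32 %x, %x
      ret i32 %r
    }
    """)

    # Middle-end: IR-to-IR optimization passes run on the bitcode itself...
    pmb = llvm.create_pass_manager_builder()
    pmb.opt_level = 2                  # roughly -O2
    pm = llvm.ModulePassManager()
    pmb.populate(pm)
    pm.run(mod)

    print(mod)  # ...and only then is the optimized IR lowered to machine code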


You're correct. LLVM includes a pass manager that performs extensive optimizations.



