At first I was confused because the pitch seemed to be mainly about writing fast code with Lua. If that's what they're going for, then a comparison with LuaJIT is sorely missing.
But it appears that what they're actually pitching is a simple and flexible code generation environment. It's a way to generate statically-typed code at runtime that targets LLVM but looks nicer than this: http://llvm.org/releases/2.6/docs/tutorial/JITTutorial2.html (C++ code that conjures up LLVM SSA IR directly). You could almost think of this as a high-level API for LLVM code generation and execution that is exceptionally well-integrated into Lua.
For example, in their example where they create a Terra function from a BF program, the equivalent in plain Lua would be to compile the BF program into a Lua program (represented as a big string), load it into the interpreter, and then let LuaJIT JIT it. But with Terra, you can represent the code you're generating symbolically with the "quote" construct instead of having to compile it to a big string. Of course you could just write a BF interpreter in Lua directly, but if you compile it instead you'll get better performance because you won't pay interpreter overhead and the optimizer can analyze the program flow for optimization opportunities.
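To make the contrast concrete, here's a hedged sketch of the two approaches for a couple of BF opcodes. The names (compile_bf_to_lua, compile_op) are invented for illustration; this is not the repo's actual implementation, and the Terra half assumes `data` and `ptr` are Terra symbols threaded through the generator:

```lua
-- Plain-Lua route: compile BF to a big source string, then load() it.
local function compile_bf_to_lua(prog)
  local src = { "return function(data, ptr)" }
  for c in prog:gmatch(".") do
    if c == "+" then src[#src + 1] = "data[ptr] = data[ptr] + 1"
    elseif c == ">" then src[#src + 1] = "ptr = ptr + 1" end
  end
  src[#src + 1] = "return ptr end"
  return load(table.concat(src, "\n"))()
end

-- Terra route: build the same logic symbolically with quotes,
-- no intermediate source string.
local function compile_op(data, ptr, c)
  if c == "+" then return quote data[ptr] = data[ptr] + 1 end
  elseif c == ">" then return quote ptr = ptr + 1 end
  end
end
```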
[EDIT: removed incorrect criticism about the BF codegen being incomplete]
It's an interesting approach and I look forward to learning more about it.
Author here. You're right that we designed Terra primarily to be an environment for generating low-level code.
In particular, we want to be able to easily design and prototype DSLs and auto-tuners for high-performance programming applications.
We explain this use-case in more detail in our upcoming PLDI paper (http://terralang.org/pldi071-devito.pdf).
Since we are primarily using it for dynamic code generation, I haven't done much benchmarking against LuaJIT directly. Instead, we have compared it to C by implementing a few of the language benchmarks (nbody and fannkuchredux; performance is normally within 5% of C), and compared it against ATLAS, which implements BLAS routines by autotuning x86 assembly. In the case of ATLAS we're 20% slower, but that is auto-tuned Terra against auto-tuned x86 assembly.
Small note, the BF description on the website does go on to implement the '[' and ']' operators below. I just left them out of the initial code so it was easier to grok what was going on. The full implementation is at (https://github.com/zdevito/terra/blob/master/tests/bf.t).
This is really great CS work. Props. The fact that your numerical example is DGEMM, AND that you're comparing against ATLAS and MKL is very compelling, especially since you're only showcasing the kernel itself!
I'm taking a different, albeit related, approach to dynamic runtime code gen, but either way this is rock-solid work, though I'm pretty terrible at deciphering the Lua + macro-heavy code in your examples.
edit: I'm doing something more akin to the Accelerate Haskell EDSL approach, with some changes
It's also a very rare research paper that actually uses BLAS dgemm as the benchmark and isn't by someone explicitly focused on writing BLAS. Usually they just use a dot product or a local convolution kernel (whereas in some sense matrix multiply is a global convolution).
What they've done is pretty solid on its own. That said, it's not done as part of a framework for numerics, which just means it's a great validation benchmark for their code gen.
I saw LLVM IR being referenced, but I am not sure if you are referring to the LLVM bitcode. If you are, wouldn't it be possible to compile Terra to JavaScript by using Emscripten?
It is a little hard to tell what the point of Terra is from the website; you should check out the PLDI paper for a better sense for what is going on http://terralang.org/publications.html (in particular, the example apps are telling: "Our Terra-based auto-tuner for BLAS routines performs within 20% of ATLAS, and our DSL for stencil computations runs 2.3x faster than hand-written C.")
This appears to be the perfect language for embedded applications. The combination of Lua, high performance, and a small generated-code footprint is exactly what embedded applications need. I'd recommend the authors head in this direction: maybe try creating bindings for Android, and you'd get immense traction with this.
Let me see if I understand well: can I use this such that "terra" code is equivalent to C code and Lua code is equivalent to an extremely powerful preprocessor?
Am I right to think that I can generate dynamic libraries (.so) that do not include any kind of interpreter with this?
If I can do this then this may be my dream static / system language...
One of our design goals was to make sure terra could execute independently of Lua. So everything that you describe is possible. For instance our simple hello world program (https://github.com/zdevito/terra/blob/master/tests/hello.t) compiles a standalone executable with the "terralib.saveobj" function. You can also write out object (.o) files that are ABI compatible with C. For instance, gemm.t (https://github.com/zdevito/terra/blob/master/tests/gemm.t) our matrix-matrix multiply autotuner writes out a .o file my_dgemm.o which we then call from a test harness in a separate C program (https://github.com/zdevito/terra/blob/master/tests/reference...). Once you have the .o files, you can use Lua to call the system linker to generate a dynamic library.
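For reference, a minimal sketch along the lines of the linked hello.t, combining terralib.includec and terralib.saveobj (details may differ slightly from the actual test file):

```lua
-- Pull in a C header; C.printf becomes callable from Terra code.
local C = terralib.includec("stdio.h")

terra main(argc : int, argv : &rawstring) : int
  C.printf("hello, world\n")
  return 0
end

-- Standalone executable: no Lua interpreter inside.
terralib.saveobj("hello", { main = main })
-- Or a C-ABI object file you can hand to the system linker:
terralib.saveobj("hello.o", { main = main })
```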
Yet it also feels kind of pointless: most Lua use right now is in interpreters embedded in other software, and Terra would be hard to use there, since most projects probably won't incorporate it at all.
With a new language, you fight for space on other languages' turf; your potential is unlimited, and if your language is better, it will win.
This language is obviously made to be used with Lua and C at the same time, a bridge of sorts, so it has a much more limited scope and utility; many Lua users, even those who might need Terra's performance, cannot shoehorn Terra into their interpreter.
For example, coders using the Corona SDK, WoW, or the many other game engines and application SDKs out there that rely on Lua.
I'm reminded quite a bit of OMeta and the other VPRI work on DSLs, although this is more targeted towards a specific application(dynamically optimized DSLs) and uses a more familiar imperative environment, rather than being parser-focused.
This looks great. I want to write a DSL to generate low-level code, and it would have involved both C and Lua at some point, so I'll definitely give this a try.
Yes! One of the benefits of making sure that Terra code can execute independently of Lua is that you can use multi-threading libraries pretty much out of the box. For instance, we have an example that launches some threads using pthreads (https://github.com/zdevito/terra/blob/master/tests/pthreads....).
There are still some limitations. You'd still have to manage thread synchronization manually, and I think LuaJIT only allows one thread of Lua execution to run at a time, so if your threads call back into Lua they may serialize on that bottleneck.
To answer my own questions, yes it uses (relies on) LuaJIT and there seems to be no problem running multiple Lua states, each using Terra. In fact because of the independent nature of compiled Terra code, I would wager you can create the Lua states with Terra and threads from within Terra itself. LuaJIT actually can't do this currently, you need a little bit of C code, because the callback passed to pthread_create will be invoked in a new thread on the old state (which would be invalid in LuaJIT as you can't share a state across threads like that.) Anyway I'll try it out and submit it as a test case for Terra if it works.
LuaJIT definitely has sweet-spots where it just flies, but there are other cases where LuaJIT isn't faster than PUC Lua, e.g. where you're doing a lot of string manipulation and calling into non-Lua libraries. In "average" code, it varies a lot, but you often don't get the insane speedups that makes LuaJIT look so great on small benchmarks.
Given that LuaJIT has some other drawbacks (e.g. it has memory limitations that PUC Lua doesn't have, due to the details of LuaJIT's NaN-encoding), the usual lesson applies: YMMV, so benchmark... :]
Yes, that's true, except for the calling into non-Lua libraries, this is where LuaJIT + ffi really shines. It can actually optimize away boxing/unboxing and inline the native function call into the trace (not the body of the native function, but the call itself.) Surely you're referring to something else? The often repeated wisdom on the LuaJIT mailing list is only use the Lua C API for legacy code as it can't come close to the performance or ease of use of the ffi. If your experience is otherwise, maybe the JIT bailed on your test code? The ffi is very slow without the JIT.
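A minimal illustration of the ffi path (plain LuaJIT, no Terra; cos is just a stand-in for any cheap C call):

```lua
local ffi = require("ffi")
ffi.cdef[[ double cos(double x); ]]

local total = 0
for i = 1, 1e6 do
  -- On a compiled trace, LuaJIT inlines the call site itself and
  -- avoids boxing the double argument/result.
  total = total + ffi.C.cos(i * 1e-6)
end
print(total)
```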
String and memory limitations have yet to bother me at all because anywhere speed matters you get order-of-magnitude improvements by managing the strings/memory yourself via the ffi. With Terra, it's clear that's the approach being advocated as well. I agree it really bollixes small benchmarks, especially if the code is written for Lua and not done the "LuaJIT way" with the ffi. Outside of embedded or other exotic environments I think one would be hard-pressed to come up with a real-world workload where PUC Lua outperforms LuaJIT and there's no easy way to turn the tables. There are just far more options for optimization with LuaJIT and more ability to get close to the metal than you have with PUC Lua.
None of that is to take away from what the PUC Lua guys have accomplished. Like any craftsman who enjoys his work, I just like to use the best tools. That's LuaJIT in my opinion, and now Terra too.
Because a terra file is a lua file, lua functions are allowed / encouraged.
A terra function denotes a region of code with (slightly) different rules (arrays indexed at 0, use of C APIs, etc.). Making all of lua low level is interesting, but that's not what this project was going for.
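Side by side, the distinction looks like this (a sketch; the bodies are trivial on purpose):

```lua
-- Ordinary Lua: dynamically typed, runs in the interpreter / LuaJIT.
function lua_add(a, b) return a + b end

-- Terra: statically typed, compiled via LLVM, C-like semantics.
terra terra_add(a : int, b : int) : int
  return a + b
end
```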
Something like "tfunction" may have been better - it's obvious that it's still a function, and the leading t lets you easily guess / remember the connection to Terra.
I think it looks very clean. I've always liked Lua's use of full keywords (e.g. function and end), and if the function didn't take any arguments, how would you know it's a terra function?
Is this aiming to be a semi-replacement for C then? I guess I'm a bit confused as to where it fits in ... Lua is already tiny and performant. We're considering it for some embedded projects soon as a extensibility hook.
It's aimed at people who would like to design a very performant DSL. Roughly speaking, your Lua code is the compiler, your Terra code is the runtime. Because the Lua code can metaprogram the Terra code, it's possible to perform dynamic tuning of Terra even while the program is running.
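A hedged sketch of what "Lua is the compiler, Terra is the runtime" can look like. Here `unroll` is a hypothetical tuning knob chosen by the Lua side, and the loop handling is simplified (remainder iterations ignored):

```lua
local function make_saxpy(unroll)
  return terra(n : int, a : float, x : &float, y : &float)
    for i = 0, n, unroll do
      escape
        -- This inner loop is Lua, running at compile time: it emits
        -- 'unroll' copies of the Terra statement into the kernel.
        for j = 0, unroll - 1 do
          emit quote y[i + j] = a * x[i + j] + y[i + j] end
        end
      end
    end
  end
end

-- A Lua auto-tuner could time several variants and keep the best one.
local saxpy4 = make_saxpy(4)
```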
Code generators can, for some problems, write code that's faster than hand-written code simply because the sheer complexity of the result is pretty much impossible to handle "in yer brain".
This kind of highly problem-specific optimal code-building is out of scope for any kind of JIT, including LuaJIT.