At first I was confused because the pitch seemed to be mainly about writing fast code with Lua. If that's what they're going for, then a comparison with LuaJIT is sorely missing.
But it appears that what they're actually pitching is a simple and flexible code generation environment. It's a way to generate statically-typed code at runtime that targets LLVM but looks nicer than this: http://llvm.org/releases/2.6/docs/tutorial/JITTutorial2.html (C++ code that conjures up LLVM SSA IR directly). You could almost think of this as a high-level API for LLVM code generation and execution that is exceptionally well-integrated into Lua.
For example, in their example where they create a Terra function from a BF program, the equivalent in plain Lua would be to compile the BF program into a Lua program (represented as a big string), load it into the interpreter, and then let LuaJIT JIT it. But with Terra, you can represent the code you're generating symbolically with the "quote" construct instead of having to compile it to a big string. Of course you could just write a BF interpreter in Lua directly, but if you compile it instead you'll get better performance because you won't pay interpreter overhead and the optimizer can analyze the program flow for optimization opportunities.
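To make the contrast concrete, here's a hedged sketch of the two approaches for a couple of BF opcodes. The names (compile_bf_to_lua, compile_op) are invented for illustration; this is not the repo's actual implementation, and the Terra half assumes `data` and `ptr` are Terra symbols threaded through the generator:

```lua
-- Plain-Lua route: compile BF to a big source string, then load() it.
local function compile_bf_to_lua(prog)
  local src = { "return function(data, ptr)" }
  for c in prog:gmatch(".") do
    if c == "+" then src[#src + 1] = "data[ptr] = data[ptr] + 1"
    elseif c == ">" then src[#src + 1] = "ptr = ptr + 1" end
  end
  src[#src + 1] = "return ptr end"
  return load(table.concat(src, "\n"))()
end

-- Terra route: build the same logic symbolically with quotes,
-- no intermediate source string.
local function compile_op(data, ptr, c)
  if c == "+" then return quote data[ptr] = data[ptr] + 1 end
  elseif c == ">" then return quote ptr = ptr + 1 end
  end
end
```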
[EDIT: removed incorrect criticism about the BF codegen being incomplete]
It's an interesting approach and I look forward to learning more about it.
Author here. You're right that we designed Terra primarily to be an environment for generating low-level code.
In particular, we want to be able to easily design and prototype DSLs and auto-tuners for high-performance programming applications.
We explain this use-case in more detail in our upcoming PLDI paper (http://terralang.org/pldi071-devito.pdf).
Since we are primarily using it for dynamic code generation, I haven't done much benchmarking against LuaJIT directly. Instead, we have compared it to C by implementing a few of the language benchmarks (nbody and fannkuchredux; performance is normally within 5% of C), and compared it against ATLAS, which implements BLAS routines by autotuning x86 assembly. In the case of ATLAS we're 20% slower, but that is auto-tuned Terra against auto-tuned x86 assembly.
Small note, the BF description on the website does go on to implement the '[' and ']' operators below. I just left them out of the initial code so it was easier to grok what was going on. The full implementation is at (https://github.com/zdevito/terra/blob/master/tests/bf.t).
This is really great CS work. Props. The fact that your numerical example is DGEMM, AND that you're comparing against ATLAS and MKL is very compelling, especially since you're only showcasing the kernel itself!
I'm taking a different, albeit related, approach to dynamic runtime code gen, but either way this is rock-solid work, though I'm pretty terrible at deciphering the Lua + macro-heavy code in your examples.
edit: I'm doing something more akin to the Accelerate Haskell EDSL approach, with some changes
It's also a very rare research paper that actually uses BLAS dgemm as the benchmark and isn't by someone explicitly focused on writing BLAS. Usually they just use a dot product or a local convolution kernel (whereas in some sense matrix multiply is a global convolution).
What they've done is pretty solid on its own. That said, it's not done as part of a framework for numerics, which just means it's a great validation benchmark for their code gen.
I saw LLVM IR being referenced, but I am not sure if you are referring to the LLVM bitcode. If you are, wouldn't it be possible to compile Terra to JavaScript by using Emscripten?
It is a little hard to tell what the point of Terra is from the website; you should check out the PLDI paper for a better sense for what is going on http://terralang.org/publications.html (in particular, the example apps are telling: "Our Terra-based auto-tuner for BLAS routines performs within 20% of ATLAS, and our DSL for stencil computations runs 2.3x faster than hand-written C.")
This appears to be the perfect language for embedded applications. The combination of Lua, high performance, and a small generated-code footprint is exactly what embedded applications need. I'd recommend the authors head in this direction: maybe try creating bindings for Android, and you'd get immense traction with this.
Let me see if I understand well: can I use this such that "terra" code is equivalent to C code and Lua code is equivalent to an extremely powerful preprocessor?
Am I right to think that I can generate dynamic libraries (.so) that do not include any kind of interpreter with this?
If I can do this then this may be my dream static / system language...
One of our design goals was to make sure terra could execute independently of Lua. So everything that you describe is possible. For instance our simple hello world program (https://github.com/zdevito/terra/blob/master/tests/hello.t) compiles a standalone executable with the "terralib.saveobj" function. You can also write out object (.o) files that are ABI compatible with C. For instance, gemm.t (https://github.com/zdevito/terra/blob/master/tests/gemm.t) our matrix-matrix multiply autotuner writes out a .o file my_dgemm.o which we then call from a test harness in a separate C program (https://github.com/zdevito/terra/blob/master/tests/reference...). Once you have the .o files, you can use Lua to call the system linker to generate a dynamic library.
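For reference, a minimal sketch along the lines of the linked hello.t, combining terralib.includec and terralib.saveobj (details may differ slightly from the actual test file):

```lua
-- Pull in a C header; C.printf becomes callable from Terra code.
local C = terralib.includec("stdio.h")

terra main(argc : int, argv : &rawstring) : int
  C.printf("hello, world\n")
  return 0
end

-- Standalone executable: no Lua interpreter inside.
terralib.saveobj("hello", { main = main })
-- Or a C-ABI object file you can hand to the system linker:
terralib.saveobj("hello.o", { main = main })
```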
Yet it also feels kind of pointless: most Lua use right now is in interpreters embedded in other software, and Terra would be hard to use there, since most projects probably won't incorporate it at all.
With a new language, you fight for space on other languages' turf; your potential is unlimited, and if your language is better, it will win.
This language is obviously made to be used with Lua and C at the same time, a bridge of sorts, so it has a much more limited scope and utility; many Lua users, even those who might need Terra's performance, cannot shoehorn Terra into their interpreter.
For example, coders using the Corona SDK, WoW, or the many other game engines and application SDKs out there that rely on Lua.
I'm reminded quite a bit of OMeta and the other VPRI work on DSLs, although this is more targeted towards a specific application(dynamically optimized DSLs) and uses a more familiar imperative environment, rather than being parser-focused.
This looks great. I want to write a DSL to generate low-level code, and it would have involved both C and Lua at some point, so I'll definitely give this a try.
Yes! One of the benefits of making sure that Terra code can execute independently of Lua is that you can use multi-threading libraries pretty much out of the box. For instance, we have an example that launches some threads using pthreads (https://github.com/zdevito/terra/blob/master/tests/pthreads....).
There are still some limitations. You'd still have to manage thread synchronization manually, and I think LuaJIT only allows one thread of Lua execution to run at a time, so if your threads call back into Lua they may serialize on that bottleneck.
To answer my own questions, yes it uses (relies on) LuaJIT and there seems to be no problem running multiple Lua states, each using Terra. In fact because of the independent nature of compiled Terra code, I would wager you can create the Lua states with Terra and threads from within Terra itself. LuaJIT actually can't do this currently, you need a little bit of C code, because the callback passed to pthread_create will be invoked in a new thread on the old state (which would be invalid in LuaJIT as you can't share a state across threads like that.) Anyway I'll try it out and submit it as a test case for Terra if it works.
LuaJIT definitely has sweet-spots where it just flies, but there are other cases where LuaJIT isn't faster than PUC Lua, e.g. where you're doing a lot of string manipulation and calling into non-Lua libraries. In "average" code, it varies a lot, but you often don't get the insane speedups that makes LuaJIT look so great on small benchmarks.
Given that LuaJIT has some other drawbacks (e.g. it has memory limitations that PUC Lua doesn't have, due to the details of LuaJIT's NaN-encoding), the usual lesson applies: YMMV, so benchmark... :]
Yes, that's true, except for the calling into non-Lua libraries, this is where LuaJIT + ffi really shines. It can actually optimize away boxing/unboxing and inline the native function call into the trace (not the body of the native function, but the call itself.) Surely you're referring to something else? The often repeated wisdom on the LuaJIT mailing list is only use the Lua C API for legacy code as it can't come close to the performance or ease of use of the ffi. If your experience is otherwise, maybe the JIT bailed on your test code? The ffi is very slow without the JIT.
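A minimal illustration of the ffi path (plain LuaJIT, no Terra; cos is just a stand-in for any cheap C call):

```lua
local ffi = require("ffi")
ffi.cdef[[ double cos(double x); ]]

local total = 0
for i = 1, 1e6 do
  -- On a compiled trace, LuaJIT inlines the call site itself and
  -- avoids boxing the double argument/result.
  total = total + ffi.C.cos(i * 1e-6)
end
print(total)
```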
String and memory limitations have yet to bother me at all because anywhere speed matters you get order-of-magnitude improvements by managing the strings/memory yourself via the ffi. With Terra, it's clear that's the approach being advocated as well. I agree it really bollixes small benchmarks, especially if the code is written for Lua and not done the "LuaJIT way" with the ffi. Outside of embedded or other exotic environments I think one would be hard-pressed to come up with a real-world workload where PUC Lua outperforms LuaJIT and there's no easy way to turn the tables. There are just far more options for optimization with LuaJIT and more ability to get close to the metal than you have with PUC Lua.
None of that is to take away from what the PUC Lua guys have accomplished. Like any craftsman who enjoys his work, I just like to use the best tools. That's LuaJIT in my opinion, and now Terra too.
Because a terra file is a lua file, lua functions are allowed / encouraged.
A terra function denotes a region of code with (slightly) different rules (arrays indexed at 0, use of C APIs, etc.). Making all of lua low level is interesting, but that's not what this project was going for.
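Side by side, the distinction looks like this (a sketch; the bodies are trivial on purpose):

```lua
-- Ordinary Lua: dynamically typed, runs in the interpreter / LuaJIT.
function lua_add(a, b) return a + b end

-- Terra: statically typed, compiled via LLVM, C-like semantics.
terra terra_add(a : int, b : int) : int
  return a + b
end
```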
Something like "tfunction" may have been better - it's obvious that it's still a function, and the leading t lets you easily guess / remember the connection to Terra.
I think it looks very clean. I've always liked Lua's use of full keywords (e.g. function and end), and if the function didn't take any arguments, how would you know it's a terra function?
Is this aiming to be a semi-replacement for C then? I guess I'm a bit confused as to where it fits in ... Lua is already tiny and performant. We're considering it for some embedded projects soon as a extensibility hook.
It's aimed at people who would like to design a very performant DSL. Roughly speaking, your Lua code is the compiler, your Terra code is the runtime. Because the Lua code can metaprogram the Terra code, it's possible to perform dynamic tuning of Terra even while the program is running.
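A hedged sketch of what "Lua is the compiler, Terra is the runtime" can look like. Here `unroll` is a hypothetical tuning knob chosen by the Lua side, and the loop handling is simplified (remainder iterations ignored):

```lua
local function make_saxpy(unroll)
  return terra(n : int, a : float, x : &float, y : &float)
    for i = 0, n, unroll do
      escape
        -- This inner loop is Lua, running at compile time: it emits
        -- 'unroll' copies of the Terra statement into the kernel.
        for j = 0, unroll - 1 do
          emit quote y[i + j] = a * x[i + j] + y[i + j] end
        end
      end
    end
  end
end

-- A Lua auto-tuner could time several variants and keep the best one.
local saxpy4 = make_saxpy(4)
```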
Code generators can, for some problems, write code that's faster than hand-written code simply because the sheer complexity of the result is pretty much impossible to handle "in yer brain".
This kind of highly problem-specific optimal code-building is out of scope for any kind of JIT, including LuaJIT.