Hacker News
Cling: an interactive C++ interpreter, built on top of Clang and LLVM (github.com/vgvassilev)
179 points by ColinWright on Sept 24, 2014 | 41 comments



I think every language would be more productive if it were interpreted at development and compiled at production.


Fully agree.

Just as info for others.

This is the model followed by environments for Common Lisp, Eiffel, JVM, ML languages, .NET, Dylan, Erlang.

Best of both worlds, JIT/Interpreted for development and JIT/AOT for production.


That's exactly how CINT was used in ROOT, but now they're moving to Cling instead. Clang compilation is fast enough for interactive development.


After years of dealing with graduate-student missteps and ROOT->Java interoperability, to me, ROOT will always mean: blown stacks (e.g. double hist_data[1000000];), poor documentation on TTree/TBranch, NPEs, exceptions, the horrible bag of "features" TObject is, terrible debug statements, no logging, GUI hell, pseudo-.C-macro-compilation-that-only-works-with-trivial-cases, code translation, dlopen madness, code interoperability, Mac OSX support, 2+ hour compilations, TList hell because no STL (NO STL!?!?!), etc...

Years of physicists' time have been wasted on CINT, and many more years on ROOT itself.

I'm sure Cling is much better, and it looks cool on its own, but I know ROOT isn't. Luckily I only have to dig in the source code for ROOT once a year or so (the last time was figuring out they finally got around to adding LZMA compression to ROOT files when things started breaking at work).

I used to think PyRoot was the way forward. Eventually, ROOT will be burned at the stake and everyone will start over with Julia.
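
For context on the "blown stacks" item: as a local variable, `double hist_data[1000000];` needs 1,000,000 × 8 bytes = 8 MB of stack, which is roughly the whole default stack on many Linux setups (an assumption, since limits vary by platform and configuration). A minimal sketch of the usual workaround, with hypothetical helper names, puts the buffer on the heap instead:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: allocate the histogram buffer on the heap, not the stack.
// The vector zero-initializes its elements and frees them automatically.
std::vector<double> make_hist_data(std::size_t n_bins) {
    return std::vector<double>(n_bins, 0.0);
}

// Number of bytes the equivalent local array would take on the stack.
constexpr std::size_t stack_bytes_needed(std::size_t n_bins) {
    return n_bins * sizeof(double);
}
```

With `n_bins = 1000000`, `stack_bytes_needed` comes out at about 8 MB on platforms where a double is 8 bytes, while `make_hist_data` only puts a few pointers on the stack.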


Sadly, for most compiled languages the semantics will be completely different when being interpreted.


Why would the semantics change if you use an interpreter rather than a compiler?


Link to previous HN submission 2 years ago (it has moved on since then):

https://news.ycombinator.com/item?id=4373334


As early as 1989, there was an interactive C interpreter/REPL/debugger called 'Saber-C':

http://webcache.googleusercontent.com/search?q=cache:xZ37OrB...


The first C "interpreters" I know of came out for Lisp machines in the mid 1980s: Symbolics' C compiler (http://www.bitsavers.org/pdf/symbolics/software/genera_8/Use...) and Scott Burson's (HN user ScottBurson) ZetaC for TI Explorers/LMIs and Symbolics 3600s (now in the public domain: http://www.bitsavers.org/bits/TI/Explorer/zeta-c/). Neither of them is an interpreter; they are just "interactive" compilers, like the Lisp ones.


One of my favorite early Rust features, in theory, was the interactive shell called "rusti". In practice it was awful to use because it was slow, crashed all the time, and required whole blocks of code to be entered at once (declaring, defining, and using a variable, for example). Some of that was attributable to the beta status of the language, but it seems like they ditched it in the end. Is there actual value to this sort of thing for languages that are classically compiled instead of interpreted? At least for Go, Rust, and C, speed of compilation is less of an issue overall, but compiling C++, even with clang, is so much of a bear that I can't see this catching on.


Slow compilation time is actually a great reason for having a REPL. For example, Clojure and Julia are both REPL-based languages, though both secretly need to be compiled (and do so relatively slowly). In practice you don't notice the compile step at all, since after compiling and running once you can just compile 1-2 lines at a time in the REPL and it patches the running program for you – it's all instant.

A more interesting issue for REPLs is how expression- or statement-oriented the language is (cf. Martin Odersky's concerns about the upcoming Java REPL). In Java/C++ it's much more likely for a function to want to change an external object, in which case you'll need to set up the environment, run the code, and check the environment for the change, for every line you write. That's much more likely to be the nail in the coffin for this.


This might be interesting. I'm not actually interested in C++ (any more) but I'm still looking for the easiest possible way for me to get into LLVM. I think “learning LLVM” seems like the obvious next iteration of what used to be “learning assembler”, except this time around, it's actually portable. I love experimenting with languages and having an LLVM backend ready at my fingertips must feel like Programmer's Heaven...


I highly recommend the LLVM Kaleidoscope tutorial. [1] It takes you through all the steps required to implement a REPL for a simple single-type language, and is surprisingly concise given the end result.

Regarding the JIT aspect, if you are like me, you will be in awe when you encounter the single operation

    f_ptr = engine->getPointerToFunction(f_def)
It turns a function definition (in LLVM IR) into an actual function pointer that you can invoke just like any other C/C++ function.

[1] http://llvm.org/docs/tutorial/LangImpl1.html#tutorial-introd...


Perfect, this is exactly on point. Thanks!


How does something like this work in general? My naive idea about how to make such a thing essentially seems like a virtual machine, which is quite a bit of effort.


Not so much a VM as a JIT compiler.

In principle, you can use any compiler of your choice to fake this: Just create a dynamic library for each line that has been entered into the REPL that you immediately load and call into.

Now, 'all' you need to do is cut out the middleman and directly output properly relocated code to memory. There are compilers that can do so. Clang/LLVM is one of them. TinyCC is another.
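
The "fake it with any compiler" approach can be sketched roughly as follows. This is only an illustration under stated assumptions: a POSIX system, a `c++` compiler driver on the PATH, and made-up names (`run_snippet`, `repl_entry`, the `/tmp/repl_snippet_*` paths). It is not how Cling works internally.

```cpp
#include <dlfcn.h>   // dlopen, dlsym, dlclose (POSIX)
#include <cstdlib>   // std::system
#include <fstream>
#include <string>

// Compile one snippet into its own shared library, load it, and call into it.
// Returns the snippet's int result, or -1 on failure (so a snippet that itself
// returns -1 is indistinguishable from an error in this toy version).
// Assumption: `c++` is a working compiler driver on PATH.
int run_snippet(const std::string& body, int id) {
    const std::string base = "/tmp/repl_snippet_" + std::to_string(id);
    const std::string src = base + ".cpp";
    const std::string lib = base + ".so";

    // Wrap the snippet in an extern "C" function so dlsym can find it by name.
    {
        std::ofstream out(src);
        out << "extern \"C\" int repl_entry() { " << body << " }\n";
    }  // file is flushed and closed here, before the compiler runs

    // Build a position-independent shared library from the generated source.
    const std::string cmd = "c++ -shared -fPIC -o " + lib + " " + src;
    if (std::system(cmd.c_str()) != 0) return -1;

    // Load the library and resolve the entry point.
    void* handle = dlopen(lib.c_str(), RTLD_NOW);
    if (!handle) return -1;

    using entry_fn = int (*)();
    auto entry = reinterpret_cast<entry_fn>(dlsym(handle, "repl_entry"));
    const int result = entry ? entry() : -1;
    dlclose(handle);
    return result;
}
```

A call such as `run_snippet("return 6 * 7;", 0)` compiles, loads, and runs that one line. On older glibc versions the host program may need to be linked with `-ldl`. A JIT like the one in LLVM skips the files and the external compiler process and emits the relocated code straight into memory.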


If so, isn't that rather similar to how Visual C++ allows you to modify code in flight while debugging by rolling back to the beginning of the function call?


Is there a distilled example of this? I'm totally lost on how to make compiled/static code work dynamically.


The word "dynamic" in the grandparent post refers to dynamically relocatable object files, i.e. all the memory references in the binary are written as relative offsets so that the binary doesn't assume anything about what address it will have within the process when it gets loaded. Concretely, this means the difference between "JMP 0xDEADBEEF" versus "JMP (special_register)+0xBEEF" so that the .so or .dll file doesn't assume it'll get loaded at 0xDEAD0000.

For a less confusing explanation, see https://en.wikipedia.org/wiki/Position-independent_code

(Also: no, it's not really implemented by using one offset register for each .so file in the process.)
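
As a worked version of the arithmetic in that example (function name hypothetical, and deliberately ignoring how real loaders and instruction sets do this), the same fixed offset resolves correctly wherever the library ends up:

```cpp
#include <cstdint>

// Relative addressing: the target is computed from wherever the library was
// actually loaded plus an offset fixed at link time.
constexpr std::uint64_t relative_target(std::uint64_t load_base,
                                        std::uint64_t offset) {
    return load_base + offset;
}
```

Loaded at 0xDEAD0000, offset 0xBEEF resolves to 0xDEADBEEF; loaded at 0x7F000000 instead, the same offset resolves to 0x7F00BEEF, whereas a hard-coded "JMP 0xDEADBEEF" would now point outside the library.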


Interesting. How would a line compile on its own if it's, say, "n += 2;" without providing the n?


That is indeed an issue. One way to solve it is to make all declarations at top-level of the REPL into globals.

Another way would be to operate at block- instead of line-level and just not allow local variables to be shared between blocks, i.e.

    int i = 5 ↵
    printf("%i", i) ↵
would fail, but

    { ↵
        int i = 5; ↵
        printf("%i", i); ↵
    } ↵
would not.
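
The first option (turning top-level declarations into globals) could be sketched as a source-to-source step like the one below. All names here (`ReplSession`, `declare`, `statement`, `repl_line_N`) are hypothetical, and the sketch assumes each generated unit is compiled and loaded so that its symbols are visible to later units; it is not a description of what Cling actually does.

```cpp
#include <string>
#include <vector>

// A global the REPL already knows about, kept as source text.
struct Global {
    std::string type, name;
};

// Toy REPL state that rewrites each input line into a standalone translation unit.
struct ReplSession {
    std::vector<Global> globals;
    int next_id = 0;

    // A declaration line becomes a file-scope definition, and is remembered
    // so that later units can refer to it.
    std::string declare(const std::string& type, const std::string& name,
                        const std::string& init) {
        std::string src = externs() + type + " " + name + " = " + init + ";\n";
        globals.push_back({type, name});
        return src;
    }

    // A statement line is wrapped in a uniquely named function, preceded by
    // extern declarations for every global defined by earlier lines.
    std::string statement(const std::string& stmt) {
        return externs() + "extern \"C\" void repl_line_" +
               std::to_string(next_id++) + "() { " + stmt + " }\n";
    }

    // extern declarations for everything defined so far.
    std::string externs() const {
        std::string out;
        for (const auto& g : globals) out += "extern " + g.type + " " + g.name + ";\n";
        return out;
    }
};
```

After `declare("int", "n", "5")`, the line `n += 2;` is emitted as a unit that starts with `extern int n;`, so it can compile on its own even though it never defines `n`.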


Cling relies on LLVM JIT, the same way as many other LLVM-based languages with REPLs. For each interactively added piece of code it creates an LLVM module, compiles it and jumps to its entry point.


I noticed it says that it "realizes the read-print-evaluate-loop" by which I assume you mean read-evaluate-print-loop.


Hey Guys! If you want, you can see Cling in action in a Terminal.com VM. I've just created a snapshot with Cling up and running and some documentation also. https://www.terminal.com/tiny/o1MgFbPB2L


Another option is Ch from Soft Integration [1], which has been making commercial grade C/C++ interpreters for a long time. The standard package is free and it comes with an IDE.

[1] https://www.softintegration.com/


Last time I checked (around 2 years ago), it was very unstable and did not run on many systems.

Looking at http://root.cern.ch/drupal/content/requested-cling-features : there is still "bug 11297: STL: vector of pair works if compiled, but not if interpreted", which is something so basic that it has to work.

Apparently the bug report was created in 2005, and it's currently closed with "Won't Fix".

I'd really like a C++ REPL, but I am not hopeful for Cling.


Look again - the site you've linked lists issues with CINT, not Cling.

That's the whole point: By using Clang instead of a custom toolchain, you get a production-grade, standards-compliant C++ environment.


I'm a little unclear, does this hook into your code, or is it just a REPL? If it doesn't hook, what would be the use case for this?


Well, it hooks into your code in the sense that you can expose classes and libraries to it.

One use is for C++ "scripts". Cling replaces the older "C interpreter CINT" in ROOT (which is mainly a data analysis framework for particle physics). CINT was horrible because it was "not quite C++" and encouraged bad behavior. You could swap . and -> for example; objects that were saved to a file you loaded appeared automatically in scope (without prior assignment); templates were a bit iffy; and so on. You could run many .cpp files as CINT scripts, but not the other way around. So current best practice is to avoid the "interpreter" and write proper programs that link to libROOT. Cling seems to be much more sane.

I don't know if you need something like that, but one thing Cling/CINT gives you is reflection/introspection, which is pretty neat. I think it's also somehow used to generate bindings for other languages. Oh, and having a REPL is pretty nice. I use it e.g. for quick plots:

    $ root filename.root
    [0] tree->Draw("mass")
(where "tree" is a data structure automatically loaded from filename.root containing data points, and the command makes a histogram of "mass")


Great answer, thank you.


I can definitely see myself just using a simple REPL. I don't use C/C++ that often, so being able to start an interpreter and execute just a few lines of code would be great. A use case would be checking the exception semantics of C++, or which element boost::bissect returns when you have several equal values.
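
As an illustration of that second kind of question, here is the sort of check one might type into a REPL. It uses the standard library's `std::lower_bound` and `std::upper_bound` as stand-ins (an assumption on my part, since I can't vouch for the exact Boost function meant above), and the wrapper names are made up:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Index of the first element that is not less than `value`
// (i.e. the first of a run of equal values, if present).
std::size_t first_not_less(const std::vector<int>& sorted, int value) {
    return static_cast<std::size_t>(
        std::lower_bound(sorted.begin(), sorted.end(), value) - sorted.begin());
}

// Index of the first element that is greater than `value`
// (i.e. one past the last of a run of equal values).
std::size_t first_greater(const std::vector<int>& sorted, int value) {
    return static_cast<std::size_t>(
        std::upper_bound(sorted.begin(), sorted.end(), value) - sorted.begin());
}
```

For the sorted input `{1, 2, 2, 2, 3}` and value 2, `first_not_less` gives index 1 and `first_greater` gives index 4.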


I don't know what you mean by 'hook into', but I presume you can simply #include any existing source code.


I think that he's asking if you can do something like the 'debugger' statement in Chrome/FF when debugging JS.


Ah yes, that's a great idea!


If you're that much into hooking and stuff, LLDB is your tool. And it's also based on Clang, in pretty much the same way as Cling is.


Just wondering, does this work with Windows API apps using WTL headers? This could be pretty amazing if that's the case.


Depends. If WTL is plain C++ and compiles with Clang, then it should. If it uses MSVC extensions (like `__uuidof`) then it won't work. (I've never used WTL besides a little Hello World test.)


Actually, Clang supports most MSVC extensions. Here are links to two of Clang's tests for __uuidof: http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGenCX... http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGenCX...


With a good GUI, you could do some amazing live coding.


Sounds like a good tool for C++ for those times you have to use C++.


"for those times you have to use C++"

Which is every day for some people. Moreover, some of us want to use C++ :]



