Using C++ as a scripting language, part 8 (fwsgonzo.medium.com)
134 points by fwsgonzo on July 29, 2023 | hide | past | favorite | 46 comments



Worth mentioning that Quake 3 used C as a scripting language back in 1999. The engine used a modified LCC compiler [1] to generate bytecode for execution in its own virtual machine. The same could be achieved today by embedding a WASM VM and compiling your C/Zig/Rust/whatever to WebAssembly.

As an aside, I find the idea of writing game code in unmanaged C++ perplexing. Even the original Quake, which targeted DOS-era hardware, relied on a scripting language running in a plain vanilla interpreter (no JIT here). In the [current year] I would think advances in computing power, parallelism, and JIT technology would be more than adequate to accommodate an interpreted scripting language.

[1] https://en.wikipedia.org/wiki/LCC_(compiler)#Projects_incorp...


Using an interpreter on old hardware was often done to save memory:

https://lwn.net/Articles/412024


"advances in computing power, parallelism, and JIT technology would be more than adequate to accommodate an interpreted scripting language."

While this is true, game consoles + iOS/iPadOS prohibit JIT and in some cases prohibit script interpreters, so people resort to stuff like 'we wrote a compiler for C# that turns it all into C++ source files' in order to pass review and collect Fortnite Money*.

* I'm using Fortnite Money here to represent 'the kind of money you can make if you're willing to sell lootboxes to teens on the App Store'


Add Vlang to that list, as another language that can compile to WebAssembly. It also has a C2V translator (and some other languages have similar tools). A number of safer and more convenient options are out there for those who care to look.


>As an aside, I find the idea of writing game code in unmanaged C++ perplexing.

https://github.com/id-Software/DOOM-3


On the topic of using C++ for scripting, and related to the discussion of CERN's ROOT/Cling, I am developing a Clojure dialect on C++/LLVM called jank: https://jank-lang.org/

jank is a true Clojure, meaning you get interactive, REPL-based development (thanks to Cling) and a whole stdlib of persistent, immutable data structures (thanks to immer) and functions to transform them. But it's also C++, so you can write inline C++ within your jank source, and interpolate jank expressions within that. You can link with existing native code using LLVM and you can embed jank into your existing native projects to use for scripting.

jank is pre-alpha right now, and I've only been showing it to Clojure devs so far, but there's a huge audience of C++ devs who may be interested in introducing Clojure to their native code.


Count me among them. I write C++ all day and I'd love to introduce Lisp/Clojure to orchestrate my more performant code. This is a very interesting project!


I don't know too much about either Jank or Ferret, but would you care to compare the two?


Happy to. Ferret and jank have similar goals, but some key differences:

1. Ferret targets embedded platforms primarily; jank is more general and less constrained

2. Ferret doesn't have a JIT-based runtime, meaning no nREPL while you build your programs; jank provides the whole Clojure interactive programming experience, meaning you can start with a REPL and an empty file and end up with a whole server/GUI application/game without stopping to recompile/run again

3. Ferret is written in Clojure and requires the JVM to run, whereas jank's host is native, meaning its runtime and compiler are both native (C++)

My overall sentiment for Ferret is that it's great and I'm excited that it exists. I have my eye on it so that we might be able to learn from some of the decisions made over there. In particular, its interop and memory management systems are of interest.

Worth noting, when you look at all of the existing native Clojures (Ferret, Carp, jank, etc), only one of them aims to be Clojure source compatible while providing a true REPL-driven experience: jank.


Very nice. Thank you.


If you want to see real C++ scripting: the (slightly insane) people of CERN use a 99% ANSI compliant, self-written C++ interpreter (!) for online data analytics.

https://root.cern/manual/first_steps_with_root/


I recently plugged my extremely, abusively templated C++ library into Cling (the C++ interpreter mentioned) and everything worked like a charm. I can even use it in Jupyter notebooks. The speed feels more on the order of Python's, though. But if the heavy lifting is done by precompiled libraries, I can see Cling having a pretty nice workflow for experimenting with and interfacing with C++ code.


If I’m not mistaken, Cling is an LLVM-based JIT compiler, so it really should be performing close to the same level as an ahead-of-time Clang-compiled binary executable. Perhaps what you are noticing is JIT latency (from compiling on the fly.)


Maybe it is because of Debug vs. Release mode (and the optimization level that comes with it), but I see a speed difference on the order of 10-100x vs. Release-mode GCC.


For heavily templated code, a 10x-100x speed difference between -O0 and -O3 is not surprising in my experience.

The cost of compiling at -O3 is going to affect interactive use of course though.


"The speed feels more on the order of Python, though."

What part of "interpreted" did you not understand?


"Cling is an interactive C++ interpreter, built on the top of LLVM and Clang libraries."

I don't understand your comment, do you care to enlighten me?


Interpreters are slower than compiled code.

Python is an interpreted language.

Yes, dynamic recompiling, JIT-ing, etc etc.

But you used an interpreter (for C++) with some expectation it would be faster than a language that is itself interpreted? I mean, I guess you could assume that the stronger typing of C++ would provide an advantage over Python...

But if you hear "interpreted", you should assume a 10x slowdown at minimum, unless human-centuries have been invested in performance improvement, as with the JVM.

RE: "nothing in the original message about expecting faster than interpreted code": He literally said "it's not any faster than Python", which is an interpreted language. He was expecting interpreted C++ to be substantially/perceptibly faster than Python ... an interpreted language. In case you didn't know, a "scripting" language is 99% of the time interpreted.


With the D language you can have your cake and eat it too.

This is because D has one of the fastest compilers. You can use rdmd as a REPL and also invoke the compiler for scripting [1].

[1]Why I use the D programming language for scripting:

https://opensource.com/article/21/1/d-scripting


There's nothing in the original message that suggests the commenter was expecting faster execution in an interpreted environment.


As an FYI for folks, "Cling" is being integrated upstream into LLVM under the name "clang-repl":

https://clang.llvm.org/docs/ClangRepl.html


The CERN beer garden helps with the craziness, especially during summertime. :)


Unless things have changed, root is quite horrible actually.


Root is absolutely, mind-blowingly, amazing. It gets a bad rap because it forces you to use primitives that were designed back in the early nineties. If you're "just" trying to analyze some data, your experience will indeed be "horrible" compared to what's offered by Python, R, Matlab, or Julia. But beyond that...

Root adds fully working reflection to C++. Root gives you dynamic library loading and reloading - you can fix a bug or add a new feature, recompile parts of your program and keep working without restarting it. Root has a feature-complete C++ interpreter, with scripting and a REPL loop. You can work with it completely interactively. After prototyping you can save your code as a script. After identifying performance-critical parts of your code, you can compile them and get the full power of bare-metal C++, without changing anything about the code. Yes, this is technically possible with e.g. Python + numba as well, but not as straightforward.

Root is fully interoperable with Python and R - you can mix scripts and REPLs between the languages and pass objects between them. Root can serialize any object, without requiring any custom code whatsoever (some serious dark magic needed for this). In fact you can pause your entire program and save it to disk or send it over the network to keep running somewhere else. Root has its own file format for efficiently storing massive amounts of data in arbitrarily complex structures. It can stream it over the network too, with probabilistic read-ahead and caching for maximum efficiency.

Root comes with libraries for physics/math/stats that rival those of the largest commercial and open source offerings. Each one of these is a massive technical achievement and Root has had most of them for decades now. Oh, and it has largely maintained backwards compatibility through all this time as well.

Of course, very few people outside of CERN need all of this. Even within CERN, many projects don't. But for those who do, there are very few - if any - alternatives.


That all sounds like a nightmare compared to python and matlab, which can do most of that


Python can do maybe 1 percent of all that. (Hell, Python has real trouble not shitting itself and dying after a "pip install", you can definitely forget about seamless native code compilation.)


But do you really need these features, already available in Matlab/Python/R/Julia/Lisp? Or did the C++ folks simply refuse to learn other languages?

From what I have seen in R and Python, the main reason for speed issues is incompetent programmers. Certainly, bad C++ code is much faster than bad Python code, but there is also the effort to build/maintain/document/teach Root to noobs.

Hot take: It's really about preferences, not features.


Let me paint you a picture: You have data coming off the detectors at a rate of a couple of hundred GB/s (after pre-filters implemented in FPGAs etc.) that needs to be processed and filtered in real time, with output written to disk and tape at about one GB/s. We're talking really CPU-intensive processing here: Kalman filters, clustering algorithms, evaluating machine learning models. The facility is one of a kind and operating cost is in the billions per year, so downtime is unthinkable - this stuff needs to work.

Offline, you're running very, very detailed (and CPU-heavy) simulations. All in all, you have some hundreds of petabytes of data that are constantly being processed and reprocessed for hundreds of different purposes. These systems have many millions of lines of code between them, a lot of which needs to be shared between them. Offline analysis needs to re-run online algorithms and so on - you need a single stack for all systems.

You have some hundreds of thousands of CPU cores to run all of this. Due to how academia works, beyond a couple of large core datacenters, resources are mostly spread out in hundreds of locations globally so that each participating university can maintain a cluster on their premises for teaching/research/funding reasons. You need an efficient way to get the data that a program needs to where it is running, or preferably move the program to where the data is.

This is not a tech company; there's no revenue, so throwing money at the problem is not an option - it's all funded by taxpayers, so efficiency is paramount. What language do you reach for? Matlab? Lol. The closest analogy I can think of are some big trading systems and large-scale ML inference and content serving at FAANG and the like. That's all usually Java or C++.

Oh, one more thing: There are very few professional developers dedicated to this. A lot of it is built and maintained by grad students and researchers in between writing papers. They're smart people, and they can code, but they have neither the time nor the interest to learn a new language or framework every other year. They move around. A lot. It wouldn't work to have different tech stacks for different projects - you need to pick one solution, not just for one area but for the entire field, so people can spend less time learning and more time doing. There's no one available to migrate legacy code because some new cool language appeared or yesterday's cool library isn't maintained anymore. These projects run for decades. Whatever tech you pick, you must be certain that it will still be around and supported 10, 20, 30 years later. That the code still runs, and that the data you paid billions for can still be read.


Thanks for the detailed answer, I really appreciate the insight. I work in research myself, so I'm familiar with the general constraints.

I was certainly unaware of the size of the data coming from the detectors. If speed is the argument that beats all others, I rest my case. From what I read on the root.cern website, root is a data analysis and simulation environment, so I was not aware of the aspect of prototyping for online use.

Because I spent a lot of time thinking about how software development can work in an analysis-heavy research environment, I still would like to comment on some of your points. To distribute binaries and source code, packages work very well for us. Especially if you want to reuse software components in unseen contexts, packages and a package registry make the most sense.

The use case "re-run online algorithms in offline analysis" is a very familiar one. In my line of work, we do that daily: Switching between online and offline to test + deploy algorithms. Vastly smaller scale, of course. But to us, packages are the first part of the solution. All you do is change the data source. For offline, it's local data or a remote DB, for online, it's an interface such as a websocket.

The second part of the solution are unit- and integration tests. Other users will immediately see what you did (or didn't) test. Again, packages are the distribution system of choice. This has nothing to do with Matlab/Python/R/Julia. Rust has crates.io, JS has npmjs, even Java has something like Maven Central.

Regarding the funded-by-taxpayers argument: The issue I see here is that the cool ML, simulation, data analysis stuff which the CERN people do remains in the root ecosystem. If they used something like PyPI, I could use their stuff too. I have a lot of clustering problems, especially on time series. With a more or less proprietary system like root, I can't use any of CERN's implementations.

Regarding "researchers don't have time to learn new languages": If you look on github.com/root-project/root/issues and root-forum.cern.ch, there are suspiciously many questions regarding "how can I make use of root and Python libraries", and "X doesn't work in root, what do". Newbs have to learn root as well, and they seem to like using Python at least as an enhancement.


> C++ folks simply refuse to learn other languages

root is decades old. And I understand that the scripts developed under it are often directly incorporated into C++ applications. CERN is a big C++ user (I have a little experience with their GEANT4 framework), and being able to do everything in one language is a big productivity boost (see for example the rise of node.js for web-related work).


Yeah... basically... in fact they're leaning more into JIT + string programming now and seeing big JIT latency etc. (imo it's kinda approaching Julia from the C++ side)


That is so incredibly cool!!!


On the performance side, pretty much the entire game logic would have to be implemented on the "script" side for this to be worth it compared to just embedding Lua and calling it a day, right?

I can see the benefit of typechecking the "script" code, though. The vulnerable parts are the "syscall" wrappers but they don't change all the time.


I don't know about the entire script. The game and game engine have tens of thousands of lines of code that are not script. But I feel like it opens creative doors to be able to make (let's just say) a magnitude more calls into the virtual machine compared to other options. That's not to say that I am done with the research I am doing. The idea is to be able to enter and leave the script as fast as possible, while also calling back into the game engine as fast as possible. Once this is achieved, game logic that once had to rely on finite options can have script callbacks. Also, the game will be naturally moddable from the start.

Here is an outtake from one of my game scripts: https://gist.github.com/fwsGonzo/d9116c46e5b7f8ed4743ab15f89...

While it may not look like it, flow.cpp calls back into the game engine in a surprising amount of places. It's also something that regularly gets called 10k times per simulation step if there is a cave under the water, for example.

For me right now, I feel like I have reached a sweet spot where I would no longer consider going back. Also, it's a fun project! Sometimes I am wondering if I am enjoying making the game engine more than the game. Oh well.

Might as well tack on to this that you can also choose your language. I have emulated Go, Rust and Nim pretty successfully. Enough to know that they would be a possible choice for emulation. Go is perhaps weird in that you want to avoid the C API (cgo) and just figure out the assembly instead, but that could also lock you to a Go version.


You ought to meet and discuss your project with Arnaud Hervas, the original author of the Shake compositing software. Shake was a VFX industry compositor that used GCC C++ as the "scripting language" that non-developers wrote. It was not exposed as raw C, but macro-modified to look simpler. Arnaud did several industry firsts, including using the GPU to render compositing layers back when all that type of work was done in software. His company, before Apple bought them, was called Nothing Real: https://en.wikipedia.org/wiki/Nothing_Real He's a nice guy, so don't be intimidated.


Let me just add that perhaps we can sandbox Python too? LPython is on the front page now and, if I read it right, it will compile to C and C++ when targeting AOT.

See: https://lpython.org/blog/2023/07/lpython-novel-fast-retarget...


Is performance really going to be a deciding factor for using this? I'd guess it depends on what you really want in your "scripting" language. For me, I'd want some sort of runtime with an easily readable language that can tolerate bad code, so in that case I would prefer something like Lua.

Must admit though, things like this really make one think... what do people even want in a "scripting" language these days?


Probably depends on how much of the game is implemented via "scripting"; some games just put scripting at the edges (UI, data definition) while others build everything on top of the game engine via "scripting", e.g. https://forums.unrealengine.com/t/ive-created-an-infinite-vo...

> The whole project is 100% blueprint with no C++ being used. I think this project is a great example for the power of blueprint.

And that's a post from 2016. These days you can probably create entire games via visual scripting (https://unity.com/features/unity-visual-scripting) as long as they're not too bespoke or demanding.


The example you use seems to be specifically a "because I can" project. We can't be sure (because the code isn't available, obviously), but I would highly doubt there are any AA or AAA games built entirely in the scripting engine. However, many 4X games get pretty close (Paradox series, Civilization, the Endless games, HoM&M, AoW, etc).


The limiting factor for visual scripting is not really performance but everything around its ecosystem (diffing, architecture, spaghetti).


The most eye-catching title I’ve seen.


Visual Studio upgraded Edit And Continue to support live code updates. https://learn.microsoft.com/en-us/visualstudio/debugger/hot-...

If that’s not enough, there’s https://liveplusplus.tech/index.html


ChaiScript is another option: https://github.com/ChaiScript/ChaiScript


I'm making an Action MMO engine where the game will be coded in C and hot-reloaded with this API: http://edit.rupy.se?host=move.rupy.se&path=/file/game.cpp&sp...

So far the API is really small and the turnaround is 100 milliseconds.


This is interesting.

I'm working on a high-performance game in C++17 with CLion and Wicked Engine on Linux.

Compiles are 20 seconds, which is fine. 1 second would be better.

Blender exports are about 20 seconds too. 1 second would be better.


part _8_ .. oh that doesn't start well



