My experience binding a couple of scripting engines with C++ (germandiagogomez.medium.com)
79 points by germandiago on May 24, 2021 | 68 comments



> [lua] was discarded because of 3. It has unfamiliar syntax, but worse, unfamiliar semantics: no classes, use tables, start indexing at 1 and other oddities, just as being able to call functions with the wrong number of arguments and returning nil on the way. Also, use tables for both hash tables and arrays. It is powerful, do not misunderstand me, and Lua supports good concurrency. It was just not what I was looking for because of the mentioned things.

I can fully understand why lua is not a good fit for this case, however, I would like to add some color to the picture.

The most powerful way to do C(++)/Lua interop is not the official C API but the LuaJIT FFI: https://luajit.org/ext_ffi.html

This allows for allocation of C objects on the heap and on the stack, and FAST function calls. Doing the same for C++ is possible but requires some work, e.g. http://lua-users.org/lists/lua-l/2011-07/msg00492.html

Furthermore:

- unfamiliar syntax -- The syntax is tiny -- and I found nothing unexpected about it.

- no classes -- There are many class libraries available for lua. Just pick one. I have used Penlight classes quite a bit, without running into major issues.

- use tables for both hash tables and arrays. -- Yes, but only on the API side; under the hood, hashes and arrays are used where appropriate.


> [...] There are many class libraries available for lua. Just pick one. [...]

Alternatively, embrace the prototype-based programming and just... don't add classes.


In practice though, is prototype-based inheritance useful for anything other than implementing your own class system on top of it?


I hold an opinion that, if you use the prototype-based approach, you should avoid inheritance like the plague ;)


You should avoid it whether prototypes are involved or not.


Also,

> unfamiliar semantics: no classes, use tables, start indexing at 1 and other oddities, just as being able to call functions with the wrong number of arguments and returning nil on the way

Other than 1-based indexing, those semantics should be very familiar from JavaScript.


It still feels weird to me to call a table an array or a hash table, and to do classes via metatables in several very different ways. For sure it is powerful.

But it does not really fulfill the zero-friction goal I had for the integration. You have to change your mindset a bit when integrating and using this stuff.

In Wren you have lists (which are basically arrays), dictionaries, ranges, and all the things I already know how to use from Python/C++.

That said, inheritance does not work smoothly, and ChaiScript does not even have inheritance itself. But for my purposes, classes, concurrency, and familiar data structures and patterns were enough.


Javascript is not known for its intuitive and sane design, and certainly not from the perspective of C/C++ devs


The criterion was "familiar", which is nothing if not Javascript -- nobody said anything about intuitive or sane :)


I grumbled about indexing starting at 1, but once you get used to it it makes a lot of problems easier.

I've spent the last twenty-ish years in C and string manipulation just sucks. Think about it, you declare a buffer of length 20, the indexes are from 0->19, and the 19th byte needs to be a null if you are using it as a string and are using the entire buffer. Also, the standard library is not guaranteed to null terminate in all situations.

Lua's string indexing feels far more natural to me.


Small correction. The 20th byte (at index 19) needs to be null.
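
To make the off-by-one concrete, here is a minimal sketch using Python's ctypes to mimic a 20-byte C buffer. The names `buf` and `text` are only illustrative, and this is a stand-in for the C situation rather than real C string handling:

```python
import ctypes

# A zero-initialized 20-byte buffer: valid indexes are 0..19.
buf = ctypes.create_string_buffer(20)

# The longest string that still fits has 19 characters, because the
# 20th byte (index 19) must hold the terminating NUL.
text = b"a" * 19
buf.value = text

print(len(buf.raw), len(buf.value), buf.raw[19])
```

Assigning a 20-character value would leave no room for the terminator, which is exactly the situation the parent comment complains about.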

The mismatch with the English language and how people naturally count is definitely there and annoying. And yes, string manipulation in C is especially broken although I think the indexing is the smallest problem there.

However, it’s extremely natural when you think about it in terms of memory access. For example, in a 1-based indexing system, ptr[0] would point 1 character behind your pointer (weird) and ptr[-1] would point 2 back (wtf). Having the index map neatly to the offset makes a lot of sense to me. In fact, when I first started programming in VB6 20 years ago and only had a math background, the 1-based indexing was natural but I could never figure out why I had so many bugs related to array and string offsets.

I’ll also note that most programming languages are 0-based and interop with C is not really the goal (Java, JavaScript, Ruby, Python, etc). In fact, Python and Perl’s string manipulation is some of the best out there and they are 0 indexed.


Another way to look at it, in languages like Python that support slicing, is that the indices refer to the boundaries between the elements rather than the elements themselves.

     a b c d
    0 1 2 3 4

From this 4-character string you can slice s[1:3] and get 'b', 'c'. s[i:j] will always have j - i characters (ignoring negative indices). s[i] is as if it were short for s[i:i+1], e.g. s[3] is s[3:4] and that gives you 'd'.
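
The boundary view can be checked directly in Python. This is a small sketch, with `s` being the 4-character string from the diagram and the other names chosen only for illustration:

```python
# The 4-character string from the diagram.
s = "abcd"

# Indices name the boundaries between elements, so s[1:3] spans
# boundary 1 to boundary 3 and covers 'b' and 'c'.
middle = s[1:3]

# For 0 <= i <= j <= len(s), the slice s[i:j] has exactly j - i characters.
lengths_match = all(
    len(s[i:j]) == j - i
    for i in range(len(s) + 1)
    for j in range(i, len(s) + 1)
)

# A single index behaves like a one-element slice: s[3] equals s[3:4].
last = s[3]

print(middle, lengths_match, last)
```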

Admittedly C++ doesn't have slicing so this is less relevant, but I think it's still an interesting aspect to the discussion (without getting bogged down in ideology).


It does, with std::span.


I figured that someone would pipe up about the 19th byte vs 19th index, and I'm glad. This is just semantics, but does index 0 represent the first byte or the zeroth byte?

I completely agree regarding memory access, but would argue that strings and memory should not be treated in the same way. Having the index represent the length - 1 has caused countless off-by-one bugs [1] that would not have been there in the first place if string indexes started with 1.

Java, JavaScript, Python (maybe Ruby, I'm not fluent there) also bite the user if you attempt to index data outside of the string's range. C will happily index whatever you want, and these bugs can often remain hidden for decades.

1. https://cwe.mitre.org/data/definitions/193.html


Index N represents "skip N elements". That gives you very easy additive/subtractive behaviour, unlike "how many numbers are there from 7 to 17?" scenarios.

Sure, 17-7 gives you 10, but that's not the final answer, you have to add 1 to get the right answer, 11. Sorry, no, you actually subtract 1 and the right answer is 9. Wait, no: 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, okay, 11 in total, so you have to add 1, got it right the first time.
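
The same arithmetic as a short Python sketch (variable names are illustrative). It contrasts the inclusive count, which needs the "+ 1", with a half-open range, where the count is just the difference:

```python
# Inclusive range: counting 7, 8, ..., 17 needs the "+ 1" correction.
first, last = 7, 17
inclusive_count = last - first + 1

# Half-open range [7, 17): the count is simply the difference.
half_open_count = len(range(first, last))

# "Index N means skip N elements": with 0-based offsets, the element at
# index n is the first one left after dropping the first n items.
items = ["a", "b", "c", "d"]
n = 2
after_skipping = items[n:][0]

print(inclusive_count, half_open_count, after_skipping == items[n])
```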


The indexing starting at 1 is something that has always annoyed me. I just haven't seen any other language make that decision, and I'm not sure the reasoning for it really outweighs the fact every other language just goes with 0 due to the origin of it.

Lua is a great language for sure though.


Pascal did, but I believe that was because index 0 contained an 8-bit length. There was no need for null termination, and strings were limited to 255 bytes.


Fortran, Matlab, and Julia all start arrays at 1, either always (Matlab) or by default (Fortran, Julia).


Erlang indexes starting at one, except when it doesn't.

Tuples are 1 indexed, lists are 1 indexed (in the standard library, anyway), but binaries are zero indexed.


This. The insistence on using 0-based string offsets is purely a C thing (where it makes sense), inherited by languages that wanted to stay close to C or appeal to C devs (even though it does not make sense there). An easy way to check is to look at awk which, as a DSL for string manipulation itself written in C, deliberately uses 1-based string offsets, and where many/most common string expressions collapse to a very compact form, which makes even more sense because empty string results are interpreted as false in conditions.


It's not a C thing, it's a math/utility thing.

Dijkstra: Why numbering should start at zero https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/E...


Which is funny because Lua got its 1-based indexing from FORTRAN which, I believe, adopted it to make TRANslating math FORmulas easier.


I agree in part that it's a C thing. More accurately it's an offset from the array base thing.

The first element is at index 0 because its address is base + 0 * sizeof(element)

The second element is at index 1 because its address is base + 1 * sizeof(element)
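
A minimal sketch of that address arithmetic, using Python's ctypes as a stand-in for a C array. The helper `element_address` is hypothetical and exists only for this illustration:

```python
import ctypes

# A C array of five 32-bit integers, allocated through ctypes.
ArrayType = ctypes.c_int32 * 5
arr = ArrayType(10, 20, 30, 40, 50)

base = ctypes.addressof(arr)
size = ctypes.sizeof(ctypes.c_int32)


def element_address(index):
    """Address of element `index`: base + index * sizeof(element)."""
    return base + index * size


# Read each element back from its computed address.
values = [
    ctypes.c_int32.from_address(element_address(i)).value
    for i in range(len(arr))
]
print(values)
```

With 0-based indexes the first element sits exactly at the base address, so no "- 1" adjustment appears anywhere in the formula.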


Lua is the C of scripting languages.


Developers working with C-like languages might not like Lua because of its different looks and behaviour. I was in the same situation and started looking into Squirrel because of that. But eventually I went back to Lua, for one important reason: The ecosystem. Lua has a lot of adjacent tools and a huge community. If you want to do something in Squirrel, you will be much more on your own, which can be frustrating. Lots of Lua's quirks can be easily worked around, but lack of a community supporting the language can't. This is especially important if you want to open your scripting API to users of your software.


I think what you say is true. Anyway, I do not need state-of-the-art technology in my case. Basically, this is what I really wanted, in order of importance (with the first two points being by far the most important):

1. Concurrency support

2. Easy to bind into C++

3. Familiarity, etc.

Namely, if only Lua had existed, then given that Sol2 exists and makes binding it to C++ easy, I would have chosen that. But since Wren + WrenBind17 existed, Wren had a more familiar syntax, and it was a viable choice, I went for that. I was trying to find the path of least resistance (lower learning curve, easier to bind, and concurrency making my code easier, since I am already familiar with most patterns).

As for ChaiScript, it was the first thing I picked since it was so easy to embed. But it had its own problems: lack of concurrency, and it does not report the file and line of errors, which is very painful because it hurts your productivity.

And scripting... scripting is about productivity, at least that is what I was using it for.


JavaScript would definitely be my first choice if I had to integrate another scripting language into a native program today (doubly so if the eventual target is web, like the author implies with WebAssembly support).

Depending on your needs (small vs. fast, low memory usage vs. full JIT, etc.) you can pick anything from JerryScript / QuickJS all the way to V8.

Perhaps the only thing missing is a universal C/C++ API for embedding JavaScript engines that lets you swap out easily to test different trade-offs.


How well established is JavaScript FFI support?


I kind of expected to see QuickJS or Duktape here; if the author considers ‘something like javascript-ey’ to be just fine, he might as well have used JavaScript itself, with all its strengths and faults.


Well. I think I was a bit inaccurate. When I said javascript-ey what I meant is also familiar. Something like Squirrel and Wren do a good job.

The ones you mentioned, as far as I investigated, were not dead-easy to integrate into C++, one of the top requirements.

Take into account that I have to expose my own types, not just ints and basic types.

The original API was coded in a natural C++ way. APIs that wrap well in that sense could be Sol2 for Lua, ChaiScript, Wrenbind... which integrate with custom types and smart pointers.

With other scripting languages and their libraries you need additional work.


Duktape’s host API is pretty much a ripoff of Lua’s, and the latter wasn’t rejected on those grounds, so…

QuickJS’s API isn’t particularly well-documented, but it’s not hard to find your way around it either if you dig into the source (the engine is very hackable, too; you might even fix some of the language’s design flaws – obligatory wat talk reference – if you’re so inclined). The host API follows the CPython model, with objects represented by pointers and explicit reference counting on the C side. There are some predefined macros to ease defining built-in classes. Some type-level hackery in C++ might ease things even further. I don’t know how much deader-easier you want it.


Take a look at how pybind11, wrenbind17, sol2 or ChaiScript do it. That is how easy I want it: I can expose custom types and global state easily. I do not want just ints and const char */double.

These bindings range from decent to great.


Constructive feedback for the article is welcome. Thanks!


Did you have a look at e.g. https://root.cern.ch/root/html534/guides/users-guide/CINT.ht... and https://root.cern/cling/?

> Lua ... It has unfamiliar syntax ... start indexing at 1

Syntax is not unfamiliar, just more Pascal like; if you use LuaJIT you can use zero based indices and a powerful FFI for direct C code integration.


Well, by this I mean "unfamiliar to me", of course. Lol.

Actually, Lua is something to consider from the point of view of usage: it is effectively an industry standard. However, all those small quirks in semantics... and classes can be done in many ways (via metatables, as I understand it)...

In ChaiScript or Wren there is one true way and you are done. You might like it or not, but it leads to less confusion, especially if you use most of the time what is in the mainstream.

This is by no means a bad thing in itself, it is just about how ergonomic or time-consuming it could be for myself: I just feel more comfortable with ChaiScript, Wren or Squirrel than with Lua. Even AngelScript is more similar to what you already have. So when exposing APIs there is much less friction.

Truth be told, there is also https://github.com/ThePhD/sol2, which looks great and is something to consider. It makes binding things much easier and gives you object-oriented Lua. You could rely on that.

It was just my subjective choice. There is no 100% right choice. Probably, if I found people that are comfortable with Lua I would use that. But the case is that this is a project of mine as it stands now.


As for looking at CINT, CLing, yes I did. I prefer to use a dynamic language with coroutines out of the box. It is actually what I was looking for besides ease of binding it and "familiarity" in semantics/syntax in a broad, imprecise way I defined for myself.


> if you use LuaJIT you can use zero based indices and a powerful FFI

Not entirely, Lua standard libraries still expect everything to be one-indexed, while FFI structures are zero indexed. So with LuaJIT you often end up with a mix of 0 and 1 indexed code, which in my experience was workable but definitely a pain point.


You should avoid the Lua C API in LuaJIT because it is not supported by the JIT (i.e. it makes your code run in the interpreter instead of being JIT-compiled). Using zero-based indices in Lua code running on LuaJIT works well.


I'm not talking about the Lua C API, but things as simple as this:

    > stuff = {"one", "two", "three"}
    > stuff[1]
    one

These native Lua structures are baked in, and they're a lot more flexible than FFI structures - presumably if you're using Lua, it's because you want to take advantage of that flexibility and those affordances. FWIW I've written a lot of LuaJIT, and usually I kept the lower-level FFI stuff separate from the higher-level code using Lua data structures, so I rarely encountered that discrepancy between them, but it's still something to keep in mind.


You can e.g. do

  > stuff = { [0]="one", [1]="two" }
  > print(stuff[0])
  one
Works well; I wrote e.g. https://github.com/rochus-keller/Smalltalk#a-smalltalk-80-in... that way.

EDIT: even this works

  > stuff = { [0]="one", "two", "three" }
  > print(stuff[0]) -> one
  > print(stuff[1]) -> two


> stuff = { [0]="one", "two", "three" }

In this case, is it possible to make iteration start with the element at index 0? Maybe by implementing a custom version of `ipairs`?


When using

  > stuff = { [0]="one", "two", "three" }
  > for k,v in pairs(stuff) do print(v) end

it prints all three elements in the correct order. I rarely use iterators for performance reasons anyway. Instead of ipairs one can use:

  > for i=0,#stuff do print(stuff[i]) end


> Syntax is not unfamiliar, just more Pascal like;

how is that not unfamiliar


In that it still refers to a hugely popular family of languages...


Chinese is also a hugely popular language, does not mean that it is familiar to a large amount of humans


The analogy breaks as Pascal knowledge is not confined to one geographical area or ethnicity. The same goes for languages inspired by Pascal syntax, of which there are tons.

No matter how you slice it or dice it, Pascal and Pascal-like syntax are not some obscure niche languages...


> The analogy breaks as Pascal knowledge is not confined to one geographical area or ethnicity.

No, it is confined to people who learned Pascal? I could start learning Chinese with Duolingo just as easily as I could start learning Pascal with some web tutorial; that does not mean that either is familiar to me.


Well, why would you then consider Python to be familiar?


If coming from C++ I definitely wouldn't; the module system and reference binding in Python in particular are WEIRD. I'd say C, C#, maybe D and Java would fit? My criterion would be "can a new grad student who only learned C++ be productive in a couple of days".


There may be a difference between your view and that of the majority of developers. My primary language is also C++, but languages of the Pascal family (to which Python is related) remain very popular.


Because Python is 100000x more popular than Pascal in 2021?


Python syntax is more similar to Pascal than to e.g. Java or JS.

"Modula-3 is the origin of the syntax and semantics used for exceptions, and some other Python features." (from https://docs.python.org/3/faq/general.html#why-was-python-cr...). Also the predecessor languages ABC and SETL were in the Algol tradition.

> Python is 100000x more popular than Pascal

It's about a factor of 8 on https://www.tiobe.com/tiobe-index/ or a factor of 3 (in score) on https://spectrum.ieee.org/static/interactive-the-top-program..., whichever you prefer as a reference. Delphi (which is Object Pascal) is still a widely used language.


I generally go by job listings. It's trivial to find Python jobs, Pascal/Delphi jobs are very rare.


Maybe you can post a link to a job site where there are 100000x more Python than Pascal/Delphi/Ada jobs. Btw. you can filter the IEEE ranking by jobs which seems to correspond well with what I see on monster or indeed. Anyway, the discussion was about whether the Lua or Pascal style syntax is unfamiliar or not.


You make a good point. I started my Computer Science and Engineering degree (I am European, not American, so the equivalent looks like a kind of merge of both areas) with Python, C and C++ on the programming side.

Pascal was dropped a few years back at my university. And yes, by familiar I mean exactly what you mean: nowadays you see Java, Python, C++, C, C#, but Pascal has disappeared.

It disappeared so long ago that I do not even know the syntax from casual reading.


> Prefer dynamic to static typing, since static typing can remove the coding speed: it makes you think about types.

I'm wondering what you mean exactly by this. Do you not think about types when programming with a language with dynamic typing?


No, I think I expressed myself the wrong way.

What I mean is that if you have to annotate all your code with types (like in AngelScript), this will slow you down for two reasons. First, you need to think about types, and second, refactoring is more rigid.

If it is optional, it is ok, you can take advantage of it at will (ChaiScript supports types in parameters, but optionally).


I'm late to the party here, but just wanted to add that AngelScript added support for auto declaration like in C++11 some time ago.

https://www.angelcode.com/angelscript/sdk/docs/manual/doc_da...


Thanks, that clarifies it and makes sense.


> Python [...] could be difficult to port to Web Assembly down the road

The Pyodide project has already compiled CPython to WebAssembly - why is that a worse solution than compiling one of these other scripting language interpreters to WASM?


While it "works", Python under WASM means downloading a very large interpreter and runtime and waiting quite a long time for it to start up.

On my i5 laptop, this demo downloads about 8Mb and takes a couple of seconds to load up: http://karay.me/truepyxel/demo.html

Lua, by comparison, is very small and has a fast startup under WASM.


One issue could be size - CPython's native binary is an order of magnitude larger than Lua's, and the same is probably true when using WASM.

Perhaps something like MicroPython could solve that, though.



Indeed, Python is one of the most well-behaved scripting languages for WebAssembly, and people had been running it in the browser for a good while already with WebAssembly's predecessors (emscripten and asm.js).


This was not my information at the time. But thanks for the info, it is helpful.

With https://github.com/pybind/pybind11 there is really great integration with C++ and Python is my second home after C++ actually.

Anyway, I am quite happy with Wren and it seems to be fast (not a requirement for my project, though)


For a C/C++ like syntax there is also https://github.com/mingodad/ljs and ljsjit and also https://github.com/mingodad/squilu a variant of Squirrel with more C/C++/Java/CSharp/Javascript syntax.


I do wonder why PHP was not included on that list? It offers an embed SAPI module - which some adventurous person even managed to bind to NodeJS: https://www.npmjs.com/package/php-embed

The only thing I can imagine is that it's not widely known and even less well documented.


My game is not a web-first game. I do not want to put obstacles in its way to WebAssembly, but I was not looking for web-first tech, just a small embeddable language to complement its C++ codebase.



