It is extremely likely that the author did not have the option to look at alternatives to Python, but some other scripting languages, for example Lua, Tcl, and Guile, are easier to embed because they were designed with that use in mind from the beginning.
Among other things, Tcl and Lua (I'm not sure about the latest Guile) encapsulate interpreter state well, so one can run independent interpreters in multiple threads that communicate with each other by message passing, without the need for any serialization/deserialization.
If nothing else, I'd imagine using Lua or Tcl instead of Python would make the binary significantly smaller, yeah? I've gotten the impression that embedding the CPython runtime, even stripped down, can grow binaries significantly, especially if statically linked.
My rule of thumb is that the Lua core is about 100 KB, and the Lua standard library is about 100 KB. Tcl is about 2 MB. (All built as dynamic libraries.)
Lua’s source is pretty malleable: if you don’t need certain parts, you can rip them out to make it smaller. I once read a claim that a microcontroller project got it under 20 KB.
I prefer to embed a language that I enjoy programming in. I don't know about Tcl, but I enjoy using Python more than Lua. Python isn't that unusual a choice for embedding either; for example, take a look at LLDB or Sublime Text.
Sure, threading and reference counting may be among the uglier aspects of embedding CPython, but I don't think manipulating the stack in Lua is that praiseworthy either.
I’ll stick my neck out and praise Lua. While I admit the stack is tedious to work with, every C binding bridge I’ve worked with is tedious. (The exceptions are languages that use compilers and architecture/platform-specific tricks to pull off transparent bridging. LuaJIT and LuaFFI would be one example; Swift would be another.)
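For readers who haven't used Lua's C API, here is a toy model of what "manipulating the stack" means. It is a hypothetical Python class (`ToyStack` and its methods are invented for illustration and are not part of any library) that only mimics the convention: the host pushes values onto a stack the interpreter owns and refers to them by index, 1-based from the bottom or negative from the top, and a call consumes a function plus its arguments and leaves the result in their place. The real API does this in C through functions such as `lua_pushnumber`, `lua_gettop` and `lua_call`.

```python
class ToyStack:
    """Hypothetical, much-simplified model of a Lua-style value stack.

    This is NOT the Lua C API; it only mimics its indexing convention.
    """

    def __init__(self):
        self._slots = []  # values owned by the "interpreter", not the host

    def _abs_index(self, idx):
        # 1..n count up from the bottom, -1..-n count down from the top.
        n = len(self._slots)
        if 1 <= idx <= n:
            return idx - 1
        if -n <= idx <= -1:
            return n + idx
        raise IndexError(f"invalid stack index {idx}")

    def push(self, value):
        # Roughly analogous to lua_pushnumber / lua_pushstring.
        self._slots.append(value)

    def gettop(self):
        # Analogous to lua_gettop: the number of values on the stack.
        return len(self._slots)

    def get(self, idx):
        # Read a value by index without popping it (cf. lua_tonumber).
        return self._slots[self._abs_index(idx)]

    def pop(self, n=1):
        # Discard the top n values (cf. the lua_pop macro).
        if not 0 <= n <= len(self._slots):
            raise IndexError(f"cannot pop {n} values")
        del self._slots[len(self._slots) - n:]

    def call(self, nargs):
        # Mimics lua_call(L, nargs, 1): the function sits just below its
        # arguments; function and arguments are popped, one result is pushed.
        func = self.get(-(nargs + 1))
        args = self._slots[len(self._slots) - nargs:]
        self.pop(nargs + 1)
        self.push(func(*args))


vm = ToyStack()
vm.push(lambda a, b: a + b)  # stands in for a Lua function
vm.push(2)
vm.push(40)
vm.call(2)
print(vm.get(-1), vm.gettop())  # 42 1
vm.pop()  # leave the stack balanced, as C code must
```

Because the host only ever holds indices rather than direct pointers to interpreter objects, the collector can roughly treat the stack as the set of values the host can see. That is more or less the trade-off being argued about here: index bookkeeping tedium in exchange for GC safety.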
Lua’s design is at least very clean and prevents developers from accidentally creating nasty lifecycle issues. Lua is very clear about ownership and visibility, and that clarity propagates cleanly through all aspects of Lua, such as working seamlessly with its garbage collector. The design and code are also 100% portable to any processor/platform that has a C compiler (i.e. no platform/arch-specific #ifdefs, which is really useful when bringing it up on lesser-used platforms/chips). And the design has very clear performance implications. In contrast, there was a recent talk at Google I/O about Android’s Dalvik vs. ART with respect to JNI bindings and performance. The results were non-obvious, to say the least.
I have embedded Ruby and Python before, so I found that paper interesting to read. I wish it had mentioned languages I'm unfamiliar with (e.g., Squirrel, JavaScript). Anyway, I appreciate the design decisions, and no doubt Lua has merits, but in my personal experience I haven't found it to be a pleasant language to use or embed.
Garbage collection is a controversial topic too, and if I understand correctly, the way Lua's GC works is part of the reason it has a stack-based API. Working with CPython, I somewhat liked the level of control that reference counting gave me. Some applications may even go out of their way to disable the GC and use weak references to guarantee deterministic cleanup, similar to Apple's ARC. At a glance, Squirrel, born as a competitor to Lua, appears to use a GC approach similar to Python's.
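As a rough illustration of the "disable the GC and use weak references" approach, here is a pure-Python sketch that uses only the standard `gc` and `weakref` modules. The `Node` class and the two demo functions are hypothetical names made up for this example, and the behavior relies on CPython's reference counting, which is an implementation detail (PyPy, for instance, does not free objects the instant they become unreachable). With the cycle collector disabled, a tree whose back-pointers are weak references is freed the moment its last strong reference goes away, while a strong cycle lingers until something calls `gc.collect()` explicitly.

```python
import gc
import weakref


class Node:
    """Hypothetical tree node whose parent link is a weak reference."""

    def __init__(self, name, log):
        self.name = name
        self.log = log       # records which nodes have been finalized
        self.children = []
        self._parent = None  # weakref.ref to the parent, or None
        self.peer = None     # optional strong link, used to build a cycle

    @property
    def parent(self):
        # Dereference the weak link; yields None once the parent is gone.
        return self._parent() if self._parent is not None else None

    def add(self, child):
        child._parent = weakref.ref(self)  # weak back-pointer: no cycle
        self.children.append(child)
        return child

    def __del__(self):
        self.log.append(self.name)


def tree_freed_deterministically():
    """With the cycle GC off, refcounting alone frees the tree on `del`."""
    log = []
    gc.disable()
    try:
        root = Node("root", log)
        leaf = root.add(Node("leaf", log))
        assert leaf.parent is root
        del leaf  # still owned by root.children, so nothing is freed yet
        assert log == []
        del root  # last strong reference gone: both nodes are freed right here
        return list(log)
    finally:
        gc.enable()


def cycle_needs_collector():
    """A strong cycle is never freed by refcounting alone."""
    log = []
    gc.disable()
    try:
        a = Node("a", log)
        b = Node("b", log)
        a.peer, b.peer = b, a  # strong cycle
        del a, b
        before = list(log)  # still empty: the refcounts never reach zero
        gc.collect()        # an explicit collection still runs while disabled
        return before, sorted(log)
    finally:
        gc.enable()


print(sorted(tree_freed_deterministically()))  # ['leaf', 'root']
print(cycle_needs_collector())                  # ([], ['a', 'b'])
```

The weak parent pointer is what keeps the first case cycle-free; it plays the same role as a weak back-reference under ARC.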