Hacker News
Writing Python inside Rust (m-ou.se)
187 points by mcp_ on April 17, 2020 | 85 comments



I know this project may just be for fun, but with WASM targets for all sorts of languages, I'm hoping we get to a future where mixing and matching different languages for different parts of your program will be seamless. Imagine starting a project in an easy language, then migrating pieces to a faster "bare metal" language as needed in a super piecemeal way. Same with moving pieces to a safer language as the project grows, slowly expanding the boundaries of the safe bits as appropriate.



>Just like the JVM and CLR I guess. (List_of_JVM_languages, List_of_CLI_languages)

I don't see the JVM & CLR, even with the dozens of language choices on top of those virtual machine runtimes, as giving you ways to do "bare metal" programming. I guess one could arguably squint and say Java's JNI or C#'s P/Invoke and "unsafe{}" get you some "bare metal" capabilities, but that's a stretch.

I go back to the gp's sentence: "Imagine starting a project in an easy language, then migrating pieces to a faster "bare metal" language as needed in a super piecemeal way."

For example, JVM/Java doesn't have value types[1] (yet), and those are required for a programmer's control of exact and efficient memory layout in many bare metal domains. It doesn't matter what flavor of JVM language you use, because you're ultimately limited by the JVM's capabilities, which cause excessive pointers-to-pointers that are inappropriate for some high-performance code.
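To make the value-type point concrete, here's a small sketch using Python's ctypes (chosen only because it exposes C-style layout; the `Point3D` struct is illustrative, not from any real CAD codebase): a struct of three doubles occupies exactly 24 contiguous bytes, and an array of them is one flat block with no per-element object headers or pointer chasing.

```python
import ctypes

class Point3D(ctypes.Structure):
    # Explicit, C-compatible layout: three contiguous doubles,
    # no per-object header and no references to chase.
    _fields_ = [
        ("x", ctypes.c_double),
        ("y", ctypes.c_double),
        ("z", ctypes.c_double),
    ]

assert ctypes.sizeof(Point3D) == 24        # 3 * 8 bytes, flat
points = (Point3D * 1000)()                # 1000 points in one contiguous block
assert ctypes.sizeof(points) == 24 * 1000  # no per-element indirection
```

A pre-Valhalla JVM `Point3D[]`, by contrast, is an array of references, each pointing at a separately heap-allocated object with its own header.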

In C#, some future tech like CoreRT might open up some more "bare metal" programming possibilities but that's not production yet[2]. I remember you couldn't even develop Windows Explorer right-click menu shell extensions in C# because it was a bad idea to load up 2 different versions of the .NET Framework runtime (until .NET 4.0). That's not even that low a level of programming, and yet C++ had no such limitation.

I use C# as much as I can, but I still have to do 50% of my projects in C++ because JVM/CLR are not "bare metal" enough.

[1] https://en.wikipedia.org/wiki/Criticism_of_Java#Compound_val...

[2] https://github.com/dotnet/corert#user-content-net-core-runti...


Since when is WebAssembly bare metal?!?

Apparently you missed the Windows 8, 8.1 and 10 train regarding your "bare metal" compilation from C# and VB.NET code.

.NET 4.0 was released in 2010 and I have been able to use C++ as a .NET language since 2001, so ....

In fact one of my first .NET projects, in 2002, was to integrate a C++ RPC library into Managed C++, long since replaced by C++/CLI with the .NET 2.0 release.

Going back to Java, IBM, Aicas, PTC, Gemalto will happily sell you Java compilers that generate AOT native code for embedded deployment targets, not to mention what is running on my phone, where 95% of the OS APIs are exposed only via Java and where your beloved C and C++ code needs to use JNI to access them.


>Since when is WebAssembly bare metal?!?

Not sure what that is referring to.

>C++ as .NET language

We seem to be talking about 2 different things. Using "C++" style syntax (C++/CLI) in a managed language with GC is not what many systems programmers call "bare metal". I thought it was clear from context that my mention of C++ is traditional "real C++" such as gcc/clang/MSVC and not the C++/CLI.

>Going back to Java, IBM, Aicas, PTC, Gemalto will happily sell you Java compilers that generate AOT native code

But that's not the JVM runtime though. The JVM is what you originally wrote and that's the scope of what I was replying to: ("Just like the JVM and CLR I guess. (List_of_JVM_languages ...)")

The _JVM_ doesn't really give you low-level bare-metal programming and it doesn't matter what flavor of JVM-language you choose to run on top of it.

And at the risk of further muddying up the discussion with the tangent subject of Java AOT... do any of those Java compilers give true value type semantics, or is it still references with pointer chasing? I'm not familiar with those compilers.

>and where your beloved C and C++ code

"beloved"?!? Can we tone this down a bit? I'm just trying to clarify that the JVM & CLR really don't span the entire spectrum of programming all the way down to "bare metal" in the way low-level programmers typically use that phrase. I thought I was making a neutral and factual statement. I.e. I'm not interested in an emotional flamewar.


If I had to guess, I'd say OP meant that just like CLR or the JVM, WebAssembly couldn't be used to target bare metal and since Wasm is discussed in the original post, there's no practical difference between it and CLR/JVM for language intermixing at present - it's just another "common language runtime" if you will.

But I agree that the tone used wasn't the best.


Maybe the tone wasn't the best one, but I would like to know how WebAssembly is bare metal programming to start with.

I would also like to know what compiling natively to machine code has to do with having value type semantics in the source language.


>, but I would like to know how WebAssembly is bare metal programming to start with.

We are getting into "splitting hairs" territory but let me attempt to untangle this thread because it seems to be hung up on what "bare metal" means.

Yes, if we're using "bare metal" to only mean real semiconductor chip, WASM is not that. It's an abstract virtual machine. So yes, in that strict sense, WASM is analogous to JVM and CLR.

But.....

I'm charitably interpreting gp's comment (6gvONxR4sf7o) and he's using WASM as his _relative_ (not absolute) perspective of _that_ being "bare metal". Ok, if we play along with that, WASM is not analogous to the JVM/CLR because it is lower level[1]. Thus a non-managed language like C++ can more easily target WASM-flavor-of-bare-metal for high performance than managed C++/CLI can target the .NET CLR.

Yes, it's a subtle difference. WASM is more "bare-metal-ish" than the JVM ... relatively speaking. I just don't think JVM languages can really do the same thing as WASM, since Google/Mozilla/Apple/MS specifically engineered WebAssembly to be a target for low-level bare-metal languages like C/C++. In contrast, Sun & James Gosling deliberately didn't engineer JVM Java bytecode to be a compilation target for low-level C/C++.

This means something cpu-intensive like AutoCAD or possibly Adobe Premiere Pro can hypothetically be written to target WASM and will perform better than if those apps were re-written in Java to target a Java web browser plugin. E.g. Java's JVM doesn't have value types and that architecture choice is very unfriendly to storing/manipulating millions of 3d points for a CAD program. In contrast, WASM's architecture opens up a few more "bare-metal-ish" programming domains.

The various choices of JVM languages like Kotlin/Clojure/JRuby/etc actually don't address what WASM is attempting to accomplish.

[1] https://www.quora.com/How-does-Java-bytecode-compare-to-WASM...


Yet, GraalVM compiles LLVM bitcode and WASM bytecode just fine.

https://www.graalvm.org/docs/reference-manual/languages/llvm...

https://www.graalvm.org/docs/reference-manual/languages/wasm...

A JVM and respective JIT compiler all written in Java.

And I still don't understand what WASM does for C++ that the CLR doesn't do, given that I can write straight C89 or C++ with C++/CLI, just like using gcc or clang doesn't force me to use their language extensions.


Same idea, yeah, but hopefully with better common adoption. For various reasons users didn't adopt those (e.g. numpy on the JVM would be a ton of effort, especially a decade ago). But the web is an irresistible force, maybe. I'm hopeful (though not necessarily optimistic).


But they did, just not with the languages that are loved on HN as "taking over the world".

Most people in JVM land use a mix of Java, Kotlin, Scala, Clojure, Groovy, JRuby.

Whereas on CLR land it is C#, F#, VB.NET and C++/CLI (for low level stuff).

Naturally those that want to kill Java, or rather not touch Windows, aren't aware of this.


Every single language I've seen that has been 'loved on HN as "taking over the world"' has had C FFI interop, whether it is Node, Rust, Python, Ruby, C#, Java, Haskell, Zig, Crystal, Julia, Lua, etc.

Web assembly is the browser C FFI, not some high level platform like Java or .Net. Your examples aren't comparable.


Failure to understand what bytecode-based execution runtimes are all about, it seems.

Also failure to understand that a C ABI does not exist; rather it is the OS ABI of an OS written in C, and other OSes, not written in C, don't have such a thing as a C ABI across all languages.

Examples of such OSes: IBM i, z/OS, Unisys ClearPath, UCSD, Classic Mac OS, Native Oberon, Mesa/Cedar, Windows (plenty of stuff is .NET/COM/UWP nowadays), Android, ChromeOS, Garmin OS, and the Web.

So no, it isn't the browser's C FFI; none of the major browsers has even been written in C for the past 20 years.


You have repeatedly missed the point of the discussion because you are hyper-focused on the implementation details that are irrelevant to the end user. I honestly don't know how you jumped from the JVM and CLR to Unisys Clearpath (twice) and Mesa/Cedar except as a red herring. The topic of discussion is Web Assembly which implies a modern browser.

The majority of browsers now support Web Assembly and about half the global population has a web browser and access to the internet - and now, access to an actual universal bytecode based execution runtime by nature of being part of the web browser standards instead of an OS feature or framework installation or (god forbid) Oracle TOS.

The C FFI part was an analogy. The whole point of Web Assembly is that it can't call out to just any library on the OS.


Except it can, because WebAssembly has long since stopped being a browser-only story.

By the way, JVM on the browser, Flash CrossBridge and PNaCL were there first in what concerns "universal bytecode based execution runtime by nature of being part of the web browser".


I know I haven't worked with Java for the past 6 years (time flies!), but back then neither of the 2 companies I worked for used anything other than Java on the JVM. In fact, until Kotlin came along I only met one guy working with Scala, and nobody for the other languages. It was in France though, not in the US.

But I think your argument holds for the CLR, where I've seen the 4 languages you mention being used together in the same code base.


I know you know this, but I’ll write it for other readers: there are even more languages on the CLR if we count the ones not supported by MS. Also, native C libraries, and thus any language that can target C, can be used as well on Windows, Mac OS and Linux with P/Invoke.


Hence why I posted a link to the language list on my first comment. :)


The problem is that, inherently, a lot of the time when you want to drop down to C/C++ you're doing numerical code that would really benefit from SIMD operations, and many of those tend not to be portable (as far as I'm aware, basic SIMD is just in the proposal stage for WASM).


Controlling memory layout of your data structures really goes a long way for tight loop performance. SIMD might be just the icing on top of the cake.


I wonder if we'll ever get to a future where all code is write-only, immutable, and thrown away after compilation.

The choice of language would then become a matter of how the compiled code is presented and how replacement code is written.

The runtime and the type system can't be replaced this way though.


I can't tell if you're joking or not. Why would I delete the source code of every program after writing it?


I mean, delete the source, but store an intermediate representation that can be decompiled into any other language for reading.

Languages with good type systems and tooling support a workflow where you mostly rely on Intellisense hints and docs, and never read the code itself.

That, but taken to its logical conclusion: if you never read the source code, you don't need to store the source code.

The missing part is the ability to define algorithms as modifications of existing algorithms.


So, at first I thought this approach is doomed, because you lose comments, and what looks like a comment to Rust may not look like one to Python. Example:

  x = y // 2 # floor division
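A runnable illustration of that ambiguity (plain Python; the Rust-side behavior is described in the comments):

```python
# In Python, `//` is the floor-division operator:
y = 7
x = y // 2  # floor division
assert x == 3

# Rust's tokenizer, however, treats `//` as the start of a line comment,
# so a macro reconstructing source from tokens alone would see only `x = y`
# on this line -- a different (and silently wrong) Python program.
```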
But then I decided to look at the docs:

https://doc.rust-lang.org/proc_macro/struct.Span.html

And I noticed source_text, which "preserves the original source code, including spaces and comments"!!!

Why not just use this from the start then?? Seems like the easy way out, no?

(Disclaimer: I don't know Rust, can't even write hello world.)


The author likely just barely missed its introduction. While the article was written recently, the implementation it talks about was first published in early April 2019, right about when source_text was first introduced into nightly.


I am the one who contributed Span::source_text to Rust. My motivation for doing that was exactly the same as the author of the blog post: to help implement the cpp! macro, which embeds C++ within Rust just like this python! macro. https://docs.rs/cpp/ However, I still can't make use of it today because it is not stable yet, and even proc_macro2 does not implement it (and this is not trivial to implement there).


Wow. The fact that you care about this being stable implies that you plan to use this "for real"; the fact that you actually implemented this implies you care enough to put in some real work. So you must have a use case in mind, and I am not imaginative enough to guess what it might be. Is it just about convenience so you don't need to bother with putting your C++ code in a different file? Or is there more to it?


I'm using this to write Qt bindings: https://github.com/woboq/qmetaobject-rs/

> Is it just about convenience so you don't need to bother with putting your C++ code in a different file?

Yes, mostly. I find that having the code in place makes a big difference. I do not like useless levels of indirection and context switches while coding. This way is much better than having to edit three files (the .cpp, the ffi module, and the caller) each time I want to make a call into C++, while making sure they are in sync.


That actually makes sense! Cool!


Using Span to get the token location is exactly what I needed! Thanks. I wrote a blog post[0] the other day about making a css macro that compiles into Rust for use in Yew, a front-end React-like framework.

> However, in rust, there’s no way to differentiate .a.b with .a .b

Now I know that the above is incorrect. I would have never thought of spans so I thank you again

[0] https://conradludgate.com/posts/yew-css/#what-are-the-downsi...


The blog post shows two Python snippets starting with "if True:" and different indentation and says the snippets "have a different meaning". However, in this case the difference between the snippets is mainly in their syntax and not in their meaning. The example would have been better if "if True:" was replaced by "if False:" or "if foo:".

        if True:
            x()
        y()


        if True:
            x()
            y()
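With the commenter's suggested `if False:`, the difference becomes observable rather than purely syntactic; a quick check:

```python
calls = []

def x():
    calls.append("x")

def y():
    calls.append("y")

# Dedented y(): runs no matter what the condition evaluates to.
if False:
    x()
y()
assert calls == ["y"]

calls.clear()

# Indented y(): skipped together with x() when the condition is false.
if False:
    x()
    y()
assert calls == []
```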


Good point! Updated.


What is the use case for embedding Python code in a Rust program?


There could be no reasonable use case for something like this, and yet it would still have artistic value and this would still be an interesting article.

It's a brilliant hack and we are on Hacker News after all.


That's exactly what I thought when I read the title. It's very frustrating how many landing pages, project readmes, blog posts, etc. don't answer the question "why?". I usually need to know that before going further when I come across something that I've never heard of before. If they don't have the "why?" somewhere easy to find, I just close the tab.


You know, that might not be unintentional. They might want those who don't see the value to just close the tab and move on. That way only those who see the value keep reading. I imagine it helps avoid having to engage in (often endless) arguments about the validity of the use case.


I am quite surprised by your message. This is HN and even before or outside of that, for years I have myself dived into totally random projects simply because "why not?".

To try random ideas is part of learning and when you have something to share, just do that. There will always be like minded someone to pick up.


Whenever I see someone ask these "whys", I have the urge to show them this.

https://www.youtube.com/watch?v=Y4hOIgRPlNU


It's a bit of a guilty pleasure for me to read these kinds of weird things with no practical application. It's just a little refreshing to read about something technically possible, but frivolous (no offense meant to the OP, of course), just because it can be done.


"Because it can be done" is a very solid answer to "Why?"


It helps me to step outside my own context and view a familiar use case, problem, or solution with an unfamiliar perspective.


Because it was there.


I’m just throwing out ideas but what if I wanted to take working python code and convert it to rust? Could I use this to start as the baseline and start replacing bits of it and see that it still produces the same output?


There was a recent ATP.fm podcast with Chris Lattner of Swift fame. He talked about the topic of embedding Python code in Swift for TensorFlow and it overlaps with your suggestion.


Swift for Tensorflow has implemented Python interoperability, rather than embedded Python code as in this article. A Swift program can import a Python module and interact with Python objects and functions as if they were native Swift objects (minus strong typing, of course). See https://github.com/tensorflow/swift/blob/master/docs/PythonI...


Rust has had Python interop through the Python C API since around 2015 [1]. It's pretty low-hanging fruit for any language that supports the C FFI. Rust-cpython has had simple macros for interop for ages [2], and there's even a library that uses serde macros to encode/decode pickled objects [3].

This article is just a fun hack using Rust macros.

[1] https://github.com/dgrunwald/rust-cpython

[2] http://dgrunwald.github.io/rust-cpython/doc/cpython/macro.py...

[3] https://docs.rs/serde-pickle/0.6.0/serde_pickle/
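As a tiny illustration of why C FFI interop is "low-hanging fruit", here is the same idea from the Python side, using ctypes to call straight into libc (this assumes a platform where find_library can locate the C library, e.g. Linux):

```python
import ctypes
import ctypes.util

# Locate and load the platform's C library.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Call a plain C function through the FFI;
# int arguments and return values are the ctypes default.
assert libc.abs(-5) == 5
```

The Python C API that rust-cpython wraps is exactly this kind of C-level surface, just going in the other direction.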


A friend of mine has a similar project called PyOxidizer that's intended for building standalone executables with Python+Rust: https://pyoxidizer.readthedocs.io/en/stable/

His primary use for it is building distributable binaries for Mercurial, which is written primarily in Python.


Plugins for example? Game scripts? Think of use cases for embedding Lua into a C program.


Maybe interfacing w an ml lib like pytorch or something?


I suppose you could, but I believe that there are direct bindings from Rust to PyTorch at https://github.com/LaurentMazare/tch-rs. I haven't used it personally, but I've heard fairly good things about its correctness and responsive maintainer.


I have written a transformer-based (BERT, XLM-RoBERTa) sequence labeler/lemmatizer/dependency parser in Rust with tch-rs [1] and it is great! Some parts do not feel rusty. E.g. dynamic typing of tensors, you get a Tensor rather than Tensor<f32>, but that's more due to how libtorch itself works. But it's a very straightforward API that exposes most of libtorch. The maintainer is also very nice and responsive.

I was also amazed how much is implemented in libtorch itself as opposed to the Python wrapper, which makes much of the Torch functionality available to other languages.

[1] https://github.com/stickeritis/sticker2 https://github.com/stickeritis/sticker-transformers


Someone might have jumped that hurdle but the point is it circumvents that class of hurdle. What about the next framework which doesn't have bindings? What about leveraging existing Pytorch code that's written in Python?


For pytorch it would probably be easier to do something based off the pytorch C++ interface.


Sure, I've never even used pytorch. But I'm just saying python has a fair bit of data analysis and ml tools that probably haven't found native homes in rust


I only had a quick scan through the article, but could this be used to create an executable for some existing Python code?


What would the point of that be versus wrapping it in a bash script? Obfuscation?


Python is missing a nice simple way to ship native executables. If you have to send people a package to unpack that's an extra step. In theory you can use freeze, but configuring it all is a pain.


I've only used it for small GUIs (a couple of files & dependencies) but `pyinstaller --onefile` is about as easy as it gets.


You can use pip to install the executable source files. Pip is preinstalled on most Linux distributions and macOS.


If imports worked I could see it being very useful for making graphs. I've heard of people serializing data in other languages and then using Python to plot it.


Imports work fine. The main reason I wrote this was to be able to use the Python matplotlib library in Rust:

https://twitter.com/m_ou_se/status/1120577172438233088


I can confirm this--Rust's data visualization and plotting libraries are still pretty early stage. You can generally do 2D bar charts and scatter plots pretty well, but once you start jumping into more complex representations or three dimensions, you start running into some serious barriers.


At FOSDEM there was a talk about boosting Python with Rust [1]. Might be interesting for the people checking this.

[1] https://fosdem.org/2020/schedule/event/python2020_rust/


Is it common to embed one language's code directly inside another language like this?

Lua is often used tightly-coupled to C, and it doesn't have the significant-whitespace issues that Python shows here. Even so, I've only ever seen it used in separate `.lua` files, never embedded directly within `.c` files.


I don’t think it’s common, nor is it a best practice, as you can’t apply static analysis to your embedded code. But it’s really cool that Rust allows it, and it can be an ad-hoc solution when you need to write a piece of code fast inside of Rust.


While I've seen people doing this, I've never done it. I've done the opposite in a small open source project of mine (arguably abandoned at this point due to lack of time and contributors). That said, there is huge potential here: your code may require some small operation that is cheap to execute but would take you 2 minutes to write in Python, while doing the same in Rust would take you an hour. I can think of some use cases for this. Nice article!


Why call Python from Rust? I think calling Rust from Python would make more sense. Use it to optimize Python functions, like Python libraries do with C (i.e. numpy).


Suppose I’m building an application in Rust that processes a lot of data, and for one of the steps I want to run that data through a Python tool like SpaCy or Flair.

How would I go about doing that? I could put the Python code behind a little http API and call it that way, but that’s a bunch of overhead and extra stuff to maintain just to analyse some text. If I embed said Python tools in my Rust code then I can call those tools with significantly less overhead and complexity.


Why does it have to have a little http API? Why not just spawn the python tool?


And communicate with it how?

If you’re suggesting spawning things over shell/cmd line I’m of the opinion that this a generally bad idea.


Just fork and exec. Or if you fancy, posix_spawn a separate process. Then communicate over pipes, or if you have a lot of outputs from these tools, collect results from the file system.

Really, where does this fear of spawning things come from?


Python module loading tends to be quite slow. Loading Tensorflow on our Heroku production environment takes 10 seconds (5 on my laptop). A client of mine running Rust and Python on Embedded Linux device found that loading numpy/pandas modules took over 5 seconds (laptop was under 1), and their computation took just x00 milliseconds. 10x overhead...

So the spawning of a short-lived subprocess approach has massive overhead, only suitable for multi-second workloads.


What's wrong with forking and using a pipe or socket?


Spawning an external binary does not necessarily involve the shell (if it did, how could the shell itself spawn things? It’d be shells all the way down)


> And communicate with it how?

The simplest approach is via a pipe to the process' stdin/stdout, I guess. Of course you have to (de)serialize your data, but you would have to do the same if you went the HTTP route, which seems far more complex. Furthermore, the suggested solution probably has a nice wrapper in the language's stdlib (e.g. `check_output()` in Python land).
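A minimal sketch of that pipe-based approach (the inline child script here just stands in for the external Python tool):

```python
import json
import subprocess
import sys

# Child script: read JSON from stdin, write JSON to stdout.
# Inlined for the example; in practice this would be the external tool.
child = (
    "import sys, json\n"
    "data = json.load(sys.stdin)\n"
    "json.dump({'doubled': [2 * n for n in data['nums']]}, sys.stdout)\n"
)

# Spawn the subprocess and exchange serialized data over pipes.
result = subprocess.run(
    [sys.executable, "-c", child],
    input=json.dumps({"nums": [1, 2, 3]}),
    capture_output=True,
    text=True,
    check=True,
)
assert json.loads(result.stdout) == {"doubled": [2, 4, 6]}
```

No shell is involved, and the same pattern works from Rust with `std::process::Command` and piped stdio.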


Ok, at that point, why not just write the Python directly in your Rust application?


Because the code smell reeks? You've greatly increased complexity. Did you read about debugging the whitespace? What a mess.


I wrote this [0] up a couple of weeks ago, which is a hack to put shell code in your rust, with reasons why you would want to.

[0]: https://neosmart.net/blog/2020/self-compiling-rust-code/


I'm having issues seeing the whole post, when I scroll down i just see white, and when I click the subject headers from the menu it jumps down half the page and the mouse wheel scroll gets locked. I'm using Chrome on MacOS.


Lua is a better fit for Rust, because it doesn't have the threading restrictions of (C)Python. With Lua you can run a thousand parallel vms in the same process if you want.

But the macro hacks are impressive!


So I am not the only one with questions regarding its use case? I can't seem to conjure any.


Quickly introducing battle-tested data science frameworks, snippets and patterns without having to replicate them in Rust.


There's a typo where stringify! is referred to as strinfigy!

Cool article btw!


Thanks! Updated.


Am I the only person having trouble reading this blog? That color scheme makes my eyes bleed.

It seems like it's a light gray background with a slightly (not much) darker gray text. The contrast and thin font weight is terrible.

And the pink is practically vibrating on the page.

At least the blue is okay.



