[flagged] Python interpreter written in rust reaches 10000 commits (github.com/rustpython)
194 points by andrew-ld on Nov 3, 2021 | 93 comments



Hacker News Guidelines

>please use the original title, unless it is misleading or linkbait; don't editorialize.


What's the original title here?


RustPython - A Python-3 (CPython >= 3.9.0) Interpreter written in Rust


In other news, GraalPython has gotten to ~13k commits. Is there some significant meaning to this from a real compatibility and user experience standpoint?


What else is cool about it besides being written in rust?


They're working on a JIT, which could really improve on PyPy (not that PyPy isn't great). It also supports WASM as a first-class platform (as much as you can without standard library modules that interact with the OS). I also just like seeing another non-CPython implementation.

Not game changing for the Python world, but I think it's pretty neat.


Interesting that we now have four performance-focused alternative implementations of Python: PyPy, GraalPython, Pyston, and now RustPython.

There are also Jython and IronPython, although I don't know how successful/popular or actively-maintained those implementations are.

Someone recently pointed out on IRC that Python is transitioning away from a language with "one implementation, and the implementation is the spec" to "one preferred implementation with an informal spec, but many alternative implementations".

Now that PyPy v7.3.7 seems to support 3.8 without any issues that I've encountered, I'd strongly consider evaluating it for use in production code. It might be interesting to run some async webserver + data processing benchmarks or something across the various implementations. Maybe a good benchmark would be measuring total throughput on something like "process/sanitize some user-provided text file, then serve a prediction from a PyTorch model".


> Someone recently pointed out on IRC that Python is transitioning away from a language with "one implementation, and the implementation is the spec" to "one preferred implementation with an informal spec, but many alternative implementations".

I don't think this is a new thing at all. When I first heard of Python a decade and a half ago, the "pitch" in the official docs back then was that "Python" was a programming language while "CPython" was the reference implementation of it. And the docs were (and still are) quite careful to point out which things are implementation details of CPython and are not to be relied upon if portability to other implementations or future versions of CPython are desirable features.

I don't know what you mean about an "informal spec"; the full formal specification of Python (the language) is here: https://docs.python.org/3/reference/index.html


>I don't know what you mean about an "informal spec"; the full formal specification of Python (the language)

I think GP meant "informal" in the sense that Python wasn't originally meant to be a multi-implementation language. Though the oldest alternative implementation goes back to 1997, alternatives were always, until relatively recently, second class. I don't have the tools or the time to quantify what I mean by "recently" or "second class", but I hope you know what I mean.

The single biggest marker of being "second class" that I do know of is how new features always get introduced first by CPython, and then every other implementation plays catch-up. This is unheard of in true multi-implementation languages; what I have in mind are C, C++, Java, and JavaScript. I'm sure there are plenty more, but those, just off the top of my head, are the most prominent languages whose features are introduced first in a completely implementation-agnostic way, after which all implementations start racing to get them complete.

I don't mean to say that CPython maintainers just wake up in the morning and decide to add new syntax or semantics to the language; PEP documents are quite formal and implementation-agnostic. But I've always had the impression it's something by CPython devs for CPython devs, and supporting that impression is how a CPython implementation always appears first while other implementations lag behind by varying amounts.


Numerical computation libraries (numpy, scipy), ML libraries like sklearn and pytorch, and bindings for tensorflow, xgboost, and many more must all work without any hiccup before data science and AI teams will consider switching away from CPython.


Those teams still wouldn't switch, because they wouldn't gain any performance from switching. After all, as far as they are concerned, CPython is just calling highly optimized, already-compiled C, Fortran, or accelerator-specific (CUDA, ROCm, TPU) code.


Many/most data science processes end up slowing down when, inevitably, the data must move back to Python or a Python function must be invoked on some data.

A significant performance improvement in Python would benefit many data-science-related tasks.


Very much this. Anyone who does machine learning will notice their CPU sitting at 100% of one core a significant fraction of the time.

Doesn't matter how fast a GPU you have; Python and the GIL are the bottleneck.


This is very true, especially when pre-processing text and other unstructured data. It ends up being a lot of loops, string manipulation, and dict lookups.
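
For a concrete picture (a hypothetical snippet, not from any particular pipeline), this is the shape of the pure-Python loop that dominates such pre-processing: CPython interprets every iteration bytecode by bytecode, while PyPy's JIT can compile the hot loop to native code.

    # Hypothetical pre-processing loop: pure-Python string handling and dict lookups.
    # CPython dispatches each opcode in its eval loop; PyPy JIT-compiles the hot path.
    def token_counts(lines):
        counts = {}
        for line in lines:
            for tok in line.lower().split():
                counts[tok] = counts.get(tok, 0) + 1
        return counts

    print(token_counts(["To be or not to be", "that is the question"]))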

Fortunately, with a tool like DVC or even Make, you usually don't have to (or want to) put that code in the same script as the actual machine learning part. So you can theoretically run the former with PyPy and the latter with CPython, if you really need to maximize both.


I would; I find that there are still lots of data transformations and non-deep modelling happening in Python, e.g. string processing, JSON munging, business rules like if-this-then-remove, etc.


Having spent a lot of time on data science teams, rewriting hot sections of text processing code in Cython to obtain acceptable performance, I can tell you that I would have gladly switched away from CPython specifically for those tasks. If you're using Conda, it's almost trivial to have a PyPy environment alongside a CPython environment in the same project. You run the data processing scripts/notebooks with the former and the machine learning stuff with the latter.

But my post was more oriented towards non-data-science uses of Python, like writing an API server or a web crawler or a TUI application. I think the "serve a prediction from a PyTorch model" part threw off the conversation a bit!


> Someone recently pointed out on IRC that Python is transitioning away from a language with "one implementation, and the implementation is the spec" to "one preferred implementation with an informal spec, but many alternative implementations".

It's been "transitioning" for as long as it has existed. The alternative forks all eventually die.


There's also Skybison and Cinder, but I'm biased since I worked on both.


I should have remembered Skybison/Cinder, it was posted on here not long ago.

Heck, I probably should have mentioned Stackless too, which apparently is still being actively developed.


Recently tried pypy3; it's not a drop-in replacement. Could only get numpy to install in a venv, and the little tool I was working on was about 5 times slower with PyPy compared to CPython (which may be due to using numpy...)


Numpy and other C extensions that heavily use the CPython API (as opposed to CFFI) are known to be slower under PyPy. That's probably never going to be a strong area for it.


There's also Cython, "the most widely used Python-to-C compiler":

https://github.com/cython/cython


I think Cython is kind of a different category, but I have found that pretty much any Python program gets a comfortable 50% speedup simply by compiling it with Cython.
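
For example (a minimal sketch, assuming a pure-Python module named mymodule.py; not taken from the thread), compiling an existing module with Cython without touching its source only takes a small setup script:

    # setup.py -- minimal sketch: compile an ordinary Python module with Cython.
    # Assumes mymodule.py sits next to this script; build with:
    #   python setup.py build_ext --inplace
    from setuptools import setup
    from Cython.Build import cythonize

    setup(ext_modules=cythonize("mymodule.py"))

Once built, importing mymodule picks up the compiled extension instead of the .py source, with no other code changes.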

Nuitka probably is worth mentioning along these lines too.


Also MicroPython, but that implements just a subset of the Python language.


> They're working on JIT, which could really improve on PyPy

Have they got a big new compiler idea?


https://github.com/RustPython/RustPython/tree/main/jit

Pretty sparse. Uses Cranelift though, which is what Firefox is using for WASM.

For comparison, here's a complete JIT-for-befunge using cranelift: https://github.com/serprex/Befunge/blob/master/barfs/src/jit...

Rust is seeing a ~5x improvement in compile time using Cranelift for debug builds instead of LLVM. Cranelift is much better suited as a JIT library than LLVM is. Bit of a pain that Postgres went the LLVM route.


> Cranelift is much better suited as a JIT library than LLVM is.

Unladen Swallow is seared in my mind. It was a failed project, back in 2009, to improve the speed of CPython by using LLVM as a JIT.


Was it really LLVM that was the problem, or the fact that there’s a lot more to making a dynamic language fast than compiling individual methods?


Sure, making a dynamic language fast is not easy. However, back then people thought LLVM was a magic wand. It wasn't. There were several problems.

First, the Unladen Swallow team (IIRC) spent a lot of time fixing bugs in LLVM.

Second, LLVM isn't fast at compiling code, at least not for a JIT. This is legitimately surprising, because the official LLVM Tutorial implements a JIT.

Third, LLVM used to stand for Low Level Virtual Machine. I don't know when it stopped standing for that; clearly it hasn't for a long time. But with “Virtual Machine” in the title, you can see why people might have thought it would be suitable for implementing a dynamic language. cf. GraalVM these days.


Exactly. LLVM and Cranelift are essentially code generation backends when you’re applying them for dynamic languages. You need an entire actual custom compiler in front of them to get good code out of them.


Sorry, my limited understanding here: does having a JIT make having a REPL easier/more possible?

I have only developed in Python professionally, and when I play around in Go and Rust, I really miss the ability to sketch things out in an IPython session.


A JIT can be considered an optimization. If you hit F12 in your browser you have a JavaScript REPL; no need to know whether there's a JIT or not.

CPython doesn't have a JIT, it has an interpreter. So it spends a lot of time in this loop: https://github.com/python/cpython/blob/main/Python/ceval.c

But you still have a REPL with PyPy. The downsides of a JIT are that compiling bytecode to assembly can take time (hurting startup performance, though that can be mitigated by not applying the JIT aggressively), and that some programs have very dynamic behavior which the JIT eventually has to give up on (and go back to interpreting), or else run off some pathological performance cliff where it takes up a bunch of memory and runs 10x slower than the interpreter.
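
As a loose, hypothetical illustration of that "very dynamic behavior" (not a claim about any particular runtime): a call site that sees a different type on nearly every iteration gives a JIT little stable type information to specialize on, while a plain interpreter just pays its usual constant cost.

    # Hypothetical example of highly polymorphic code that is hard on a JIT:
    # the loop body sees a different operand type almost every iteration, so
    # machine code specialized for one type keeps being bypassed or discarded.
    import random

    values = [1, 2.5, "x", (1, 2), [3, 4, 5]]

    def churn(n):
        total = 0
        for _ in range(n):
            v = random.choice(values)
            total += int(v) if isinstance(v, (int, float)) else len(v)
        return total

    print(churn(10_000))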

Ruby 3 introduced a JIT to its reference implementation.


A JIT will have no effect on the "user" experience; it's all about performance. (It compiles the parts that run often.)


It has 10000 commits in its git history!


I want to ask this question about virtually every "written in Rust" project that appears on HN.


Are you asking how link aggregation sites work? Or are you just noting that you're surprised at the overlap between the HN community and the Rust community?


I think the parent question, "What else is cool about it besides being written in rust?", speaks for itself.

I understand that there are currently a large number (or at least a vocal number?) of HN readers with an interest in Rust. I have been here long enough to witness Lisp, Haskell, Node, Golang, Julia, and even .NET Core all go through similar cycles.

However, Rust is the ONLY one on that list for which "___ written in Rust" is a nearly automatic trip to the top of the front page. It's weird. It feels like astroturf at times, and even if it's legitimate good faith then it's still overbearing.

Just...... is there literally ANYTHING noteworthy about "_____" other than the programming language it was written in? I'm not sure how noteworthy that is, by itself, even if you have a strong interest in that language.


In this particular case, the fact that it's written in Rust gives it some very interesting properties:

- it makes compiling your entire project into a standalone executable trivial

- the interpreter can be compiled to WASM way more easily than with something like pyodide

- you can provision Python with cargo, which means no more fiddling with pyenv, deadsnakes, EPEL, etc., and yet you get a Python distribution that is consistent down to the minor version

- rewriting your hot path in Rust becomes a first-class citizen. Since the Python story is "start with Python, and when you need to scale, you can always create an extension later", this is really attractive.

All of that only has value, of course, if the project reaches good compatibility and is supported.

But still, the possibilities it offers are not negligible.


> You can compile RustPython to a standalone WebAssembly WASI module so it can run anywhere


Can you not do that with CPython or PyPy?


You certainly can, as Pyodide proves, but it's a complicated task. Pyodide is upgraded to a newer Python version only once in a while because of how much work it is.

This makes it much easier.


Compiling the interpreter to webassembly and running python code in wasm sounds pretty cool.


Interesting that this is the top post now. This is exactly what I think for most golang/rust/etc. submissions, but any such discussion (if any) has always been far down the thread. What changed, I wonder.


The fact that it now has some replies that say something worth reading. On its own, whinging about "written in rust" is thoroughly tedious and downvote-worthy.


Yes, I think that's the difference here: this one is asking a legitimate question, and so is generating discussion. The others are almost always just complaining that the word "Rust" is in the title, which does not invite discussion and is not interesting.


This question should be integrated into HN’s submit form.


No more fucking GIL bullshits


It doesn't have a GIL because it's using a lot of smaller individual locks. That ends up being much slower in practice.


Are you sure this does not have the GIL? I tried looking in the documentation and could not find such info. Can you please point me to the link?


It does not have a GIL but this also means it's not compatible with a lot of Python code that needs a GIL or C-extensions. Not having a GIL by itself is not tricky, being compatible and not having a GIL is.


I believe Python C-API compatibility isn't really a goal for this implementation anyway. [1] HPy sounds like the best path forward?

1. https://github.com/RustPython/RustPython/issues/1940


Why can't they implement a 'fake' GIL flag for those programs, so it's backward compatible? (Yeah, it's probably a lot more work to write those extra modules/code paths!)


It's not a "flag", really: it's a fundamental, implicit underlying assumption about the execution environment. The C extensions assume (roughly) that if they're executing, they can touch any part of the runtime without any synchronization whatsoever, because nobody else is executing (that's what GIL ensures).
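
A rough Python-level analogue of that assumption (a hypothetical snippet, not from any extension): the count below only comes out right because the GIL makes each append effectively atomic; a GIL-free runtime has to supply equivalent synchronization itself to keep code like this working.

    # Hypothetical illustration: several threads appending to one shared list.
    # Under CPython the GIL serializes each list.append with no explicit lock;
    # a GIL-free runtime (or a C extension touching the list's internals) must
    # provide that synchronization on its own.
    import threading

    hits = []

    def worker():
        for _ in range(10_000):
            hits.append(1)

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    print(len(hits))  # 40000 under CPython thanks to the GIL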


Or, to put it another way, it's the first letter of the acronym: "Global Interpreter Lock".

You can't really "fake" a global lock. Either you're globally locking everywhere with the global lock or you aren't.


It has a GIL for now.

But if you are interested in removing the GIL, you're gonna love 2022: https://docs.google.com/document/u/0/d/18CXhDb1ygxg-YXNBJNzf...


Oh, that's a big deal, yes. (For those unaware, GIL = global interpreter lock.)


I don’t think it’s a big deal if it’s not backwards compatible with the billions of lines of existing Python which use C extensions.


It's most likely that the GIL will be removed in the next 5 years: https://docs.google.com/document/u/0/d/18CXhDb1ygxg-YXNBJNzf...


How about a Rust interpreter written in Python? Now that would be something worthy of a top HN post!


If you thought compile times were bad before…


Then that would be the perfect solution for you! Interpreted Rust has no compilation step, you can jump straight to execution!


Python has a compilation step, you can't jump straight to execution in Python.

Interpreted != not compiled.


This is irrelevant for the Rust interpreter written in Python. The interpreter will likely already be precompiled into bytecode when you run it on a Rust program.


Technically speaking, Rust is a compiled language, which means an interpreter won't do the job.


You can interpret its MIR (mid-level intermediate representation) like Miri or const evaluation does


Didn't know, thanks!


At a glance I didn't find a percentage of compatibility with CPython. Is it tracked somewhere?


Yeah, this is a really weird milestone. A policy of eager commits, never squashing, and extreme indecision could easily net 1k commits per developer day. Not saying that's what's happening here or anything... but this is a really bad metric for progress.

But I find it noteworthy (inasmuch as I'm writing this comment) because the phenomenon is real. I've heard first-hand horror stories of shops that use commits per day as a serious metric, and the result is horrible.


I can accept that someone involved in the project is happy with the number of commits and though I don't care much about it, I don't mind them being happy either. I'm actually, truly, and only interested to know how usable this project is.


Being able to embed this inside a rust program for scripting/the ability to tweak behavior at runtime without recompiling could be really cool. Although my dream is still that someone makes an interpreted subset of Rust itself.


Rune is pretty close, although it is dynamic (https://github.com/rune-rs/rune). Mun seems to be heading in that direction as well (and has a static type system!) but has fewer features (https://github.com/mun-lang/mun).


Not for Rust itself, but there's an interpreter for Rust's MIR: https://github.com/rust-lang/miri


Did they get rid of the GIL?


Getting rid of the GIL does not imply better performance... Currently, their implementation is quite slow, partly due to extensive use of atomic integers and reference counting.

https://github.com/RustPython/RustPython/issues/2445

https://github.com/RustPython/RustPython/issues/2474


The presence of any kind of GIL is probably my latest "will not use this language" filter; it's why I originally switched to Elixir, actually (it does not require a GIL because all memory is immutable, and concurrency is trivial in that case).


Apparently they did.


Getting rid of the GIL isn’t the hard part.

It’s getting rid of the GIL and having existing Python code continue to work, with C extensions and all.


Where is this documented?


There's some info on the proposal here: https://news.ycombinator.com/item?id=28896367


I think the root comment is asking whether this new Python-on-Rust interpreter got rid of the GIL. Your link is about the existing CPython interpreter.


Why document something they have not implemented?


Because, rightly or wrongly, tons of existing Python code implicitly relies on the GIL to function correctly. Especially when you have C extensions that may not be thread-safe.


Wait, it's not called Rython?


Copperhead would make more sense, since it's a metal that rusts.


Copper doesn't rust as it doesn't contain any iron.


Oh right, I was thinking of the green patina bronze gets. Copper gets that too after many years in oxygen.


By definition only iron rusts, but that name is gone.


Mmm, to rust can have a specific meaning, especially in chemistry, of iron oxidization. Nonetheless, there is also a more general meaning of corrosion of other metals.

(It's entirely normal for words to have both specific and general meanings.)


Pyrite then?


Scale


I too am disappointed by the unimaginative naming.


I wonder how easy it is to customize/adapt the grammar. The possibility of using this as an embedded scripting language is amazing, especially if you can easily extend things for your domain.


I love its logo!



