In other news, GraalPython has gotten to ~13k commits. Is there some significant meaning to this from a real compatibility and user experience standpoint?
They're working on a JIT, which could really improve on PyPy (not that PyPy isn't great). It also supports WASM as a first-class platform (as much as you can without the standard library modules that interact with the OS). I also just like seeing another non-CPython implementation.
Not game changing for the Python world, but I think it's pretty neat.
Interesting that we now have four performance-focused alternative implementations of Python: PyPy, GraalPython, Pyston, and now RustPython.
There are also Jython and IronPython, although I don't know how successful/popular or actively-maintained those implementations are.
Someone recently pointed out on IRC that Python is transitioning away from a language with "one implementation, and the implementation is the spec" to "one preferred implementation with an informal spec, but many alternative implementations".
Now that PyPy v7.3.7 seems to support 3.8 without any issues that I've encountered, I'd strongly consider evaluating it for use in production code. It might be interesting to run some async webserver + data processing benchmarks or something across the various implementations. Maybe a good benchmark would be measuring total throughput on something like "process/sanitize some user-provided text file, then serve a prediction from a PyTorch model".
> Someone recently pointed out on IRC that Python is transitioning away from a language with "one implementation, and the implementation is the spec" to "one preferred implementation with an informal spec, but many alternative implementations".
I don't think this is a new thing at all. When I first heard of Python a decade and a half ago, the "pitch" in the official docs back then was that "Python" was a programming language while "CPython" was the reference implementation of it. And the docs were (and still are) quite careful to point out which things are implementation details of CPython and are not to be relied upon if portability to other implementations or future versions of CPython are desirable features.
>I don't know what you mean about an "informal spec", the full formal specification of Python (the language)
I think GP meant "informal" in the sense that Python wasn't originally meant to be a multi-implementation language. Though the oldest alternative implementation goes back to 1997, alternatives were always, until relatively recently, second class. I don't have the tools or the time to quantify what I mean by "recently" or "second class", but I hope you know what I mean.
The single biggest marker of being "second class" that I know of is how new features always get introduced first by CPython, and every other implementation then plays catch-up. This is unheard of in true multi-implementation languages; what I have in mind is C, C++, Java, and JavaScript. I'm sure there are plenty more, but those, off the top of my head, are the most prominent languages whose features are introduced first in a completely implementation-agnostic way, after which all implementations start racing to support them.
I don't mean to say that CPython maintainers just wake up in the morning and decide to add new syntax or semantics to the language; PEP documents are quite formal and implementation-agnostic. But I've always had the impression it's something by CPython devs for CPython devs, which is why the CPython implementation of a feature always appears first and other implementations lag behind by varying amounts.
Numerical computation libraries (numpy, scipy), ML libraries like sklearn and pytorch, and bindings for tensorflow, xgboost, and many more must all work without a hiccup before data science and AI teams will consider switching away from CPython.
Those teams still wouldn't switch, because they wouldn't gain any performance from switching. After all, as far as they are concerned, CPython is just calling highly optimized, already-compiled C, Fortran, or accelerator-specific (CUDA, ROCm, TPU) code.
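To illustrate with a hypothetical micro-example (the array sizes are arbitrary): in code like the following, virtually all of the time is spent inside compiled C routines, so a faster interpreter changes almost nothing:

```python
import numpy as np

# Two large arrays; creating and combining them happens in compiled C,
# not in the Python bytecode loop, so interpreter speed barely matters.
a = np.random.rand(10_000_000)
b = np.random.rand(10_000_000)

c = a * b + 1.0  # one Python-level expression, millions of C-level operations
print(c.sum())
```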
Many/most data science processes end up slowing down when, inevitably, the data must move back to Python or a Python function must be invoked on some data.
A significant performance improvement in Python would benefit many data-science-related tasks.
This is very true, especially when pre-processing text and other unstructured data. It ends up being a lot of loops, string manipulation, and dict lookups.
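For example, a cleanup pass shaped like this (pure loops, string methods, and dict lookups; the replacement table and tokenization rule are made up for illustration) is exactly the kind of code a tracing JIT like PyPy's tends to speed up dramatically:

```python
# Sketch of loop-heavy text preprocessing: unescape a few entities,
# then count normalized tokens.
replacements = {"&amp;": "&", "&lt;": "<", "&gt;": ">"}

def clean(lines):
    counts = {}
    for line in lines:
        for old, new in replacements.items():
            line = line.replace(old, new)
        for token in line.lower().split():
            token = token.strip(".,!?")
            counts[token] = counts.get(token, 0) + 1
    return counts

print(clean(["Hello &amp; goodbye!", "hello again."]))
```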
Fortunately, with a tool like DVC or even Make, you usually don't have to (or want to) put that code in the same script as the actual machine learning part. So you can theoretically run the former with PyPy and the latter with CPython, if you really need to maximize both.
I would. I find that there are still lots of data transformations and non-deep modelling happening in Python, e.g. string processing, JSON munging, business rules like if-this-then-remove, etc.
Having spent a lot of time on data science teams, rewriting hot sections of text processing code in Cython to obtain acceptable performance, I can tell you that I would have gladly switched away from CPython specifically for those tasks. If you're using Conda, it's almost trivial to have a PyPy environment alongside a CPython environment in the same project. You run the data processing scripts/notebooks with the former and the machine learning stuff with the latter.
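A minimal sketch of that split (the script names are hypothetical, and it assumes `pypy3` and `python3` resolve to the two environments):

```python
# driver.py: run the loop-heavy preprocessing under PyPy and the
# PyTorch-dependent training step under CPython.
import subprocess

subprocess.run(["pypy3", "preprocess.py", "raw.txt", "clean.txt"], check=True)
subprocess.run(["python3", "train.py", "clean.txt"], check=True)
```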
But my post was more oriented towards non-data-science uses of Python, like writing an API server or a web crawler or a TUI application. I think the "serve a prediction from a PyTorch model" part threw off the conversation a bit!
> Someone recently pointed out on IRC that Python is transitioning away from a language with "one implementation, and the implementation is the spec" to "one preferred implementation with an informal spec, but many alternative implementations".
It's been "transitioning" for as long as it existed. The alternative forks all eventually die.
Recently tried pypy3; it's not a drop-in replacement. I could only get numpy to install in a venv, and the little tool I was working on was about 5 times slower with PyPy compared to CPython (which may be due to using numpy...)
Numpy and other C extensions that heavily use the CPython C API (as opposed to CFFI) are known to be slower under PyPy. That's probably never going to be a strong area for it.
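For contrast, a minimal sketch of the CFFI (ABI-mode) style of binding, assuming a POSIX system where `dlopen(None)` resolves the standard C library:

```python
from cffi import FFI

ffi = FFI()
ffi.cdef("int printf(const char *format, ...);")  # declare the C signature we need
C = ffi.dlopen(None)  # load the standard C library (POSIX; not Windows)
C.printf(b"hello from C via cffi\n")
```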
I think Cython is kind of a different category, but I have found that pretty much any Python program gets a comfortable 50% speedup simply by compiling it with Cython.
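A minimal sketch of that workflow, assuming a hypothetical module `mymodule.py`:

```python
# setup.py: compile an ordinary, unmodified Python module with Cython.
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("mymodule.py", language_level="3"))
```

Running `python setup.py build_ext --inplace` then produces a compiled extension module that gets imported in place of the pure-Python one.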
Nuitka is probably worth mentioning along these lines too.
Rust is seeing a ~5x improvement in compile time using Cranelift instead of LLVM for debug builds. Cranelift is much better suited as a JIT library than LLVM is. It's a bit of a pain that Postgres went the LLVM route.
Sure, making a dynamic language fast is not easy. However, back then people thought LLVM was a magic wand. It wasn't. There were several problems.
First, the Unladen Swallow team (IIRC) spent a lot of time fixing bugs in LLVM.
Second, LLVM isn't fast at compiling code, at least not for a JIT. This is legitimately surprising, because the official LLVM Tutorial implements a JIT.
Third, LLVM used to stand for Low Level Virtual Machine. I don't know when it stopped standing for that; clearly it hasn't for a long time. But with “Virtual Machine” in the title, you can see why people might have thought it would be suitable for implementing a dynamic language. cf. GraalVM these days.
Exactly. LLVM and Cranelift are essentially code-generation backends when you apply them to dynamic languages. You need an entire custom compiler in front of them to get good code out of them.
Sorry for my limited understanding here, but does having a JIT make having a REPL easier/more possible?
I have only developed in Python professionally, and when I play around in Go and Rust, I really miss the ability to sketch things out in an IPython session.
But you still have a REPL with PyPy. The downsides of a JIT are that compiling bytecode to assembly takes time (hurting startup performance, though that can be mitigated by not applying the JIT aggressively), and that some programs have very dynamic behavior which the JIT eventually has to give up on (going back to interpreting) or which runs it off some pathological performance cliff, where it takes up a bunch of memory and runs 10x slower than the interpreter.
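As a toy illustration of the warm-up cost (a hypothetical micro-benchmark; absolute numbers vary by machine and implementation), per-pass times under PyPy typically drop sharply once the loop has been traced and compiled, while under CPython they stay flat:

```python
import time

def chunk():
    total = 0
    for i in range(1_000_000):
        total += i * i
    return total

# Time the same work several times; compare the first passes vs. the rest.
for n in range(5):
    start = time.perf_counter()
    chunk()
    print(f"pass {n}: {time.perf_counter() - start:.4f}s")
```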
Ruby 3 introduced a JIT in its reference implementation.
Are you asking how link aggregation sites work? Or are you just noting that you're surprised at the overlap between the HN community and the Rust community?
I think the parent question, "What else is cool about it besides being written in rust?", speaks for itself.
I understand that there are currently a large number (or at least a vocal number?) of HN readers with an interest in Rust. I have been here long enough to witness Lisp, Haskell, Node, Golang, Julia, and even .NET Core all go through similar cycles.
However, Rust is the ONLY one on that list for which "___ written in Rust" is a nearly automatic trip to the top of the front page. It's weird. It feels like astroturf at times, and even if it's in legitimate good faith, it's still overbearing.
Just... is there literally ANYTHING noteworthy about "_____" other than the programming language it was written in? I'm not sure how noteworthy that is by itself, even if you have a strong interest in that language.
In this particular case, the fact that it's written in Rust gives it some very interesting properties:
- it makes compiling your entire project into a standalone executable trivial
- the interpreter can be compiled to WASM far more easily than with something like Pyodide
- you can provision Python with cargo, which means no more fiddling with pyenv, deadsnakes, EPEL, etc., while still getting a consistent, down-to-the-minor-version Python distribution
- rewriting your hot path in Rust becomes a first-class citizen; since the Python story is "start with Python, and when you need to scale, you can always create an extension later", this is really attractive
All of that only has value, of course, if the project reaches good compatibility and is supported.
But still, the possibilities it offers are not negligible.
You certainly can, as Pyodide proves, but it's a complicated task. Pyodide is upgraded to a newer Python version only once in a while because of how much work it is.
Interesting that this is the top post now. This is exactly what I think for most golang/rust/etc. submissions, but any such discussion (if any) has always been far down the thread. What changed, I wonder.
The fact that it now has some replies that say something worth reading. On its own, whinging about "written in rust" is thoroughly tedious and downvote-worthy.
Yes, I think that's the difference here: this one is asking a legitimate question, and so is generating discussion. The others are almost always just complaining that the word "Rust" is in the title, which does not invite discussion and is not interesting.
It does not have a GIL, but this also means it's not compatible with a lot of Python code that needs a GIL or C extensions. Not having a GIL by itself is not tricky; being compatible and not having a GIL is.
Why can't they implement a 'fake' GIL flag for those programs, so it's backward compatible? (Yeah, it's probably a lot more work to write those extra modules/code paths...!)
It's not a "flag", really: it's a fundamental, implicit underlying assumption about the execution environment. The C extensions assume (roughly) that if they're executing, they can touch any part of the runtime without any synchronization whatsoever, because nobody else is executing (that's what GIL ensures).
This is irrelevant for the Python interpreter written in Rust. The interpreter will likely already be precompiled by the time you run it on a Python program.
Yeah, this is a really weird milestone. A policy of eager commits, never squashing, and extreme indecision could easily net 1k commits per developer day. Not saying that's what's happening here or anything... but this is a really bad metric for progress.
But I find it noteworthy (inasmuch as I'm writing this comment) because the phenomenon is real. I've heard first-hand horror stories of shops that use commits per day as a serious metric, and the result is horrible.
I can accept that someone involved in the project is happy with the number of commits and though I don't care much about it, I don't mind them being happy either. I'm actually, truly, and only interested to know how usable this project is.
Being able to embed this inside a rust program for scripting/the ability to tweak behavior at runtime without recompiling could be really cool. Although my dream is still that someone makes an interpreted subset of Rust itself.
Getting rid of the GIL does not imply better performance... Currently, their implementation is quite slow, partly due to extensive use of atomic integers and reference counting.
The presence of any kind of GIL is probably my latest "will not use this language" filter; it's why I originally switched to Elixir, actually (it does not require a GIL because all memory is immutable, and concurrency is trivial in that case).
I think the root comment is asking whether this new Python-in-Rust interpreter got rid of the GIL. Your link is about the existing CPython interpreter.
Because, rightly or wrongly, tons of existing Python code implicitly relies on the GIL to function correctly. Especially when you have C extensions that may not be thread-safe.
Mmm, "to rust" can have a specific meaning, especially in chemistry, of iron oxidation. Nonetheless, there is also a more general meaning of corrosion of other metals.
(It's entirely normal for words to have both specific and general meanings.)
I wonder how easy it is to customize/adapt the grammar. The possibility of using this as an embedded scripting language is amazing, especially if you can easily extend things for your domain.
>please use the original title, unless it is misleading or linkbait; don't editorialize.