PyO3: Rust Bindings for the Python Interpreter

gukoff · on Jan 29, 2021

With PyO3, I built a library that parses datetimes 10x faster than `datetime.strptime` in just a few lines of code: https://github.com/gukoff/dtparse

It just calls Rust's chrono library, which does the parsing, and wraps the result in a Python object. You can do it for any Rust library, it's very, very easy!

The only slightly complicated part is the distribution. You need to use https://github.com/PyO3/maturin or https://github.com/PyO3/setuptools-rust, and of course, you need to have Rust installed on the wheel-building machine.

Feel free to use this repo as a reference if you want to build a similar thing. The code is commented, and there's a working GitHub Action that builds the wheels for all platforms and uploads them to PyPI: https://github.com/gukoff/dtparse/tree/master/.github/workfl...

japhyr · on Jan 29, 2021

I was surprised to find out how slow strptime() can be. I was working on a data-focused project that was finally starting to slow down from the growing volume of data. I was looking at river heights over time, and once I hit about 140,000 data points the project got slow enough to make some profiling and optimization worthwhile. I was quite surprised to find it was spending more than two full seconds just running strptime(), out of a total execution time of around 15 seconds.

I ended up looking at a bunch of different ways of processing timestamps in Python: strptime(), string parsing, regex, datetime.fromisoformat(), NumPy, Pandas, and more. I got a 46x speedup using datetime.fromisoformat(). Other approaches got anywhere from a 4x to a 40x speedup, and a couple of approaches were an order of magnitude slower than strptime().
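A minimal sketch of the two standard-library parsers being compared, assuming ISO 8601 timestamps like `2021-01-29T12:34:56` (the sample string and format are assumptions, not taken from the blog post). `datetime.fromisoformat()` is the parsing counterpart of `isoformat()` and has been in the standard library since Python 3.7.

```python
from datetime import datetime
import timeit

# Hypothetical sample timestamp; the format is an assumption.
STAMP = "2021-01-29T12:34:56"

def parse_strptime(s: str) -> datetime:
    # General-purpose parser; in CPython it runs through the pure-Python _strptime module.
    return datetime.strptime(s, "%Y-%m-%dT%H:%M:%S")

def parse_fromisoformat(s: str) -> datetime:
    # ISO 8601-only parser, implemented in C in CPython (Python 3.7+).
    return datetime.fromisoformat(s)

if __name__ == "__main__":
    for fn in (parse_strptime, parse_fromisoformat):
        seconds = timeit.timeit(lambda: fn(STAMP), number=10_000)
        print(f"{fn.__name__}: {seconds * 1e6 / 10_000:.2f} µs per call")
```

Both functions return the same `datetime`; the actual ratio depends on the machine and Python version, which is exactly why profiling your own code matters.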

My takeaway was there's no substitute for profiling the actual code you're running, and focusing on the specific bottlenecks in your own project. I wrote this up in a blog post if anyone's interested, "What's faster than strptime()?"

https://ehmatthes.com/blog/faster_than_strptime/

Rotareti · on Jan 29, 2021

This is awesome, thanks for sharing! I think this should be added to the PyO3 examples list :)

https://github.com/PyO3/pyo3#examples

mrcarruthers · on Jan 29, 2021

how does it compare against ciso8601 perf-wise? https://pypi.org/project/ciso8601/

to be fair, ciso8601 only parses ISO 8601 datetimes, but that's enough for 90%+ of my use cases.

gukoff · on Jan 29, 2021

ciso8601 is blazingly fast, and also its wall time is very stable. By all means, use ciso8601 if the format allows :)

On my machine, ciso8601 always runs in 240ns, and the Rust lib median time is 1250ns.

You can run a benchmark too! Just run pytest, and it will generate an .svg report: https://github.com/gukoff/dtparse/blob/master/tests/test_per... (you'll need to pip install ciso8601 pytest pytest-benchmark[histogram])
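For a rough median figure without installing pytest-benchmark, a standard-library sketch like the one below works. The median is less sensitive to outliers than the mean, which matters when wall time isn't stable. The helper name and repeat counts are arbitrary, and `datetime.fromisoformat` stands in for ciso8601 here only because it ships with Python.

```python
import statistics
import timeit
from datetime import datetime

def median_ns_per_call(fn, arg, number=1_000, repeat=7):
    """Median time per call, in nanoseconds, over `repeat` runs of `number` calls."""
    runs = timeit.repeat(lambda: fn(arg), number=number, repeat=repeat)
    return statistics.median(r / number * 1e9 for r in runs)

if __name__ == "__main__":
    stamp = "2021-01-29T12:34:56"
    strptime_iso = lambda s: datetime.strptime(s, "%Y-%m-%dT%H:%M:%S")
    print("strptime      ", median_ns_per_call(strptime_iso, stamp))
    print("fromisoformat ", median_ns_per_call(datetime.fromisoformat, stamp))
```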

throwaway894345 · on Jan 29, 2021

I'm very curious to hear the use case for which date time parsing was the bottleneck! Also, I'm surprised that the overhead of calling across the language boundary didn't dwarf the gains from parsing...

gukoff · on Jan 29, 2021

One of the components in our project was churning through thousands of JSONs per second - deserializing, transforming and serializing them.

These JSONs represented the flight information. They included multiple datetimes, such as the scheduled departure/arrival time and the real departure/arrival time of a flight.

The first bottleneck was JSON deserialization/serialization. At that time we solved it with ujson, and now there's the even more performant orjson.

The second bottleneck happened to be datetime deserialization. We solved it with ciso8601; luckily, these datetimes were in ISO 8601. But this bottleneck later occurred repeatedly in other components and became the inspiration to write dtparse :)
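One way to parse datetime fields while deserializing records like these is the standard library's `json.loads(..., object_hook=...)`. This is a sketch only: the field names (`scheduled_departure` etc.) are hypothetical, and `datetime.fromisoformat` stands in for ciso8601.

```python
import json
from datetime import datetime

# Hypothetical names of the fields that hold ISO 8601 datetimes.
DATETIME_FIELDS = {"scheduled_departure", "actual_departure",
                   "scheduled_arrival", "actual_arrival"}

def parse_datetimes(obj: dict) -> dict:
    # json.loads calls this for every decoded JSON object (innermost first).
    for key in obj.keys() & DATETIME_FIELDS:
        value = obj[key]
        if isinstance(value, str):
            obj[key] = datetime.fromisoformat(value)
    return obj

def load_flight(raw: str) -> dict:
    return json.loads(raw, object_hook=parse_datetimes)

if __name__ == "__main__":
    raw = '{"flight": "XY123", "scheduled_departure": "2021-01-29T10:00:00+00:00"}'
    print(load_flight(raw))
```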

sillysaurusx · on Jan 30, 2021

Wow, orjson is amazing. It even serializes numpy arrays. Thanks!

delduca · on Jan 30, 2021

`pysimdjson` is even better!

oblvious-earth · on Jan 29, 2021

I've had this situation a few times. Most recently: transforming large (1-50 GB) CSV files into a format that can be digested by a proprietary bulk DB loader.

Because our problem was just about reformatting, we ended up reading the CSVs in binary mode and using struct to extract the relevant values from the datetime fields. But if we needed to do actual date logic, something like this would perhaps be useful (but there are other fast datetime libraries out there; I've been a fan of pendulum for some tasks).
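A minimal sketch of that struct trick, assuming the datetime field is a fixed-width ASCII string such as `2021-01-29 12:34:56` (the layout and the compact output format are assumptions). `struct` slices out the digit groups without doing any date parsing at all.

```python
import struct

# Fixed-width layout of b"YYYY-MM-DD HH:MM:SS"; each "x" skips one separator byte.
DATETIME_LAYOUT = struct.Struct("4sx2sx2sx2sx2sx2s")

def reformat_datetime(field: bytes) -> bytes:
    """Turn b"YYYY-MM-DD HH:MM:SS" into b"YYYYMMDDHHMMSS" (hypothetical target format)."""
    year, month, day, hour, minute, second = DATETIME_LAYOUT.unpack(field)
    return year + month + day + hour + minute + second
```

If the field sits at a known offset inside a longer record, `DATETIME_LAYOUT.unpack_from(record, offset)` avoids slicing the bytes first.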

throwaway894345 · on Jan 29, 2021

That makes sense, but I have a hard time believing the approach of calling into a date time parser O(n) times is going to yield a significant performance gain no matter how much faster the parser is. However, I'm being downvoted, so perhaps I'm mistaken?

oblvious-earth · on Jan 29, 2021

Sometimes it's about optimizing wall time not algorithmic complexity.

If you have a batch SLA of 1 hour, and you're currently spending 50-70 minutes to complete the batch, and 20 minutes of that time is spent on date parsing, then reducing it to 5 minutes is a big win.

throwaway894345 · on Jan 29, 2021

No doubt, but if your date parsing saves you 1 second per date parsed but each call into the faster library costs 2 seconds, then your performance actually suffers. The only way around this is to make a batch call such that the overhead is O(1).
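The argument can be made concrete with a back-of-the-envelope cost model. The numbers are the hypothetical ones from the comment (the faster parser saves 1 s per date, each boundary crossing costs 2 s) plus an assumed 3 s for the original parser; they are not measurements. Per-call overhead scales with n, while a batch call pays it once.

```python
def per_call_cost(n: int, parse: float, overhead: float) -> float:
    # One boundary crossing per item: the overhead is paid n times.
    return n * (overhead + parse)

def batched_cost(n: int, parse: float, overhead: float) -> float:
    # One boundary crossing for the whole batch: the overhead is paid once.
    return overhead + n * parse

if __name__ == "__main__":
    n = 1_000
    old_parse = 3.0              # hypothetical cost of the slow, in-process parser
    new_parse = old_parse - 1.0  # the faster parser saves 1 s per date
    overhead = 2.0               # hypothetical cost of one call across the boundary
    print("slow, in-process:", n * old_parse)                           # 3000.0
    print("fast, per call:  ", per_call_cost(n, new_parse, overhead))   # 4000.0
    print("fast, batched:   ", batched_cost(n, new_parse, overhead))    # 2002.0
```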

minitech · on Jan 29, 2021

I’m not going to install it to check, but when someone writes “Fast datetime parser for Python written in Rust. Parses 10x-15x faster than datetime.strptime.” it seems reasonable to assume that this is not the case.

throwaway894345 · on Jan 29, 2021

Depends on whether or not the parent is including the overhead in their statistic. Misinformation about microbenchmarks is hardly a rarity.

ahupp · on Jan 30, 2021

In a language like Java where you mostly spend time in the VM and only occasionally jump into native code, that might be true. But in python a huge part of the runtime is this kind of native call. So I would not expect that this approach adds any new overhead.

throwaway894345 · on Jan 30, 2021

Your conclusion might be right, but your reasoning is certainly wrong. Calling native functions in Python is often quite expensive because you need to marshal between PyObjects and the native types (probably allocating memory as well). This doesn’t “feel” so slow in Python because, well, everything in Python is slow. But you really start to notice it when you’re optimizing.

ahupp · on Jan 30, 2021

Of course "It depends", but in my experience that kind of thing is rare. Either you're passing in str and can just grab the char* out of the existing PyObject, or you have some more complicated thing that was wrapped once in a PyObject and doesn't need to be converted, etc. But sure, if you have some dict with a lot of data and need to convert it into an std::map you'll have a bad time.

lincolnq · on Jan 29, 2021

My instinct is that the overhead is small. You need to add a few C stack frames and do some string conversion on each call, maybe an allocation to store the result. It's not going to be as quick as doing it in pure Rust, but the Python-to-native code layer can be pretty lightweight, I think!

brundolf · on Jan 29, 2021

Maybe they did it in bulk? i.e. send all the strings over at once, parse them in a loop, send them back. Seems like that would reduce overhead

throwaway894345 · on Jan 29, 2021

Right, and that makes sense, but the context here is a date parsing library for Python--unless said library has a batch interface, I'm not sure how that would improve performance, but maybe I'm misestimating something.

brundolf · on Jan 29, 2021

Ah, I skimmed over the part where this is a library and not application-code

pbecotte · on Jan 29, 2021

I've certainly never been bottlenecked on date parsing :) However, many (most?) of the high-performance Python libraries are written in C and compiled into something the Python interpreter can use directly. There are also lots of Python bindings written in C++ around native C libraries; I've used ZeroMQ pretty recently. Rust works the same way: the code is compiled into objects that Python can use directly. It's not like running a JavaScript interpreter inside your code.

cdavid · on Jan 30, 2021

I have seen it in many cases, especially working on financial data. My most recent example was working with real time feeds of trades, which we used ML models on top of. Inference was based on accumulated volume per fixed amount of time (say 30 sec, 1 min), and the code doing this in real time was python.

I don't remember the numbers, but caching plus ciso8601 was essential to handle the peak load (maybe 50k trades per second?).
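The caching half is easy to sketch with `functools.lru_cache`, on the assumption (not stated above) that identical timestamp strings repeat often, as they do when many trades share the same second. `datetime.fromisoformat` stands in for ciso8601, and the cache size is arbitrary.

```python
from datetime import datetime
from functools import lru_cache

@lru_cache(maxsize=4096)
def parse_cached(stamp: str) -> datetime:
    # Only the first occurrence of each distinct string is actually parsed;
    # repeats are served from the cache. datetime objects are immutable,
    # so sharing one instance between callers is safe.
    return datetime.fromisoformat(stamp)
```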

JPKab · on Jan 29, 2021

Thank you thank you thank you!

I was looking at PyO3 a few months ago, after discovering the orjson python (with rust inside) library and radically speeding up an auto-ML app for work.

I really enjoyed starting to learn Rust, but found the process to embed in Python to be rather intimidating. Looking forward to using your repo as a reference, and love the dtparse work you've done.

dmw_ng · on Jan 29, 2021

Another cheap trick if the time column is sequential is to split the string into date and time components, cache the date part and calculate the time part just with some multiplication

Major caveat is timezone handling, but this only applies in a subset of situations

quietbritishjim · on Jan 29, 2021

If you've got to the point of modifying the storage format, then you might as well just use an integer (microseconds since the epoch) and be done with it. That seems cleaner than using a string (or two strings) anyway.
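For reference, converting to and from such an integer with the standard library might look like this. It assumes UTC-aware datetimes; dividing one `timedelta` by another with `//` keeps everything in exact integer arithmetic, whereas `datetime.timestamp()` returns a float and may round at microsecond precision.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_epoch_us(dt: datetime) -> int:
    # timedelta // timedelta is exact integer division.
    return (dt - EPOCH) // timedelta(microseconds=1)

def from_epoch_us(us: int) -> datetime:
    return EPOCH + timedelta(microseconds=us)
```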

itamarst · on Jan 29, 2021

I've been playing with PyO3 for prototyping, and wrapped some Rust code to see if it's faster than Python. The experience was very much like using Boost.Python (which these days has an alternative in https://github.com/pybind/pybind11). It's _really_ easy to wrap code for Python, and it has nice APIs to ensure the GIL is held. Because it's Rust, I'm much more confident I won't suffer from the memory unsafety issues my C++ code had at the time.

Now I'm starting to use it as part of the Python memory profiler I'm working on (https://pythonspeed.com/fil), in this case to call into the low-level Python C API, which PyO3 includes bindings for in addition to its high-level API. This kind of usage is more like writing C, except with the benefit of having high-level APIs (for holding the GIL, but also for object conversion) available when I need them.

So basically you get safe, high-level, easy-to-use APIs, with fallback to low-level unsafe APIs if you need them.

Highly recommend trying it out.

brundolf · on Jan 29, 2021

What's the data-conversion overhead look like at the boundary? Which data structures can be passed back and forth without a full clone, etc?

itamarst · on Jan 29, 2021

There's definitely a conversion cost. For strings, Python apparently caches the UTF-8 encoded string, so if you _repeatedly_ transfer it to Rust I suspect (but haven't checked) that the cost is much lower.

In general I suspect it's the usual "NumPy arrays are fast, everything else you better be getting a sufficiently large boost from the low-level code to justify conversion".

For the thing I prototyped in Rust, it was wrapping the `ahocorasick` crate which was in fact faster than `pyahocorasick` which is written in C or Cython or something. Both have similar conversion costs, probably, so it came down to "for lots of data the Rust version was faster".

burntsushi · on Jan 29, 2021

Be sure to use auto configuration to get it to go even faster, depending on your use case: https://docs.rs/aho-corasick/0.7.15/aho_corasick/struct.AhoC...

Or just be sure to enable the DFA option if you can afford it. It looks like the Python library is just the standard NFA algorithm.
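To make the NFA/DFA distinction concrete, here is a compact pure-Python sketch of Aho-Corasick over bytes. It is an illustration of the textbook algorithm, not the `aho-corasick` crate's implementation or API, and it reports all overlapping matches. The NFA follows failure links at search time, possibly several per byte; the DFA precomputes a transition for every state and byte value, so each input byte costs exactly one lookup, at the price of 256 table entries per state.

```python
from collections import deque

def build_nfa(patterns):
    """Trie plus failure links. Returns (goto, fail, out), indexed by state."""
    goto, fail, out = [{}], [0], [set()]
    for idx, pat in enumerate(patterns):
        state = 0
        for byte in pat:
            if byte not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][byte] = len(goto) - 1
            state = goto[state][byte]
        out[state].add(idx)
    # Breadth-first order: a state's failure target is always shallower, so it is done first.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for byte, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and byte not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(byte, 0)
            out[nxt] |= out[fail[nxt]]  # inherit matches that end at this state too
    return goto, fail, out

def search_nfa(automaton, haystack):
    goto, fail, out = automaton
    state, matches = 0, []
    for end, byte in enumerate(haystack):
        while state and byte not in goto[state]:
            state = fail[state]  # may take several steps per input byte
        state = goto[state].get(byte, 0)
        matches.extend((idx, end + 1) for idx in out[state])
    return matches

def build_dfa(automaton):
    goto, fail, out = automaton
    delta = [[0] * 256 for _ in goto]
    order, queue = [], deque([0])
    while queue:  # BFS order again, so delta[fail[s]] is complete before it is read
        s = queue.popleft()
        order.append(s)
        queue.extend(goto[s].values())
    for s in order:
        for byte in range(256):
            if byte in goto[s]:
                delta[s][byte] = goto[s][byte]
            elif s != 0:
                delta[s][byte] = delta[fail[s]][byte]
    return delta, out

def search_dfa(dfa, haystack):
    delta, out = dfa
    state, matches = 0, []
    for end, byte in enumerate(haystack):
        state = delta[state][byte]  # exactly one table lookup per input byte
        matches.extend((idx, end + 1) for idx in out[state])
    return matches
```

Matches are `(pattern_index, end_offset)` pairs; for patterns `[b"he", b"she", b"his", b"hers"]` and haystack `b"ushers"`, both searches find "she" and "he" ending at offset 4 and "hers" ending at offset 6.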

itamarst · on Jan 29, 2021

Yeah, I was using DFA.

Next step is trying alternative approach, but if that alternative doesn't work I'm going to see about wrapping your package for Python.

Thanks for all your work on it!

burntsushi · on Jan 29, 2021

Nice! Reach out if there are any problems or if you need something exposed in the API. Looking at the pyahocorasick issue tracker, there are a number of features/bugs that your wrapper package would resolve. :)

liuliu · on Jan 29, 2021

NumPy also supports conversions without copying. One thing I haven't found a good way to bridge from Python is pandas.DataFrame; it seems to be quite a Python-focused object, and iterating through a DataFrame is particularly slow.

itamarst · on Jan 29, 2021

Internally Pandas often uses NumPy arrays, especially for numeric data, so might be able to pass things that way in some cases?

E.g. `df["column_name"].values` will get you a NumPy array.

shirakawasuna · on Jan 29, 2021

Sounds great! Would so much rather drop into Rust than C or C++.

JPKab · on Jan 29, 2021

Was just checking out your fil project. It looks really useful, and I dig the jupyter kernel as well.

itamarst · on Jan 29, 2021

Thank you! If you have any questions/problems/ideas, please reach out via GitHub or email (itamar@pythonspeed.com).

dbrgn · on Jan 29, 2021

If you're interested in publishing Rust libraries as Python packages (or integrating Rust code into an existing Python package), check out https://github.com/PyO3/maturin and https://github.com/PyO3/setuptools-rust.

edenhyacinth · on Jan 29, 2021

Been using Maturin for a little while professionally, and it's surprisingly good. There are a few bugbears here and there (I haven't found a way to have `cargo test` and a PyO3 library working at the same time), but overall it's a lot more pleasant than working with Rust and R was.

ksm1717 · on Jan 29, 2021

Between pyodide, PyO3, rust-cpython, and RustPython, I think PyO3 is the best way to drop Rust into a Python project for a speed-up, if that is your goal. Some of the demos show using Python from Rust, but to me the biggest feature is without a doubt compiling Rust code to native Python modules. I'm using it to speed up image manipulation backed by numpy arrays.

There’s a setuptools-rust [0] extension package that can be used to hook the compilation of the Rust code into wheel building or installing from source. Maturin [1] seems to be regarded as the new and improved solution for this, but I found that it’s angled toward using Python from Rust.

There’s also the rust numpy [2] package by the same org which is fantastic in that it lets you pass a numpy matrix to a native method written in rust and convert it to the rust equivalent data structure, perform whatever transformation you want (in parallel using rayon [3]), and return the array. When building for release, I was seeing speed ups of 100x over numpy on the most matrix mathable function imaginable, and numpy is no joke.

I think there is a lot of potential for these two ecosystems together. If there’s not a python package for something, there’s probably a rust crate.

If anyone is interested, the Python package I'm building with some Rust backend is called pyrogis [4]; it's for making custom image manipulations through numpy arrays.

[0] https://github.com/PyO3/setuptools-rust

[1] https://github.com/PyO3/maturin

[2] https://github.com/PyO3/rust-numpy

[3] https://github.com/rayon-rs/rayon

[4] https://github.com/pierogis/pierogis

cycomanic · on Jan 29, 2021

> Between pyodide, pyo3, rust-cpython, and rustpython, I think Pyo3 is the best way to drop in rust in a python project for a speed up, if that is your goal. Some of the demos show using python from rust, but to me the biggest feature is without a doubt compiling rust code to native python modules. I'm using it to speed up image manipulation backed by numpy arrays.

> There’s a setuptools rust [0] extension package that can be used to hook the compilation of the rust into the wheel building or install from source. Maturin [1] seems to be regarded as the new and improved solution for this, but I found that it’s angled toward the using python from rust.

> There’s also the rust numpy [2] package by the same org which is fantastic in that it lets you pass a numpy matrix to a native method written in rust and convert it to the rust equivalent data structure, perform whatever transformation you want (in parallel using rayon [3]), and return the array. When building for release, I was seeing speed ups of 100x over numpy on the most matrix mathable function imaginable, and numpy is no joke.

What sort of algorithm was that? Generally, getting a 100x speedup on vectorized code is highly unusual, even with hand-coded C++. So I suspect it was quite loop-heavy? In those cases I have also seen very significant speedups.

I have been using pythran [1] for speeding up my python code. It generally achieves extremely good performance. I have blogged about it here [2] and recently a member used pythran to speed up some nbody benchmarks [3] which was used in an article to argue for using compiled languages.

That said, I find PyO3 quite exciting and have been contemplating trying it with some of my projects.

[1] https://github.com/serge-sans-paille/pythran

[2] https://jochenschroeder.com/blog/articles/DSP_with_Python2/

[3] https://github.com/paugier/nbabel

ksm1717 · on Jan 29, 2021

Matrix of shape (rows, columns, 3). Average the last dim for each point and change it to [0,0,0] if average less than a value, [255,255,255] if greater. A brightness threshold. May be remembering the speed up factor wrong so take it with a grain of salt - fact of the matter is it was very impressive.

I’m checking out that post later, I’m trying to make my package easy to build on, so being able to write extensions with Pythran would be another great option for speed ups. Thanks

cycomanic · on Jan 29, 2021

Just for the fun of it, I tested what speedup I could get with a naive algorithm and pythran. Based on your description, it looks like I should do the following:

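    import numpy as np

    def threshold_pixel(img, thr):
        out = np.zeros_like(img)
        o = np.mean(img, axis=-1)   # per-pixel average over the last (channel) axis
        out[o > thr] = 255          # set all channels of bright pixels to 255
        return out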

This runs in ~30 ms for a (1024, 1024, 3) array using numpy on my machine. Using pythran (note: I had to explicitly write out the loop for out[o>thr] = 255 due to a bug that I found and just reported), I get about 6 ms with OpenMP and 9 ms without. I did not tune OpenMP, and tuning it should yield a much higher speedup.

P.S.: Just had a look at your project, very cool, I have to try that

adkadskhj · on Jan 29, 2021

I needed Blender integration a while back and wasn't sure what I could write it in. PyO3 worked great with Blender with no configuration. I was quite concerned that something about the Python-embedded-in-Blender behavior would limit PyO3... but nope, so far it's worked flawlessly.

Thanks PyO3 team :)

mynameisash · on Jan 29, 2021

At work, I'm using PyO3 for a project that churns through a lot of data (step 1) and does some pattern mining (step 2). This is the second generation of the project and is on-demand compared with the large, batch project in Spark that it is replacing. The Rust+Python project has really good performance, and using Rust for the core logic is such a joy compared with Scala or Python that a lot of other pieces are written in.

Learning PyO3, I cobbled together a sample project[0] to demonstrate how some functionality works. It's a little outdated (uses PyO3 0.11.0 compared with the current 0.13.1) and doesn't show everything, but I think it's reasonably clear.

One thing I noticed is that passing very large data from Rust and into Python's memory space is a bit of a challenge. I haven't quite grokked who owns what when and how memory gets correctly dropped, but I think the issues I've had are with the amount of RAM used at any moment and not with any memory leaks.

[0] https://github.com/aeshirey/CheeseShop

adsharma · on Jan 29, 2021

There is another way to speed up python:

Write code in python and transpile to another language (could be rust) and then import it back into python

https://github.com/adsharma/py2many/tree/main/tests/expected

Figuring out a mapping between a subset of a compiled language and a subset of statically typed python should be possible.

The hard part is mapping the standard library. I suspect something like Nim might have an advantage there.

benecollyridam · on Jan 29, 2021

Another related project: Wasmtime and Rust+Python

Compile your Rust code to wasm to circumvent having to compile for different architectures.

https://docs.wasmtime.dev/wasm-rust.html

minimaxir · on Jan 29, 2021

Huggingface Tokenizers (https://github.com/huggingface/tokenizers), which are now used by default in their Transformers Python library, use PyO3 and became popular due to the pitch that they encode text an order of magnitude faster with zero config changes.

It lives up to that claim. (I had issues with return object typing when going between Python/Rust at first but those are more consistent now)

mleonhard · on Jan 29, 2021

I'm interested in running Python inside wasmtime. I think PyO3 would be useful. We could build a small Rust wasm binary that exports an "execute_python_script" function. This would finally be a way to run Python in a strong sandbox with memory [0] and CPU [1] restrictions. (In 1999, I asked Guido for sandboxing support in Python, but he refused.)

[0] https://github.com/bytecodealliance/wasmtime/issues/2273

[1] https://github.com/bytecodealliance/wasmtime/issues/2274

pansa2 · on Jan 29, 2021

Related: RustPython - A Python interpreter written in Rust.

https://github.com/RustPython/RustPython

edeion · on Jan 29, 2021

That's a really great name you came up with! Embodies both parts of your focus, stays pronounceable. Does the 3 relate to the Python version or are you mimicking some specific molecule that I can't think of?

smlckz · on Jan 29, 2021

Py (iv)

              O
     O = Py < |
              O

or Py (vi)

or Py (ii)

         O
    Py <   > O
         O

heh!

auscompgeek · on Jan 29, 2021

I think you might be missing an oxygen atom there.

Swenrekcah · on Jan 29, 2021

I would guess it is derived from: https://en.wikipedia.org/wiki/Iron(III)_oxide

OskarS · on Jan 29, 2021

I thought it was like the compiler flag -O3: “with full optimization”, basically.

smlckz · on Jan 29, 2021

But that's Fe_2 O_3 !

ziml77 · on Jan 29, 2021

I think calling it Py2O3 would be a bit confusing though.

smlckz · on Jan 29, 2021

Just PyO or Py_3 O_4 could have been used as well, does not matter that much.

batterylow · on Jan 29, 2021

It's indeed a cool name, but it's not my doing (this isn't a Show HN)!

SnowflakeOnIce · on Jan 29, 2021

My guess is that the name is derived from the `-O3` compiler optimization level from many compilers.

fafhrd91 · on Jan 29, 2021

The name was chosen after `uranium trioxide`: pythonium trioxide, PyO3.

chc · on Jan 29, 2021

If you're trying to figure out the origin of a Rust project's name, the safest bet is always to choose the one that's a reference to metal.

fafhrd91 · on Jan 29, 2021

I am the original author of PyO3. Yury Selivanov (author of uvloop and EdgeDB) suggested the PyO3 name.

chc · on Jan 29, 2021

Oh, I know, I wasn't trying to correct you or anything. I was just adding on to the correct answer to point out that PyO3's naming scheme is part of a popular trend in Rust libraries.

fulafel · on Jan 29, 2021

Previously (2017): https://news.ycombinator.com/item?id=14859844

LockAndLol · on Jan 29, 2021

If this works well, I'd rather use this over being forced to use type hints and mypy.

Has anybody used this in conjunction with a python framework? Django, fastapi or something?

edenhyacinth · on Jan 29, 2021

I have! Used FastAPI as a frontend to do some minor data modification, and passed the data for model inference in Rust.

Works really nicely, although given how little work I'm doing in the Python side I honestly prefer using Rocket instead of FastAPI and then using pyo3 to call the Python library in Rust, rather than the other way around.

LockAndLol · on Jan 29, 2021

Thanks for the response. That does sound pretty much like what I would like to do. Have you by any chance open-sourced your project?

I'm new to rust, but I'll check out Rocket. Cheers

uranusjr · on Jan 29, 2021

Uh, how do you plan to use FastAPI while avoiding type hints?

pansa2 · on Jan 29, 2021

How would PyO3 help you avoid type hints and mypy?

brundolf · on Jan 29, 2021

I think the idea is that they move their business logic to the Rust code, since Rust's type system is more powerful and more sound, instead of trying to make do with MyPy

zerkten · on Jan 29, 2021

Wouldn't it be more of a priority to move it for lower memory use and higher request speed? A better type system is good, but often these are a struggle with scaling interpreted languages compared to other lower level languages.

brundolf · on Jan 29, 2021

For many people the primary appeal of Rust is its type system and related features (declaring deep immutability, pattern-matching, etc)

> often these are a struggle with scaling interpreted languages compared to other lower level languages

Not sure what's meant by this

LockAndLol · on Jan 29, 2021

It would minimize the python surface required to be covered with type-hints and mypy. If possible, one could simply point django to the modules generated from rust.

I'll give it a shot tonight and see how it goes. Now I'm curious.

bluedays · on Jan 29, 2021

Without looking at it I wonder if it's using the Python language underneath, or the python vm. Either way this is pretty cool.

Nvorzula · on Jan 29, 2021

Precisely, this is Rust that compiles to a C FFI that plugs into CPython.