Many, if not most, data science workflows slow down when the data inevitably has to move back into Python, or when a Python function has to be called on it.
A significant performance improvement in Python would benefit many data-science-related tasks.
This is very true, especially when pre-processing text and other unstructured data. It ends up being a lot of loops, string manipulation, and dict lookups.
Fortunately, with a tool like DVC or even Make, you usually don't have to (or want to) put that code in the same script as the actual machine learning part. So you can theoretically run the former with PyPy and the latter with CPython, if you really need to maximize both.
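To make the split concrete, here is a rough sketch of what the preprocessing half might look like as its own script. It sticks to the standard library (loops, string handling, dict lookups), so it should run under either interpreter. The file name `preprocess.py`, the function names, and the JSON-lines output format are all just assumptions for illustration, not anything DVC or Make requires.

```python
# preprocess.py (hypothetical name): pure-Python stage with no C-extension
# imports, so it can run under PyPy as well as CPython.
import json
import sys


def build_vocab(lines):
    """Assign an integer id to each distinct lowercased token, in first-seen order."""
    vocab = {}
    for line in lines:
        for token in line.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def encode(lines, vocab):
    """Replace each token with its vocabulary id (one list of ids per input line)."""
    return [[vocab[token] for token in line.lower().split()] for line in lines]


def main(in_path, out_path):
    # Read raw text, one document per line.
    with open(in_path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    vocab = build_vocab(lines)
    # Write one JSON array of token ids per line for the training stage to read.
    with open(out_path, "w", encoding="utf-8") as f:
        for ids in encode(lines, vocab):
            f.write(json.dumps(ids) + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

You would then run something like `pypy3 preprocess.py raw.txt ids.jsonl` as one Make target or DVC stage, and have the CPython training script (whatever yours is called) read `ids.jsonl` in the next one. Whether PyPy actually wins here depends on the workload, so it is worth timing both.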