many/most datascience processes end up slowing down when inevitably the data mus...

Filligree · on Nov 3, 2021

Very much this. Anyone who does machine learning will notice their CPU sitting at 100% of one core a significant fraction of the time.

Doesn't matter how fast a GPU you have; Python and the GIL are the bottleneck.

nerdponx · on Nov 5, 2021

This is very true, especially when pre-processing text and other unstructured data. It ends up being a lot of loops, string manipulation, and dict lookups.

Fortunately, with a tool like DVC or even Make, you usually don't have to (or want to) put that code in the same script as the actual machine learning part. So you can theoretically run the former with PyPy and the latter with CPython, if you really need to maximize both.
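
To make that split concrete, here is a rough sketch of what a standalone pre-processing stage might look like. It uses only the standard library, so it should run unchanged under either PyPy or CPython. The file name `preprocess.py`, the function names, and the JSON output layout are all made up for illustration; a real pipeline would pick its own tokenization and storage format.

```python
# preprocess.py (hypothetical stage name): stdlib only, so the same file
# can be run by PyPy or CPython.
import json
import sys
from collections import Counter


def tokenize(line):
    """Lowercase a line and split it into alphanumeric tokens."""
    tokens, current = [], []
    for ch in line.lower():
        if ch.isalnum():
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


def build_vocab(lines):
    """Map each token to an integer id, most frequent token first."""
    counts = Counter()
    for line in lines:
        counts.update(tokenize(line))
    # Sort by (-count, token) so ties are broken deterministically.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {tok: i for i, (tok, _) in enumerate(ordered)}


def encode(lines, vocab):
    """Replace every token with its id via a plain dict lookup."""
    return [[vocab[t] for t in tokenize(line)] for line in lines]


def main(in_path, out_path):
    """Read raw text, write the vocab and encoded lines as JSON."""
    with open(in_path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    vocab = build_vocab(lines)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump({"vocab": vocab, "encoded": encode(lines, vocab)}, f)


# Only act as a CLI when given exactly an input and an output path.
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

A Make rule or DVC stage could then run something like `pypy3 preprocess.py raw.txt tokens.json` and hand `tokens.json` to a separate training script under CPython (the file names here are placeholders). The only contract between the two interpreters is the intermediate file, which is why the stage avoids anything beyond the standard library.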