It's slightly exaggerated; the Python program might not have been able to fully utilize all cores, and it's really just 16 cores with hyperthreading. But it's not unreasonable: a 150x speed-up isn't unexpected when going from Python to C/Rust/C++ in number-crunching code, since 150/16 ≈ 9.4, i.e. roughly a 9x single-threaded speed-up times 16x from the cores (the 16 assumes that the gains from hyperthreading and the losses from imperfect parallelism more or less cancel out).
I don't think I have the code for these large-ish data-processing experiments any more, but it would be fun to make some toy problems with large amounts of data, write comparable Python and C implementations, and put the results in a blog post.
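Something like this pure-Python kernel would do as a starting point (a made-up toy problem along those lines, not the original code):

    import random, time

    def pairwise_sq_dist_sum(points):
        # O(n^2) inner loop, all of it interpreted Python
        total = 0.0
        for i in range(len(points)):
            xi, yi = points[i]
            for j in range(i + 1, len(points)):
                dx, dy = xi - points[j][0], yi - points[j][1]
                total += dx * dx + dy * dy
        return total

    points = [(random.random(), random.random()) for _ in range(2000)]
    start = time.perf_counter()
    print(pairwise_sq_dist_sum(points), "in", round(time.perf_counter() - start, 2), "s")

A line-for-line C translation of the same loops would then give a feel for the single-core gap, before any parallelism enters the picture.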
Can we assume that you weren't able to use NumPy here, or at least that your inner loops weren't using it? NumPy can be faster than hand-written C++ when you don't happen to know all the optimizations the NumPy library writers knew.
Yeah, I'm just talking about normal Python code here. If you're able to express your problem so that numpy, scipy, pytorch, NLTK, or some other C/Fortran-backed library does all the number crunching, Python's performance is less of an issue.
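As a rough sketch of what I mean (a made-up example, not from the original experiments), here's the same reduction written as a plain Python loop and as a NumPy call:

    import time
    import numpy as np

    def sum_of_squares_loop(xs):
        # every iteration and multiply goes through the interpreter
        total = 0.0
        for x in xs:
            total += x * x
        return total

    def sum_of_squares_numpy(xs):
        # same reduction, but the loop runs inside NumPy's compiled code
        return float(np.dot(xs, xs))

    data = np.random.rand(1_000_000)
    for fn in (sum_of_squares_loop, sum_of_squares_numpy):
        start = time.perf_counter()
        result = fn(data)
        print(fn.__name__, round(result, 3), round(time.perf_counter() - start, 4), "s")

The second version spends essentially no time in the interpreter, which is why it matters so much whether your inner loops can be expressed that way.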
Are you exaggerating? If not, can you share a bit more?