Expensive lessons in Python performance tuning

zobzu · on July 22, 2012

When you need to inline C code, use ctypes. It seems to be a common idea to reach for whatever the fanciest, latest library is, on the assumption that it's necessarily better (and the same goes for some older, well-known libs, which are useful but can also be rather slow).

Obviously, it's often not better, as the author mentions.

In my experience with Python, a lot of functions map directly to C code, and those are nearly as fast as well-written C. Others go through Python code in between, and those are obviously much slower.

That's the main thing to know when you want speed. So if you start using fancy python classes and other crap on top, it's going to be slower and slower, and if it's code that's called a lot, it's going to hurt hard.
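To make the point above concrete, here is a minimal sketch assuming CPython, where the builtin `sum` is implemented in C. The helper `loop_sum` is a hypothetical name made up for this illustration. Actual timings depend on the machine and interpreter version, so none are quoted here.

```python
import timeit


def loop_sum(values):
    """Add the numbers one by one in interpreted Python bytecode."""
    total = 0
    for v in values:
        total += v
    return total


values = list(range(100_000))

# Both approaches compute the same result...
assert loop_sum(values) == sum(values)

# ...but the builtin does its iteration inside the interpreter's C code.
t_loop = timeit.timeit(lambda: loop_sum(values), number=20)
t_builtin = timeit.timeit(lambda: sum(values), number=20)
print(f"python loop: {t_loop:.4f}s  builtin sum: {t_builtin:.4f}s")
```

Running this on your own machine is the quickest way to see how large the gap is for a given workload.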

When some code needs to be fast and there are no functions that call C code directly, ctypes works just fine.
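As a rough illustration of the ctypes route, the sketch below calls the C library's `strlen` directly. It assumes a POSIX-like system where the C library can be loaded this way (it will not work as written on Windows). The wrapper name `c_strlen` is hypothetical. In practice you would point `CDLL` at your own compiled shared library instead.

```python
import ctypes
import ctypes.util

# Locate the C standard library. If find_library returns None, CDLL(None)
# falls back to the symbols already loaded into the process (POSIX only).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Return the length of a NUL-terminated byte string via libc's strlen."""
    return libc.strlen(data)


print(c_strlen(b"hello"))  # prints 5
```

Declaring `argtypes` and `restype` up front is worth the two extra lines: without them ctypes assumes `int` for the return value and cannot check the arguments you pass.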

Now there are some libraries which are well made (performance wise), but that's not the norm. It's actually pretty damn rare in my experience.

16s · on July 23, 2012

Boost Python is fabulous too if you want to call C++ routines. We do a lot of heavy lifting with that and use plain Python for higher-level tasks.

DanWaterworth · on July 22, 2012

I've used numpy a little and I enjoyed it and it performed ok, but is python really a good choice for writing this kind of code? I'm really genuinely interested.

iskander · on July 22, 2012

I've been doing data analysis in Python for the past 2ish years and I think it's a great choice. Before Python I worked in Matlab, which simplifies matrix operations at the expense of making everything else terrible.

The main other contenders here are R and Mathematica, both of which will fail you when you need to do something that isn't strictly statistical/mathematical. Python gives you predictable, decent performance, and the NumPy ecosystem is awesome for numerical libraries. I've never come across a machine learning library nearly as well designed as scikit-learn, and pandas dataframes are a lot snappier than R's equivalents. My only gripe is the paucity of good plotting libraries (matplotlib is impoverished and ugly compared with R's sexy plotting routines).

Now, I haven't said a word about the faster statically compiled languages: C, C++, Java, C#, F#, OCaml, Haskell, etc...

The trouble with static languages is that they either lack essential libraries or don't allow for rapid prototyping (or in some cases, both).

Now, if you're implementing the heart of a numerically intensive algorithm and your code can't be decomposed into a few already implemented primitives, it makes sense to write it in C. The first thing to do, though, is to wrap that native code with a Python interface and test it from Python.

carterschonwald · on July 22, 2012

There will be very nice tools rolling out for Haskell early this fall. :-).

(I can't elaborate much because I'm busy writing them presently, but stay tuned and I think y'all will like what you'll see when the public release lands)

(one sexy hint though: the value add of these works in progress is enough that I'll be able to hire folks full time to work on it with me starting mid September or October.)

carterschonwald · on July 22, 2012

Folks who are intrigued (whether as hypothetical users/customers or as future colleagues/collaborators), shoot me an email!

carterschonwald · on July 23, 2012

Loving the emails :-)

ogrisel · on July 22, 2012

Yes, as long as you profile and make sure that the bottleneck is either a BLAS / ATLAS call (when doing NumPy vector operations) or a compiled extension of your own (e.g. a piece of Cython or C code), for the case where you need a for loop that cannot be vectorized easily with NumPy.
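A minimal sketch of the "profile first" step, using only the standard library's `cProfile` and `pstats`. The functions `dot_loop`, `dot_builtin`, `workload`, and `profile_report` are hypothetical names invented for this example. The same approach applies to NumPy code, where you would hope to see compiled calls, rather than your own Python-level loops, at the top of the table.

```python
import cProfile
import io
import operator
import pstats


def dot_loop(a, b):
    """Dot product with an explicit Python-level loop."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


def dot_builtin(a, b):
    """Dot product that keeps the iteration inside C-implemented builtins."""
    return sum(map(operator.mul, a, b))


def workload():
    a = [float(i) for i in range(200_000)]
    b = [2.0] * len(a)
    return dot_loop(a, b), dot_builtin(a, b)


def profile_report(func, limit=10):
    """Run func under cProfile and return the stats table as a string."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


report = profile_report(workload)
print(report)
```

Sorting by cumulative time shows which of your own functions dominate. If a pure-Python loop like `dot_loop` is near the top, that is the candidate for vectorizing or moving into a compiled extension.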

Cython is very nice for numerical computation because it offers a nice syntax for dealing with array operations very efficiently (e.g. see typed memory views: http://docs.cython.org/src/userguide/memoryviews.html) and has good integration with NumPy data structures and the C API.

gajomi · on July 22, 2012

>Yes as long as you profile and make sure that the bottleneck is a blas / atlas call

I agree. Where Python really shines, I think, is where you have these sorts of bottlenecks + something extra that MATLAB performs terribly at, either as a consequence of language/VM design or lack of libraries.

ivan_ah · on July 22, 2012

> Yes, weave is unmaintained, ugly, and hard to debug

What is this bad-mouthing of scipy.weave? I am a big fan of this approach: NumPy for everything + a simple weave inline for the innermost loop.

Maybe it is not maintained simply because it works?

srean · on July 23, 2012

Though I love weave myself, it does not get much love in the numpy community. There you would be discouraged from using weave and strongly nudged towards Cython.

The weave source hasn't seen development in ages, whereas Blitz++, the C++ array library that it is based upon, has moved on quite a bit. Blitz++ has added SIMD support, or rather restructured its code so that compilers find it easy to vectorize. The new version of Blitz++ holds its own against ifortran in terms of vectorization. These are some of the advantages that you could have enjoyed had weave been kept up to date. I don't blame the numpy community for this, though: although Blitz++ sees continuous development, there has not been any formal release in tens of years, so it does become difficult to incorporate such a library. But I don't think that is the main reason why weave has languished.

I am sure Cython is great, but what I like about weave is the syntactic sugar that it brings. I do not have to write raw loops or do pointer arithmetic. If you want this kind of syntactic sugar in Cython now, you call back into the numpy API. If the default API does not give you the speed that you want, you have to expose the raw pointers of the arrays and do the messy pointer arithmetic and operations yourself. Nothing wrong with that, just that it can be error prone.

Cython however has other good things going for it, for instance it allows easy coordination with OpenMP, so it is easy to parallelize array updates, without incurring the multiple processes overhead.

iskander · on July 23, 2012

I actually love weave but I felt it was irresponsible to advertise its use without a huge warning.