Python Performance Tips

sophacles · on Jan 4, 2010

These tips are OK, particularly the ones about profiling. I have a few more that I have found over time. First, though: as always, a good algorithmic improvement can help more than any of these. Trick 0 is about that; the rest are more about performance.

0. Generator comprehensions and list comprehensions are your friends. Sometimes one will make things, like, woah fast -- generators are lazy evaluation at its finest (if you don't follow that, find a Haskell fan and ask them), and this can be a real boon to your app. Sometimes you need the list comprehension. If you can't reason out which would be better for you, or reasoning suggests it shouldn't matter, try both anyway, just to be sure :).

1. Function calls are expensive. Sometimes when you call a function a lot (on the order of thousands of calls in a normal run) it's better to bite the bullet and just unfactor it, i.e. inline the body at the call site. I have seen improvements of 50% from just this.

2. Dictionaries are fast. Classes are great, and objects make life easy on programmers, but when dealing with large datasets, nested list/dict combos are sometimes way better than objects (which are just dicts, but the syntactic sugar of objectness can eat up quite a few method calls, see #1 above). Also, sets are extremely useful and based on dicts (therefore also fast), so if the semantics of a set fit your needs, use them.

3. Python regex is fast. It is usually frowned upon, but sometimes a complex regex is way better than pyparsing or python string methods. This saved my butt on one very memorable occasion -- #1 and #2 above didn't really do enough, so I replaced the core processing bits with a very complex regex and got the speedup.

4. The gc module can really do a LOT of good. Short running scripts with lots of data can really be sped up by turning off gc (if you have the memory capacity). Even tuning generational parameters can really affect performance -- almost shockingly so.

5. Any type of bit twiddling sucks. Take the time to do it in C and make the extension. This includes most IP ops -- if you are working with large lists of addresses look into dnet. Similar libs exist for a lot of other projects.

6. __slots__ prove very useful sometimes. So does the struct module. Learn about both, they can really do wonders for your code in both readability and performance if used carefully.

7. All of the above are wrong in some contexts -- they are not hard and fast rules, but guidelines. In some cases they work great, in others they don't help at all.
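
As a rough illustration of tip 4, here is a minimal sketch of pausing the cyclic garbage collector around an allocation-heavy step, then tuning the generation-0 threshold instead of disabling collection outright. The helper names gc_paused and build_records and the threshold value are made up for this example; whether any of it helps depends on the workload, so measure before and after.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Temporarily disable the cyclic garbage collector (hypothetical helper)."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        # Only re-enable if the collector was on when we started.
        if was_enabled:
            gc.enable()


def build_records(n):
    # Many small container objects: the kind of allocation pattern
    # that triggers frequent generation-0 collections.
    return [{"id": i, "tags": [i, i + 1]} for i in range(n)]


# Option 1: switch the collector off while loading a big batch of data.
with gc_paused():
    records = build_records(50_000)

# Option 2: keep the collector on but make generation-0 collections rarer.
# The value 50_000 is an arbitrary number chosen for illustration.
old_thresholds = gc.get_threshold()
gc.set_threshold(50_000, old_thresholds[1], old_thresholds[2])
more_records = build_records(50_000)
gc.set_threshold(*old_thresholds)  # restore the previous settings
```

Reference counting still frees most objects while the collector is off; only reference cycles pile up, which is why this trade-off needs spare memory.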

durin42 · on Jan 5, 2010

I'd offer up a word of caution though about generator comprehensions. They're much more expensive than list comprehensions in the small cases. In Mercurial, a patch came up to move some listcomps to gencomps, and it was rejected because it slowed things down. Sometimes the state required to manage the generator is more costly than evaluating everything.
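
One way to check this on your own machine is a quick timeit comparison on a small input. This is only a sketch in Python 3 syntax; the function names and the input size are assumptions for the example, and the numbers will vary by interpreter version and hardware.

```python
import timeit

# A deliberately small input, where per-item generator overhead
# is not amortized over much work.
small = list(range(10))


def with_listcomp():
    # Builds the whole list first, then sums it.
    return sum([x * 2 for x in small])


def with_genexp():
    # Feeds sum() one item at a time through a generator object.
    return sum(x * 2 for x in small)


list_time = timeit.timeit(with_listcomp, number=100_000)
gen_time = timeit.timeit(with_genexp, number=100_000)
print(f"list comprehension:   {list_time:.3f}s")
print(f"generator expression: {gen_time:.3f}s")
```

Rerunning with a much larger input, or with a consumer that stops early such as any(), is a cheap way to see where the balance tips.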

Also, dot lookups in a loop are frequently a poor choice. Often you can get a huge speedup in a tight loop by changing

  for x in xrange(10000000): # or whatever
    foo.bar()

to

  foobar = foo.bar
  for x in xrange(10000000): # or whatever
    foobar()
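
For anyone who wants to measure this, here is a self-contained sketch in Python 3 syntax (range instead of xrange). The Counter class and the function names are hypothetical stand-ins for foo and foo.bar. Newer CPython releases have optimized method calls, so the gap may be smaller than it was in 2010.

```python
import timeit


class Counter:
    """Hypothetical stand-in for `foo`; `bump` plays the role of `foo.bar`."""

    def __init__(self):
        self.count = 0

    def bump(self):
        self.count += 1


def dotted(n):
    c = Counter()
    for _ in range(n):
        c.bump()  # attribute lookup happens on every iteration
    return c.count


def hoisted(n):
    c = Counter()
    bump = c.bump  # look the bound method up once, outside the loop
    for _ in range(n):
        bump()
    return c.count


dotted_time = timeit.timeit(lambda: dotted(100_000), number=10)
hoisted_time = timeit.timeit(lambda: hoisted(100_000), number=10)
print(f"dotted lookup:  {dotted_time:.3f}s")
print(f"hoisted lookup: {hoisted_time:.3f}s")
```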

bbb · on Jan 5, 2010

"5. Any type of bit twiddling sucks. Take the time to do it in C and make the extension."

I recently had to do something like this and ended up using SWIG to generate the glue code. However, I noticed that my code had to spend considerable time (many iterations) in the C++ library to reduce overall execution time (but then it did, by a factor of 20x-40x).

Do you happen to have some advice on how much speedup could be gained by replacing SWIG with handwritten wrappers?

sophacles · on Jan 5, 2010

My personal preferences are: Boost.python for C++[1], and Pyrex/Cython for C wrappers. Both make things pretty nice. I never really got into swig, so I'm not sure if there is a noticeable speedup between any of these. As for handwritten wrappers, I have not had any personal experience trying to eke out the extra speed from not using a code generator type wrapper. hth

dagw · on Jan 5, 2010

Swig solves a somewhat different problem than Boost.python/pyrex. Swig works better when you have an existing C++ codebase you want to call from python while making as few changes to the C++ side as possible, while Pyrex/Boost work better when you are writing a C or C++ module from scratch to be called from python.

pkrumins · on Jan 4, 2010

I wonder if any of them still are current.

"At the time I originally wrote this I was using a 100MHz Pentium running BSDI."

utku_karatas2 · on Jan 5, 2010

Most of the tips here seem to be based on "do stuff in such a way that fewer Python C API calls making lookups get involved". The Python API is more or less the same Python API, so I'd assume most of the tips are current.

jparise · on Jan 5, 2010

The timeit module (http://docs.python.org/library/timeit.html) is an invaluable tool for taking and comparing performance measurements.
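
As a sketch of how that might look, the snippet below uses timeit.repeat to test the "function calls are expensive" tip from earlier in the thread. The functions square, with_call, and inlined are invented for this example, and the repeat/number values are arbitrary.

```python
import timeit


def square(x):
    return x * x


def with_call(values):
    total = 0
    for v in values:
        total += square(v)  # one Python-level function call per item
    return total


def inlined(values):
    total = 0
    for v in values:
        total += v * v  # same arithmetic, written out at the call site
    return total


values = list(range(1000))

# repeat() returns one timing per run; the minimum is usually the
# least noisy figure, since slower runs reflect other system activity.
call_best = min(timeit.repeat(lambda: with_call(values), number=200, repeat=5))
inline_best = min(timeit.repeat(lambda: inlined(values), number=200, repeat=5))
print(f"with function call: {call_best:.4f}s")
print(f"inlined:            {inline_best:.4f}s")
```

Both timed statements pay the same lambda overhead, so the difference between the two figures is what matters, not the absolute values.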

admn_is_traitor · on Jan 5, 2010

Python performance tip #1: don't use it if performance is a design goal.

dhotson · on Jan 5, 2010

I wouldn't go so far as to avoid it altogether.

Python makes a great glue language even in applications where performance is important. You can write all the performance critical stuff in C and then glue it together in Python.

It's a pretty common approach and it works really well. Game engines often do this where the main engine is in C++ with scripting in Lua.

sophacles · on Jan 5, 2010

Performance can mean: blazing fast, always completes in the smallest possible time (theoretically anyway). It can also mean: runs in a reasonable time, within the parameters of the spec. Many times in the latter case it's faster and easier (in programmer time) to tune/tweak the slow scripting language than it is to pull out the big guns.

stonemetal · on Jan 5, 2010

Performance is always a design goal for some value of performance. If print("Hello World") takes six years to execute you would give up and claim it is broken long before then even though "performance isn't a design goal".