Can I tangentially point out (without much connection to this benchmark more than any of the others recently) that one of the great things about PyPy is that if you have a program and run it on PyPy, you can usually pop into the IRC channel and often get even more tips on how to tune it to be even faster?
There are simple tips like "write everything in Python where possible, don't use C extensions," as the OP noticed. But even after you've decided to use PyPy, there are often specific performance characteristics of the PyPy implementation that are really helpful to keep in mind, and the channel is a great resource to take advantage of: human interaction with PyPy developers like fijal who care about making things fast.
Can you give an example of a tip that would speed a particular program up? Does PyPy have a lot of knobs to tune?
CPython doesn't seem to have that many knobs... You can tune some settings in GC, and maybe the GIL acquiring interval, but it doesn't seem to produce speedups in most programs.
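The knobs mentioned above can be poked from within Python itself. A minimal sketch using the stdlib `gc` and `sys` modules (the specific values here are arbitrary, just for illustration):

```python
import gc
import sys

# Generational GC thresholds; the defaults are roughly (700, 10, 10).
print(gc.get_threshold())
# Raising the thresholds makes collections rarer, which can help
# allocation-heavy code at the cost of higher peak memory.
gc.set_threshold(10000, 25, 25)

# The interval (in seconds) at which the interpreter considers
# switching threads, i.e. releasing the GIL; default is 0.005.
print(sys.getswitchinterval())
sys.setswitchinterval(0.01)
```

As the comment notes, these rarely produce large speedups on their own; they mostly shift trade-offs (GC pauses vs. memory, thread responsiveness vs. context-switch overhead).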
PyPy does have a few knobs to tune, but they're not used very often. They are documented, though mostly in the source code; I'll try to pull the GC parameters, for example, more prominently into the docs.
However, the way you write Python code can be tuned a lot. This is, at least to an extent, our failure: the JIT can make things really fast, but it can get confused in places. We're trying to eliminate those cases as we go, but it can't always be done.
Generators vs. list comprehensions; not using sys._getframe, sys.exc_info, etc.; not relying too much on dynamism, like passing very general **kwargs and *args; the list goes on.
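A hypothetical sketch of the contrast being described (the function names are made up for illustration). The first style funnels every call through a fully dynamic *args/**kwargs signature and yields lazily; the second gives a tracing JIT a fixed signature and a plain loop to compile:

```python
# Harder on the JIT: a generator plus a very dynamic call
# through *args/**kwargs indirection.
def apply_all(funcs, *args, **kwargs):
    return (f(*args, **kwargs) for f in funcs)

# Friendlier: a list comprehension with a fixed, concrete
# call pattern, giving the JIT a simple predictable loop.
def double_all(values):
    return [v * 2 for v in values]

print(double_all([1, 2, 3]))  # [2, 4, 6]
print(list(apply_all([abs, str], -5)))  # [5, '-5']
```

Both styles are perfectly good Python; the point is only that, per the comment above, PyPy's JIT at the time handled the more static form better.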
The jitviewer can be a lot of help, but it's a bit buggy and too low-level to be seriously recommended.
"As of now (Feb 2011) generators are usually slower than corresponding code not using generators. Same goes for generator expressions. For most cases using list comprehension is faster."
Great to see real world use cases, and very encouraging to see PyPy performing so well. I'll definitely be trying it on my future compute-intensive projects.
I'm reminded of when I first started following the PyPy project, and I thought it would be impressive if they could ever get close to CPython's speed. They've done something truly amazing.
I hope the Python community someday coalesces around a single version. I generally can't take advantage of all their awesome work because the libraries I depend on (e.g. pandas) won't run on PyPy.
Your advice is good in general. However, note that the OP is reporting user time (probably as shown by the `time` command). That is the total CPU time used by the process; it doesn't measure how much wall-clock time elapsed or the time spent waiting on I/O. I think trusting that number is fine in this case.
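The distinction between CPU time (roughly what `time` reports as user + sys) and wall-clock time can be observed from within Python itself, using the stdlib `time` module:

```python
import time

start_wall = time.perf_counter()  # wall-clock time
start_cpu = time.process_time()   # CPU time consumed by this process

time.sleep(0.2)  # sleeping uses wall time but almost no CPU time

wall = time.perf_counter() - start_wall
cpu = time.process_time() - start_cpu
print(f"wall: {wall:.3f}s, cpu: {cpu:.3f}s")  # cpu is far smaller than wall
```

This is why user time is a reasonable proxy for compute cost even on a busy machine: other processes inflate wall time, but (cache effects aside, as noted elsewhere in this thread) not the CPU time charged to your process.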
He could easily have discovered that during debugging, or while actually trying to make progress on the project rather than benchmarking. But I agree, he should be explicit about that variable.
You're just increasing the uncertainty, but not drastically so. If you use your laptop during benchmarks that show a 10x improvement, your thesis still stands.
No. To be honest, PyPy is very sensitive to cache usage, so running any other program that thrashes the cache can be a serious problem (it can lead to 30% performance degradation, depending on the load, even if the other core is completely unoccupied).
A good way to measure CPU performance in a CPU-agnostic way (not sure if that's the right phrase) is instruction count. You would have to disregard things like cache effects when looking at instruction count, which is probably bad, but it serves as a reasonable CPU measure.
No, it's not good at all. Cache stalls can easily account for 1/2 of your processing time. On top of that you have CPU pipelining and multi-issue CPUs. It was a good idea a while ago, now it's really not that great.
There's no need for perfect repeatability when statistical analysis is good enough. After all, even if you have control down to the iron, the randomness in external interrupts and the effect of temperature on the hardware will cause some unpredictability.
I believe you're referring to the Unladden Swallow project, which unfortunately is "pining for fjords". The project didn't make the progress expected and folded.
Except it's work, and it's usually not that good, because you end up manipulating Python objects anyway. It doesn't really work for anything that can't be expressed as tight, semi-autonomous C code with limited connection to the Python side. And then you usually can't run it on PyPy, which may or may not be an issue.
Yes, that is the trade-off, but I think you'll definitely be able to find bottlenecks where you can use C to do some of the processing. If you find you don't have enough performance, it's an area worth exploring. My experience is with Ruby and RubyInline; granted, Ruby isn't as fast as PyPy.
See the comments on the blog. The majority of the work is not in XML parsing, so lxml doesn't help (and you're still hampered by the rest of the code not running as fast).