Wikipedia processing. PyPy vs CPython benchmark (rz.scale-it.pl)
101 points by robert-zaremba on Feb 18, 2013 | hide | past | favorite | 35 comments



Can I tangentially point out (with no more connection to this benchmark than to any of the others recently) that one of the great things about PyPy is that if you run your code on PyPy, you can usually pop into the IRC channel and often get tips on how to tune it to be even faster?

There are the simple tips like "write everything in Python where possible, don't use C extensions", as the OP noticed. But even after you've decided to use PyPy, there are often specific performance characteristics of the PyPy implementation that are really helpful to keep in mind, and the channel is a great resource to take advantage of: direct interaction with PyPy developers like fijal who care about making things fast.


Can you give an example of a tip that would speed a particular program up? Does PyPy have a lot of knobs to tune?

CPython doesn't seem to have that many knobs... You can tune some settings in GC, and maybe the GIL acquiring interval, but it doesn't seem to produce speedups in most programs.


PyPy does have a few knobs to tune, but they're not used very often. They are documented, though mostly in the source code; I'll try to pull, say, the GC parameters more prominently into the docs.
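As a sketch of what such a knob looks like: PyPy's garbage collector reads tuning parameters from environment variables at startup. The variable names below come from PyPy's GC documentation, but they may differ between versions, and "process_wiki.py" is just a hypothetical script name standing in for the benchmarked program:

```shell
# Assumed PyPy GC tuning knobs (see PyPy's gc_info documentation);
# names, units, and defaults may vary by PyPy version.
# PYPY_GC_NURSERY: size of the young-object nursery.
# PYPY_GC_MAX: hard ceiling on total GC-managed memory.
PYPY_GC_NURSERY=4MB PYPY_GC_MAX=4GB pypy process_wiki.py
```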

However, the way you write Python code can be tuned a lot. This is actually, at least to an extent, our failure: the JIT can make things really fast, but it can get confused in places. We're trying to eliminate those cases as we go, but it can't always be done.

Generators vs list comprehensions; not using sys._getframe, sys.exc_info, etc.; not relying too much on dynamism, like passing very general *args and **kwargs; the list goes on.

Jitviewer can be a lot of help, but it's a bit buggy and too low level to be seriously recommended.


> Generators vs list comprehensions

I'm a bit confused. Which one is faster?


list comprehensions of course!


The exclamation led me to believe you were being facetious, but https://bitbucket.org/pypy/pypy/wiki/JitFriendliness agrees with you:

"As of now (Feb 2011) generators are usually slower than corresponding code not using generators. Same goes for generator expressions. For most cases using list comprehension is faster."
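The difference is easy to measure yourself. A minimal sketch using the stdlib timeit module (the workload here is a made-up micro-benchmark, not the one from the article; which variant wins, and by how much, depends on the interpreter and version):

```python
import timeit

# Same computation expressed two ways: materialize a list first,
# or feed sum() a generator expression lazily.
listcomp = timeit.timeit("sum([i * i for i in range(1000)])", number=1000)
genexp = timeit.timeit("sum(i * i for i in range(1000))", number=1000)

# Compare the two timings; per the PyPy wiki note above, the list
# comprehension is usually the faster of the two on PyPy.
print(f"list comprehension: {listcomp:.4f}s  generator: {genexp:.4f}s")
```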



Great to see real world use cases, and very encouraging to see PyPy performing so well. I'll definitely be trying it on my future compute-intensive projects.


I'm reminded of when I first started following the PyPy project, and I thought it would be impressive if they could ever get close to CPython's speed. They've done something truly amazing.

I hope the python community someday coalesces around a single version. I generally can't take advantage of all their awesome work because the libraries I depend on (e.g. pandas) won't run in pypy.


>Moreover PyPy doesn’t kill my CPU as CPython does so in a meantime I could normally use my laptop

You're not supposed to "use your laptop" during a benchmark.


Your advice is good in general. However, note the OP is reporting user time (probably as shown by the "time" command). That is the total CPU time used by the process; it doesn't measure how much wall-clock time elapsed, or the time spent waiting on I/O operations. I think trusting that number should be fine in this case.
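The distinction between wall-clock time and CPU time can be demonstrated from within Python itself; a small sketch (the workload is arbitrary, chosen only to burn some CPU):

```python
import time

start_wall = time.perf_counter()   # wall-clock time
start_cpu = time.process_time()    # CPU time charged to this process

total = sum(i * i for i in range(10**6))  # some CPU-bound work

# Sleeping consumes wall-clock time but essentially no CPU time,
# much like being preempted by other programs on a busy laptop.
time.sleep(0.5)

wall = time.perf_counter() - start_wall
cpu = time.process_time() - start_cpu
print(f"wall: {wall:.3f}s  cpu: {cpu:.3f}s")  # wall time exceeds CPU time
```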


It can cause problems with competing for cache space, which could have large effects on CPU time.


You're right and I stand corrected.


He easily could have discovered that during debugging or when he was actually trying to make progress on the project, not benchmarking. But I agree, he should be explicit about that variable.


You're just increasing the uncertainty, but not drastically so. If you use your laptop during benchmarks that show a 10x improvement, your thesis still stands.


Not if you were playing Crysis 3 during part of one benchmark and looking at Facebook the rest of the time.


No. To be honest, PyPy is very sensitive to cache usage, so running any other program thrashing the cache might be a serious problem (it can lead to 30% performance degradation, depending on the load, even if the core is otherwise unoccupied).


I'm confused. You started with "No" and then continued with something that is either in agreement or orthogonal to my point.


That was "I agree with you" kind of no. English is hard, sorry.


Ah, I follow now. I knew I was missing something, and there it is. As somebody who writes English professionally, I agree wholeheartedly.


For the final benchmark I used "runlevel 3" (multi-user without a window manager) to perform these tasks, to maximize the cache and RAM available to the benchmark.


That's the spirit! You should also stop services known for spikes in CPU usage like the cron daemon and run the benchmark multiple times.


A good way to measure CPU performance in a CPU-agnostic way (not sure if that's the right phrase) is instruction count. You would have to disregard things like cache when looking at instruction count, which is probably bad, but it serves as a good CPU measure.


No, it's not good at all. Cache stalls can easily account for 1/2 of your processing time. On top of that you have CPU pipelining and multi-issue CPUs. It was a good idea a while ago, now it's really not that great.


You do if that's similar to the expected deployment environment.


Only if it's something you can replicate exactly for each run of the benchmark.


There's no need for perfect repeatability when statistical analysis is good enough. After all, even if you have control down to the iron, the randomness in external interrupts and the effect of temperature on the hardware will cause some unpredictability.
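One way to do that statistical analysis is simply to repeat the measurement and summarize, rather than trusting a single run. A sketch using the stdlib (the workload function is a hypothetical stand-in for the benchmarked task):

```python
import statistics
import timeit

# Hypothetical workload standing in for the real benchmark.
def workload():
    return sum(i * i for i in range(10**5))

# Run the measurement several times; each entry in `runs` is the
# time for `number` calls, so outliers from background noise show up.
runs = timeit.repeat(workload, number=10, repeat=5)

mean = statistics.mean(runs)
stdev = statistics.stdev(runs)
print(f"mean={mean:.4f}s  stdev={stdev:.4f}s  min={min(runs):.4f}s")
```

Reporting the minimum alongside mean and standard deviation is a common convention, since the fastest run is the one least disturbed by external interference.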


I wonder what would happen if you tossed an LLVM solution at it :) It seems some people are working on it, according to Google. No joy yet, though.


I believe you're referring to the Unladen Swallow project, which unfortunately is "pining for the fjords". The project didn't make the progress expected and folded.

http://en.wikipedia.org/wiki/Unladen_Swallow


An optimisation path available to CPython is to put your inner loops into a C extension. I think a benchmark with this would be interesting.


Except it's work, and it's usually not that good, because you end up manipulating Python objects anyway. It doesn't really work for anything that can't be expressed as tight, semi-autonomous C code with a limited connection to the Python code. And then you usually can't run it on PyPy, which may or may not be an issue.


Yes, that is the trade-off, but I think you'll definitely be able to find areas where you can use C to do some processing in the bottlenecks. If you find you don't have enough performance, it's an area to explore. My experience is with Ruby and RubyInline; granted, Ruby isn't as fast as PyPy.


Unless that work was already done and available to you (lxml).


See the comments on the blog. The majority of the work is not in XML parsing, so lxml does not help (and you're still hampered by the rest of the code not running as fast).


A C implementation could be interesting in the same way. But the purpose was to measure how efficient PyPy is for your Python code.



