Can I tangentially point out (without much connection to this benchmark more than any of the others recently) that one of the great things about PyPy is that if you have a program and run it on PyPy, you can usually pop into the IRC channel and often get even more tips on how to tune it to be even faster?
There are simple tips like "write everything in Python where possible, don't use C extensions," as the OP noticed. But even after you've decided to use PyPy, there are often specific performance characteristics of the PyPy implementation that are really helpful to keep in mind, and the channel is a great resource to take advantage of: human interaction with PyPy developers like fijal who care about making things fast.
Can you give an example of a tip that would speed a particular program up? Does PyPy have a lot of knobs to tune?
CPython doesn't seem to have that many knobs... You can tune some settings in GC, and maybe the GIL acquiring interval, but it doesn't seem to produce speedups in most programs.
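The knobs mentioned above can be poked from within Python itself. A minimal sketch using the stdlib `gc` and `sys` modules (the specific values here are arbitrary, just for illustration):

```python
import gc
import sys

# Generational GC thresholds; the defaults are roughly (700, 10, 10).
print(gc.get_threshold())
# Raising the thresholds makes collections rarer, which can help
# allocation-heavy code at the cost of higher peak memory.
gc.set_threshold(10000, 25, 25)

# The interval (in seconds) at which the interpreter considers
# switching threads, i.e. releasing the GIL; default is 0.005.
print(sys.getswitchinterval())
sys.setswitchinterval(0.01)
```

As the comment notes, these rarely produce large speedups on their own; they mostly shift trade-offs (GC pauses vs. memory, thread responsiveness vs. context-switch overhead).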
PyPy does have a few knobs to tune, but they're not used very often. They are documented, though mostly in the source code; I'll try to pull the GC parameters, for example, more prominently into the docs.
However, the way you write Python code can be tuned a lot. This is, at least to an extent, our failure: the JIT can make things really fast, but it can get confused in places. We're trying to eliminate those cases as we go, but it can't always be done.
Generators vs. list comprehensions; not using sys._getframe, sys.exc_info, etc.; not relying too much on dynamism, like passing very general **kwargs and *args; the list goes on.
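A hypothetical sketch of the contrast being described (the function names are made up for illustration). The first style funnels every call through a fully dynamic *args/**kwargs signature and yields lazily; the second gives a tracing JIT a fixed signature and a plain loop to compile:

```python
# Harder on the JIT: a generator plus a very dynamic call
# through *args/**kwargs indirection.
def apply_all(funcs, *args, **kwargs):
    return (f(*args, **kwargs) for f in funcs)

# Friendlier: a list comprehension with a fixed, concrete
# call pattern, giving the JIT a simple predictable loop.
def double_all(values):
    return [v * 2 for v in values]

print(double_all([1, 2, 3]))  # [2, 4, 6]
print(list(apply_all([abs, str], -5)))  # [5, '-5']
```

Both styles are perfectly good Python; the point is only that, per the comment above, PyPy's JIT at the time handled the more static form better.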
The jitviewer can be a lot of help, but it's a bit buggy and too low-level to be seriously recommended.
"As of now (Feb 2011) generators are usually slower than corresponding code not using generators. Same goes for generator expressions. For most cases using list comprehension is faster."
Great to see real world use cases, and very encouraging to see PyPy performing so well. I'll definitely be trying it on my future compute-intensive projects.
I'm reminded of when I first started following the PyPy project, and I thought it would be impressive if they could ever get close to CPython's speed. They've done something truly amazing.
I hope the Python community someday coalesces around a single version. I generally can't take advantage of all their awesome work because the libraries I depend on (e.g. pandas) won't run on PyPy.
Your advice is good in general. However, note that the OP is reporting user time (probably as shown by the `time` command). That is the total CPU time used by the process; it doesn't measure how much wall-clock time elapsed or the time spent waiting on I/O. I think trusting that number is fine in this case.
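The distinction between CPU time (roughly what `time` reports as user + sys) and wall-clock time can be observed from within Python itself, using the stdlib `time` module:

```python
import time

start_wall = time.perf_counter()  # wall-clock time
start_cpu = time.process_time()   # CPU time consumed by this process

time.sleep(0.2)  # sleeping uses wall time but almost no CPU time

wall = time.perf_counter() - start_wall
cpu = time.process_time() - start_cpu
print(f"wall: {wall:.3f}s, cpu: {cpu:.3f}s")  # cpu is far smaller than wall
```

This is why user time is a reasonable proxy for compute cost even on a busy machine: other processes inflate wall time, but (cache effects aside, as noted elsewhere in this thread) not the CPU time charged to your process.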
He could easily have discovered that during debugging, or while actually trying to make progress on the project rather than benchmarking. But I agree, he should be explicit about that variable.
You're just increasing the uncertainty, but not drastically so. If you use your laptop during benchmarks that show a 10x improvement, your thesis still stands.
No. To be honest, PyPy is very sensitive to cache usage, so running any other program that thrashes the cache can be a serious problem (it can lead to 30% performance degradation, depending on the load, even if the other core is completely unoccupied).
A good way to measure CPU performance in a CPU-agnostic way (not sure if that's the right phrase) is instruction count. You would have to disregard things like cache effects when looking at instruction count, which is probably bad, but it serves as a reasonable CPU measure.
No, it's not good at all. Cache stalls can easily account for 1/2 of your processing time. On top of that you have CPU pipelining and multi-issue CPUs. It was a good idea a while ago, now it's really not that great.
There's no need for perfect repeatability when statistical analysis is good enough. After all, even if you have control down to the iron, the randomness in external interrupts and the effect of temperature on the hardware will cause some unpredictability.
I believe you're referring to the Unladden Swallow project, which unfortunately is "pining for fjords". The project didn't make the progress expected and folded.
Except it's work, and it's usually not that good, because you end up manipulating Python objects anyway. It doesn't really work for anything that can't be expressed as tight, semi-autonomous C code with limited connection to the Python side. And then you usually can't run it on PyPy, which may or may not be an issue.
Yes, that is the trade-off, but I think you'll definitely be able to find bottlenecks where you can use C to do some of the processing. If you find you don't have enough performance, it's an area worth exploring. My experience is with Ruby and RubyInline; granted, Ruby isn't as fast as PyPy.
See the comments on the blog. The majority of the work is not in XML parsing, so lxml doesn't help (and you're still hampered by the rest of the code not running as fast).