A brief experiment with PyPy (lwn.net)
83 points by jnoller on May 12, 2011 | 23 comments



That 3x speedup is about the same as what I have seen with my code. I'm currently writing a database cache simulator to try different algorithms, and if I want anywhere near realistic results I have to use realistic access traces.

Tried it today with a TPC-C trace which has about 500 million accesses. The result: CPython would have run for about 90 minutes (I stopped it after 30 minutes and began to look for a speedier possibility); PyPy took only 22 minutes.
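For anyone wondering what this kind of simulator looks like, here is a minimal sketch, assuming a plain LRU policy and a trace that is just an iterable of page ids. The name `simulate_lru` and the trace format are assumptions for illustration, not the poster's actual code. The inner loop is exactly the sort of pure-Python hot path where a JIT can help.

```python
from collections import OrderedDict


def simulate_lru(trace, capacity):
    """Replay page ids through an LRU cache and return the hit ratio.

    `trace` is any iterable of hashable page ids (hypothetical format);
    `capacity` is the number of pages the cache can hold.
    """
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    total = 0
    for page in trace:
        total += 1
        if page in cache:
            hits += 1
            cache.move_to_end(page)  # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used page
            cache[page] = True
    return hits / total if total else 0.0


if __name__ == "__main__":
    # Tiny made-up trace; a real run would stream accesses from a trace file.
    print(simulate_lru([1, 2, 3, 1, 2, 3], capacity=3))
```

Swapping in another policy only means changing the hit and eviction branches, which is why a standalone simulator is so much easier to iterate on than a real database.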


I've gotten about a 10x speed up on numerics code where there's so much branching involved in the calculations that I can't afford to use NumPy.

As for me, the main reason I haven't moved to PyPy yet is the lack of database and messaging support.


Which databases? At the moment we have SQLite, Oracle (haven't tested it myself), and PostgreSQL. Plus whatever you can find a pure-Python driver for. Also, what do you mean by messaging?


Oh wow, I didn't realize that PostgreSQL was working on PyPy. I heard that Django was only tested with SQLite, so I made my assumptions from there.

By messaging, I mean something like RabbitMQ, so that I can do batch scheduling at a somewhat finer grain than "run a cronjob".


psycopg2 is implemented in a fork of mine: http://bitbucket.org/alex_gaynor/pypy-postgresql/ (it requires compiling yourself, but works nicely; I was told by someone that this brought their script's time from 2 minutes to 8 seconds). As of the last test it passes all Django tests. What's the current standard RabbitMQ lib? I didn't realize it was a C extension (hell, I've used it myself and never noticed).


Well the most used one is Celery. It depends on multiprocessing which blew up on me the last time I tried it in PyPy.

But... I just tried "import multiprocessing" in PyPy 1.5 and it worked! Is this all part of the C-API compatibility layer? Does that mean Cython code may soon work in PyPy too (that's my pony feature)?

RabbitMQ should work under PyPy currently then; all of its dependencies purport to be pure Python.

Another RabbitMQ lib is Rabbitmq-c, which is a direct wrapper around librabbitmq-c. It ekes out extra performance vs. the pure-Python RabbitMQ clients, but mostly it isn't needed.


Nope, multiprocessing was added to the Python standard library in 2.6. Our previous releases implemented Python 2.5; 1.5 implements 2.7, so it now includes multiprocessing.


That's good news! Guess it's time to remove the mechanism in Celery that disables the multiprocessing pool when running under PyPy.
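A rough sketch of what a capability check could look like in place of an interpreter check, assuming the goal is simply "use processes if multiprocessing is importable". The helper names and the `"processes"`/`"solo"` labels are made up for illustration; this is not Celery's actual code.

```python
import importlib.util
import platform


def running_under_pypy():
    """True when the current interpreter reports itself as PyPy."""
    return platform.python_implementation() == "PyPy"


def multiprocessing_available():
    """Check for the module itself rather than guessing from the interpreter."""
    return importlib.util.find_spec("multiprocessing") is not None


def choose_pool():
    """Pick a (hypothetical) pool label based on what is actually importable."""
    return "processes" if multiprocessing_available() else "solo"


if __name__ == "__main__":
    print("PyPy:", running_under_pypy(), "| pool:", choose_pool())
```

Testing for the feature instead of the interpreter means nothing has to change when a later PyPy release catches up with the standard library.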


There's also MySQL via PyMySQL


MySQLdb also works: https://bitbucket.org/pypy/compatibility/wiki/mysql-python

I just compiled it today and it works.



No, tpc-uva is a bit more than what I need right now. I might use it later on when I have decided on any single algorithm that I want to test in a more realistic environment. Because changing the caching algorithm that postgres uses isn't as easy as doing so in a standalone python simulator, I will have to be sure that I want to do that. I have already tried that before and it is a lot harder and takes a lot more time.


This is really, really great to see: PyPy is such an interesting project, and it's really encouraging to see it gain so much ground.

It's interesting that Guido deliberately didn't go for a full rewrite for Python 3, but this project, which is a full rewrite in a whole different language, has provided a faster implementation with fewer developers!


To be fair, it also needed eight years to get here, PyPy inherits much standard lib code, and the py3k effort wasn't limited to the interpreter but also involved a lot of standard lib development :-). So, apples and oranges. Still, amazing work by the PyPy folks any way you look at it.


Can anyone comment on startup latency and performance early in a run?

One of my use cases for Python is relatively small, short-running scripts, and JIT engines often take a fair amount of runtime before all the optimizations kick in. So I wonder how PyPy does at startup time and whether it's able to leverage its execution speed prowess over brief runtimes - is it still a net performance win in the end, or at least not-worse-than-CPython?


Last I checked we start up faster than CPython. As for early-run performance, warm-up really depends on the total amount of code you have: if you've got a few hundred lines of code, the JIT's often warmed up and fast in under half a second. On the other hand, if you have a few hundred thousand lines it might take a minute to warm up.
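If you want to see the warm-up for yourself, one rough approach is to time the same function over several rounds and compare the early rounds with the later ones. This is only a sketch: `work` is a made-up branchy loop and `time_rounds` is a hypothetical helper, and the numbers will vary a lot by machine and interpreter.

```python
import time


def work(n):
    """A small branch-heavy loop standing in for real script code."""
    total = 0
    for i in range(n):
        if i % 3 == 0:
            total += i
        else:
            total -= 1
    return total


def time_rounds(func, arg, rounds=10):
    """Call func(arg) repeatedly and return the wall-clock time of each round."""
    timings = []
    for _ in range(rounds):
        start = time.perf_counter()
        func(arg)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Under a JIT the first rounds are typically slower than the later ones;
    # under CPython the rounds should be roughly flat.
    for i, seconds in enumerate(time_rounds(work, 200_000)):
        print(f"round {i}: {seconds * 1000:.2f} ms")
```

Run the same file under both interpreters and compare the per-round times rather than just the total.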


Faster startup sounds nifty :-).

How's performance while the JIT is not yet warmed up, compared to CPython?


PyPy starts a lightweight interpreter first (that is probably just about the same speed as CPython, maybe a bit faster) and then only compiles if it sees a recurring pattern.


With the JIT totally disabled I think we're between .8-2x slower, depending on what you're doing.


Can't wait to have a version of PyPy that supports numpy!

Since the benefits have been proven, I can't help but wonder why there aren't more people working on this...


People are working on it. For example, Quora is having Alex Gaynor help move their site to PyPy this summer: http://alexgaynor.net/2011/may/06/this-summer/


The main reasons why more people are not working on this are that:

* volunteers' interests lie elsewhere

* nobody is willing to put money into making numpy on pypy happen


I think part of the issue is that even PyPy + NumPy wouldn't give you compelling gains versus NumPy + Cython in CPython.

The big win of PyPy vs. Cython is that you get a 10x or so speed improvement without having to specify types, but if you're using NumPy already, then it's pretty standard to specify types already so NumPy can optimize. At that point, you really have to ask yourself if it isn't worth it to just build a Cython module and get next to C speed for that hotspot.

The great thing about Cython is that it lets you call between C and Python at little overhead without having to write any code for the Python C-API like you would if you were using CTypes or even Boost::Python.



