Not that I'm a big fan of Jython, but including startup time in a benchmark is only useful for very short-running command-line tools. It says nothing about the speed of the JIT or of the code being tested.
I thought about that, but decided against subtracting out startup time. This is a real-world benchmark of how long it takes to run some actual code that I care about. Startup time counts, but the cumulative execution time will be dominated by time taken to run the slower programs, not startup time in the trivial ones, so I don't think I'm counting it too much.
But I think next time I do this I'll increase the max runtime from 1 minute to 3. That will keep more of the slower programs in the benchmark, and make startup time count that much less. Without actually removing it, because that seems too artificial and contrived to me.
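Roughly the kind of harness I have in mind, as a sketch (the interpreter commands, solver file names, and cutoff are placeholders, not my actual setup):

    import os
    import signal
    import subprocess
    import time

    # Sketch: run every solver under every interpreter, measure wall-clock
    # time with startup included, and drop a solver entirely if it blows
    # the cutoff under any interpreter.
    INTERPRETERS = ["python", "jython", "ipy", "pypy"]  # assumed command names
    SOLVERS = ["euler_001.py", "euler_002.py"]          # assumed file names
    CUTOFF = 180.0  # seconds; the proposed 3-minute limit

    def run_one(interp, solver):
        start = time.time()
        proc = subprocess.Popen([interp, solver])
        while proc.poll() is None:
            if time.time() - start > CUTOFF:
                os.kill(proc.pid, signal.SIGKILL)
                return None  # timed out
            time.sleep(0.1)
        return time.time() - start

    totals = dict((i, 0.0) for i in INTERPRETERS)
    for solver in SOLVERS:
        times = dict((i, run_one(i, solver)) for i in INTERPRETERS)
        if None in times.values():
            continue  # exclude solvers that time out anywhere
        for interp, elapsed in times.items():
            totals[interp] += elapsed
    print(totals)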
Sure, and I did mention command line utilities. The issue I see with the benchmark is that it compares apples and oranges. For some programs it tests JIT performance and for others it tests startup time. If I want to test startup time, I just use a hello world program. To test JIT/interpreter performance, I try to exclude startup time.
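For reference, isolating startup time really is that simple: time a do-nothing invocation and subtract it. A sketch (the command name is an assumption):

    import subprocess
    import time

    # Sketch: estimate bare interpreter startup by timing a do-nothing
    # script; subtract this from benchmark runs if you want JIT/interpreter
    # speed in isolation.
    def startup_time(interp, runs=10):
        total = 0.0
        for _ in range(runs):
            start = time.time()
            subprocess.call([interp, "-c", "pass"])
            total += time.time() - start
        return total / runs

    print(startup_time("jython"))  # assumed to be on PATH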
Hello world programs only ever exercise the bare runtime, though; real utilities have to load libraries after the runtime loads, which incurs further delays (and those may depend on JIT/interpreter speed if the libraries aren't pre-compiled).
The same can be said about benchmarking numeric vs. IO code. Is it apples and oranges and bananas then? Most programs have a mix of everything. Whether Euler's problems are a good mix that represents your workload is up to you to decide.
In addition to what fauigerzigerk said, if you run a lot of command line utilities where startup time is relevant, there are established solutions to that (Nailgun, for example), so it still isn't a fair comparison. If you want a comparison of state-of-the-art solutions, which this post obviously aims to be, put Nailgun into the mix.
Those utilities are often shell-scripted to run many times in a row. At least in interactive use, the multiplied startup times can grow annoyingly long.
You could use one of those Java background daemons to do that, but anyway, what I was trying to say is just that this benchmark doesn't test JIT or interpreter performance in the case of Jython.
Also, the HotSpot VM needs warmup to reach maximum performance, since it takes time to detect and JIT-compile the performance-critical parts of the code with full optimizations.
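Something like this, as a sketch, is the usual way to keep warmup out of the measured loop (the workload here is just a stand-in):

    import time

    # Sketch: run the workload unmeasured so a JIT can detect and compile
    # the hot code, then time the steady state.
    def solve():
        return sum(i * i for i in range(100000))

    for _ in range(20):    # warmup iterations, not measured
        solve()

    start = time.time()
    RUNS = 100
    for _ in range(RUNS):  # steady-state measurement
        solve()
    print((time.time() - start) / RUNS)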
Has anyone done any recent memory benchmarks with PyPy?
I'd like to see the magnitude of the memory trade-off for using a JIT compiler. As a web developer my programs are mostly IO-bound, not CPU-bound. I'm also bootstrapping and trying to squeeze as much as I can out of my 512MB linode.
PyPy's objects are smaller than CPython's, but the steady-state interpreter is larger, and the JIT adds some additional overhead due to bookkeeping and generated machine code.
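If you want a number for your own workload, comparing peak RSS of the same script under both interpreters is a reasonable rough measure. A sketch (Unix only; Linux reports ru_maxrss in kilobytes, OS X in bytes):

    import resource

    # Sketch: allocate something representative of your app, then report
    # peak resident set size. Run the same script under CPython and PyPy
    # and compare.
    data = [{"id": i, "name": "user%d" % i} for i in range(100000)]

    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print("peak RSS: %d" % peak)  # KB on Linux, bytes on OS X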
LuaJIT is a nigh untouchable work of art, we might never see a faster dynamic language JIT.
PyPy is significant because of the nature of Python itself: the language specification is much more complex than Lua and there are currently many more users and applications.
[the more complicated the language specification, the harder it's going to be to prove things about, the harder it's going to be to write a compiler]
Lisp implementations have been performing at a similar level for years now - many with the aid of AOT compilation, some not. Currently, Racket and SBCL are comparable to LuaJIT on the Alioth microbenchmarks - faster at some and slower at others.
> LuaJIT is a nigh untouchable work of art, we might never see a faster dynamic language JIT.
Well, Mike Pall disagrees, in that he says there are still a lot of possible optimizations. Also, the process of hand-optimizing for a specific architecture can in theory be automated and decoupled. You're totally right about Python's complexity, though.
But the speed difference is still too big. Even if PyPy is two times faster than CPython, LuaJIT remains an order of magnitude faster in most cases; they are comparable only when the calculation bottleneck is bignum routines rather than the rest of the language.
The ones I've read are relatively straightforward Lua, though.
Some shootout programs look really hairy compared to normal code in their language. (The "optimized Haskell" shootout programs were that way at one point, though I haven't followed it for a while.) With Lua / LuaJIT, that doesn't seem to be the case.
Besides, Mike Pall is using some of the shootout benchmarks to tune LuaJIT, so it's not surprising he has many of the top submissions.
I think it has more to do with how his runtime performs. :)
Tuning Lua code really isn't that hard; the language is tiny and has both semantics and performance characteristics that are easy to reason about accurately.
There's a good sample chapter from _Lua Programming Gems_ on Lua performance tuning (http://www.lua.org/gems/sample.pdf), FWIW. That and a good profiler will get you far.
You want me to port all my Project Euler solutions from Python to Lua so that I can tell you that LuaJIT is much faster than any Python implementation, which you already know?
Not at all, I'm just asking for comments from anybody who has experience with both. I used to use Python quite a bit, but switched to Lua a few years ago and PyPy really hasn't been on my radar.
That's really nice for PyPy, although the Euler problems are mainly numerical tests. Most of the work in many programmers' "real world" code is string handling, which CPython is very good at since all its string libs are implemented in C. So I'd like to see a comparison with a better benchmark, I guess :)
IronPython is at 2.6. PyPy and Jython are at 2.5. psyco is at 2.6.
Most of my Euler solvers are compatible with Python 2.5. The ones that don't work in 2.5 end up getting excluded from the benchmark, because the benchmark only shows programs that worked and finished in less than a minute on every tested Python.
epoll was new in CPython 2.6. PyPy currently targets Python 2.5; we have a branch where we're working towards Python 2.7 support (we skipped a step), and that will include epoll support.
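For context, this is the select.epoll API that 2.6 added and that the 2.7 branch will need to expose; a minimal, Linux-only sketch:

    import select
    import socket

    # Sketch of the epoll interface added in CPython 2.6 as select.epoll.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setblocking(0)
    server.bind(("127.0.0.1", 0))
    server.listen(5)

    ep = select.epoll()
    ep.register(server.fileno(), select.EPOLLIN)

    for fd, event in ep.poll(1):  # wait up to 1s for readiness events
        if fd == server.fileno():
            conn, addr = server.accept()
            conn.close()

    ep.unregister(server.fileno())
    ep.close()
    server.close()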
Hey, don't get me wrong, it's great that you're making strides, and I still hope the project is a success.
My viewpoint comes as someone who was very excited initially at the thought of a drop-in replacement for CPython whose goal was to be "faster than C". It's been a very long time, and it seems like PyPy still has a long way to go to achieve production-readiness, so it's hard to continue being excited about the project.