Unladen Swallow 2009Q3 Released with LLVM optimized Python (code.google.com)
127 points by yarapavan on Oct 21, 2009 | 35 comments



Firstly let me say I think this is a wonderful project and one of the most important Python projects I know.

Something I noticed though, in the Project Plan page:

http://code.google.com/p/unladen-swallow/wiki/ProjectPlan#20...

They state that one of the goals for "2009Q3 and beyond" is "to remove the GIL and fix the state of multithreading in Python," which I think is an awesome and ambitious goal, and one that I (and many others) would find very useful.

So I hoped the GIL thing would fall into the "2009Q3", but apparently it's in the "beyond" category. Anyway, they need to update that project plan.


My big python app (the robot end of a telepresence connection) suffers from GIL problems. Mostly, threads are used for C-only stuff like video compression, so they can easily release the GIL while running. But I gotta say, the complexities of GIL correctness, thread performance, gtk/gobject, callbacks, boost.python, GC correctness of both Python and C++ objects, and polling make my head hurt. Simplicity would be a fine goal for the new system.
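For what it's worth, the pattern described above (threads doing C-only work that drops the GIL) can be sketched with zlib, whose compression routine releases the GIL while it runs. zlib here is just a stand-in for video compression; the thread layout is illustrative, not taken from the app:

```python
import threading
import zlib

data = b"telepresence video frame " * 20_000  # fake "frame" payload

def compress(results, i):
    # zlib.compress is C code that releases the GIL for large inputs,
    # so several of these threads can make progress in parallel.
    results[i] = zlib.compress(data)

results = [None] * 4
threads = [threading.Thread(target=compress, args=(results, i)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every thread produced a valid, independently decompressible result.
print(all(zlib.decompress(r) == data for r in results))  # True
```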


Is there a good python message passing implementation? Some problems fit message passing more naturally than threading.


http://docs.python.org/library/multiprocessing.html#module-m...

It has a Queue interface for passing objects, but also allows shared state.


If you're in the robotics space, you should seriously consider the Robot Operating System: http://www.ROS.org

You'll find their publish-subscribe message passing quite simple, and the support is great.


If you head over to python-ideas, there's actually some really good discussion about removing the GIL right now. Someone has proposed using CAS operations (which I know nothing about) to avoid the fine-grained locking that refcounting GC would otherwise need without the GIL. In addition, Antoine Pitrou has been working on a patch to make the GIL more performant on POSIX systems.


jfyi: CAS is the abbreviation of Compare-And-Swap, a dedicated native machine instruction that offers atomicity and can therefore simplify locking.


Hmm. It doesn't really "simplify" locking. It does in the simple case of:

  lock();
  ++value;
  unlock();
Where a lock is used to protect a single memory location and the critical section is straight-line code. But when the critical section is more involved, CAS-based algorithms become considerably more complicated. CAS-based algorithms which do not require mutual exclusion are called "lock-free algorithms," and reasoning about them is hard.
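To make the contrast concrete, here's a sketch of the CAS retry loop for that same increment. CPython exposes no real CAS instruction (on hardware it's a single instruction like CMPXCHG), so a lock stands in for the hardware atomicity; the read/compute/CAS/retry shape is the point:

```python
import threading

class AtomicInt:
    """Simulated compare-and-swap cell. The internal lock is only a
    stand-in for the atomicity the hardware instruction would provide."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def compare_and_swap(self, expected, new):
        # Atomically: set value to `new` iff it still equals `expected`.
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

    @property
    def value(self):
        return self._value

def increment(cell):
    # Lock-free style: read, compute, CAS; retry if another thread won the race.
    while True:
        old = cell.value
        if cell.compare_and_swap(old, old + 1):
            return

counter = AtomicInt()
threads = [
    threading.Thread(target=lambda: [increment(counter) for _ in range(1000)])
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 4000
```

Even in this simplest case, correctness depends on the retry loop; for anything more involved than a single memory location, the number of interleavings to reason about grows quickly.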

The wiki for it is a good start: http://en.wikipedia.org/wiki/Non-blocking_synchronization And I'm going to go ahead and plug the work I've done in the area: http://people.cs.vt.edu/~scschnei/streamflow/


Hi, obviously you have more experience here. I just provided details straight from memory and did not want to disseminate misinformation; I apologize if you or anyone else got that impression.

Many of the lock-free algorithms do use native CAS instructions, however. What exactly do you mean by "reasoning about them is hard"? Do you mean in the formal-methods sense, or just difficult to grasp in general? I am glancing over your ISMM'06 paper, which looks very interesting; it seems you use CAS instructions there too. That makes me wonder whether I was too unspecific in my "simplifying locking," which in my sloppy thinking included "lock-free" algorithms as well. Sorry for that.


Don't know if you'll see this, but:

Reasoning about lock-free algorithms is hard because there is no mutual exclusion. Locks enforce a critical section; you guarantee that only a single thread will execute in a critical section at any given time. This allows you to change state local to that critical section without worrying about other threads interfering.

Lock-free algorithms have no critical sections. All threads can execute all instructions at all times. This means that when you update shared state, you have to think about the implications of what happens when another thread touches the same state this thread is trying to update. Keeping this level of concurrency in your head at all times is difficult, which means that reasoning about the correctness of an algorithm is difficult.
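One way to see the difficulty is to write out a single bad interleaving by hand. This deterministic simulation of two threads doing a naive read-modify-write shows the classic lost update that lock-free code must anticipate at every step:

```python
# Deterministic simulation of a lost-update interleaving.
# Each "thread" does the naive sequence: tmp = value; value = tmp + 1.

value = 0

# Step 1: both threads read before either one writes.
tmp_a = value          # thread A reads 0
tmp_b = value          # thread B reads 0

# Step 2: both write back their (now stale) results.
value = tmp_a + 1      # thread A writes 1
value = tmp_b + 1      # thread B also writes 1: A's update is lost

print(value)  # 1, not the 2 you'd expect from two increments
```

A lock makes the read-modify-write a critical section so this interleaving cannot happen; a lock-free algorithm instead has to detect it (e.g. with a failed CAS) and retry.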


Thanks, I was wondering what Column Address Strobe had to do with interpreters


Funny, I was wondering how a Computer Algebra System could help.


They mentioned this at the recent SF Python Meetup. They had found an academic paper that they thought they could apply to Python to get rid of the GIL, but it turned out someone had already tried it unsuccessfully. I think that goal of unladen-swallow is definitely "beyond" for now.


"2009Q3 uses up to 930% less memory than the 2009Q2 release."

Does that math make sense in some way I'm not seeing?


From their benchmarks (rietveld):

  $ ./perf.py -r -b rietveld --args "-j always," --track_memory ../q2/python ../q3/python
  Mem max: 575116.000 -> 55952.000: 927.87% smaller

This 930% basically means that 2009Q3 can use as little as ~1/10 of the memory of 2009Q2.
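Working the quoted numbers through makes the terminology concrete:

```python
q2_mem, q3_mem = 575116.0, 55952.0  # "Mem max" figures from the benchmark

ratio = q2_mem / q3_mem                          # how many times more memory Q2 used
pct_smaller = (q2_mem - q3_mem) / q3_mem * 100   # the quoted "927.87% smaller"
pct_of_old = q3_mem / q2_mem * 100               # Q3 as a fraction of Q2

print(round(ratio, 2))        # 10.28
print(round(pct_smaller, 2))  # 927.87
print(round(pct_of_old, 1))   # 9.7
```

So the "930% smaller" figure is (old - new) / new, which is why it can exceed 100%; the new release uses roughly a tenth of the memory.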


"Uses 1/10 the memory" would have been much easier to parse.


For a human, maybe. We're programmers.


It's unconventional, but 9.3x less memory can be seen in some sense as 1/9.3=10.7 percent as much memory. Not defending it, just trying to interpret :P


Actually, it's 10.3x less memory (they've subtracted 1), so it's using 9.7% as much memory as the previous release. I would have said it's 90.3% less than before, but maybe that's just me.


You can't have 9.3x less memory, either. You can have (1/9.3)x as much memory, or they could just say 10% as much or 90% less.


I agree that it is technically unsound, but the meaning of "Nx less of something" is still relatively unambiguous when N>1.


Yes, it’s stupid (ambiguous, imprecise) terminology, but pretty common usage when describing computer benchmark results.


I never checked, but I assumed it meant that the memory savings of Q3, compared to the savings of Q2, increased by 930%.


There's no "savings" vs. CPython: Memory usage is still 2-3x that of Python 2.6.1.

(Edit: Or did you mean that the 2009Q2->Q3 savings was 930% greater than the Q1->Q2 savings?)


Right, they're comparing against themselves.


Is it noticeably faster than regular Python yet?


Their first release (2009Q1) consisted mostly of patches to mainline CPython, which improved performance for everybody by roughly 10%. They've added about 10-15% performance on each of the last 2 releases in Unladen Swallow-specific code, but it's unclear (to me) how CPython performance has evolved over the last 6 months. They haven't compared explicitly since 2009Q1, which may or may not be revealing.

From the release notes:

Lowlights:

    * LLVM's JIT and other infrastructure needed more work than was expected. As a result, we did not have time to improve performance as much as we would have liked.
    * Memory usage is still 2-3x that of Python 2.6.1. However, there is more overhead that can be eliminated for the 2009Q4 release. 
The last 2 releases have been mostly building up infrastructure; hopefully most of that is behind them and they can work on getting the 5x performance they're after.


Do you know whether those Q1 changes have actually been merged into CPython? Will they be coming in 2.7?


Yes, they have.


I don't see mention of them in the changelog: http://docs.python.org/dev/whatsnew/2.7.html


Not all of them have been backported, but basically anything that says it's been contributed by Collin Winter or Jeffrey Yasskin is something that's been backported.


Has anyone got this working on Windows?


I am considering giving it a try under Cygwin. If you get there first, please, tell us.


I did try under cygwin, and failed with errors in llvm. Will let you know if there's any more progress though.


What's the plan for Unladen Swallow to support py3k?



