Unladen Swallow 2009Q3 Released with LLVM optimized Python (code.google.com)
127 points by yarapavan on Oct 21, 2009 | 35 comments



Firstly let me say I think this is a wonderful project and one of the most important Python projects I know.

Something I noticed though, in the Project Plan page:

http://code.google.com/p/unladen-swallow/wiki/ProjectPlan#20...

They state that one of the goals for "2009Q3 and beyond" is "to remove the GIL and fix the state of multithreading in Python," which I think is an awesome and ambitious goal, and one that I (and many others) would find very useful.

So I hoped the GIL thing would fall into the "2009Q3", but apparently it's in the "beyond" category. Anyway, they need to update that project plan.


My big python app (the robot end of a telepresence connection) suffers from GIL problems. Mostly, threads are used for C-only stuff like video compression, so they can easily release the GIL while running. But I gotta say, the complexities of GIL correctness, thread performance, gtk/gobject, callbacks, boost.python, GC correctness of both Python and C++ objects, and polling make my head hurt. Simplicity would be a fine goal for the new system.
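For what it's worth, the pattern described above (threads doing C-only work that drops the GIL) can be sketched with zlib, whose compression routine releases the GIL while it runs. zlib here is just a stand-in for video compression; the thread layout is illustrative, not taken from the app:

```python
import threading
import zlib

data = b"telepresence video frame " * 20_000  # fake "frame" payload

def compress(results, i):
    # zlib.compress is C code that releases the GIL for large inputs,
    # so several of these threads can make progress in parallel.
    results[i] = zlib.compress(data)

results = [None] * 4
threads = [threading.Thread(target=compress, args=(results, i)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every thread produced a valid, independently decompressible result.
print(all(zlib.decompress(r) == data for r in results))  # True
```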


Is there a good python message passing implementation? Some problems fit message passing more naturally than threading.


http://docs.python.org/library/multiprocessing.html#module-m...

It has a Queue interface for passing objects, but also allows shared state.


If you're in the robotics space, you should seriously consider the Robot Operating System: http://www.ROS.org

You'll find their publish-subscribe message passing quite simple, and the support is great.


If you head over to python-ideas, there's actually some really good discussion about removing the GIL right now. Someone has proposed using CAS operations (which I know nothing about) to avoid the fine-grained locking that refcounting GC would otherwise need without the GIL. In addition, Antoine Pitrou has been working on a patch to make the GIL more performant on POSIX systems.


jfyi: CAS is the abbreviation of Compare-And-Swap, a dedicated native machine instruction that offers atomicity and can therefore simplify locking.


Hmm. It doesn't really "simplify" locking. It does in the simple case of:

  lock();
  ++value;
  unlock();
Where a lock is used to protect a single memory location and the critical section is straight-line code. But when the critical section is more involved, CAS-based algorithms become considerably more complicated. CAS-based algorithms which do not require mutual exclusion are called "lock-free algorithms," and reasoning about them is hard.
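To make the contrast concrete, here's a sketch of the CAS retry loop for that same increment. CPython exposes no real CAS instruction (on hardware it's a single instruction like CMPXCHG), so a lock stands in for the hardware atomicity; the read/compute/CAS/retry shape is the point:

```python
import threading

class AtomicInt:
    """Simulated compare-and-swap cell. The internal lock is only a
    stand-in for the atomicity the hardware instruction would provide."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def compare_and_swap(self, expected, new):
        # Atomically: set value to `new` iff it still equals `expected`.
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

    @property
    def value(self):
        return self._value

def increment(cell):
    # Lock-free style: read, compute, CAS; retry if another thread won the race.
    while True:
        old = cell.value
        if cell.compare_and_swap(old, old + 1):
            return

counter = AtomicInt()
threads = [
    threading.Thread(target=lambda: [increment(counter) for _ in range(1000)])
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 4000
```

Even in this simplest case, correctness depends on the retry loop; for anything more involved than a single memory location, the number of interleavings to reason about grows quickly.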

The wiki for it is a good start: http://en.wikipedia.org/wiki/Non-blocking_synchronization And I'm going to go ahead and plug the work I've done in the area: http://people.cs.vt.edu/~scschnei/streamflow/


Hi, obviously you have more experience here. I just provided details straight from memory and did not want to disseminate misinformation; I apologize if you or anyone else got that impression.

Many of the lock-free algorithms do use native CAS instructions, however. What exactly do you mean by "reasoning about them is hard"? Do you mean in the formal-methods sense, or just difficult to grasp in general? I am glancing over your ISMM'06 paper, which looks very interesting; it seems you use CAS instructions there too. That makes me wonder whether I was too unspecific in my "simplifying locking," which in my sloppy thinking included "lock-free" algorithms as well. Sorry for that.


Don't know if you'll see this, but:

Reasoning about lock-free algorithms is hard because there is no mutual exclusion. Locks enforce a critical section; you guarantee that only a single thread will execute in a critical section at any given time. This allows you to change state local to that critical section without worrying about other threads interfering.

Lock-free algorithms have no critical sections. All threads can execute all instructions at all times. This means that when you update shared state, you have to think about the implications of what happens when another thread touches the same state this thread is trying to update. Keeping this level of concurrency in your head at all times is difficult, which means that reasoning about the correctness of an algorithm is difficult.
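One way to see the difficulty is to write out a single bad interleaving by hand. This deterministic simulation of two threads doing a naive read-modify-write shows the classic lost update that lock-free code must anticipate at every step:

```python
# Deterministic simulation of a lost-update interleaving.
# Each "thread" does the naive sequence: tmp = value; value = tmp + 1.

value = 0

# Step 1: both threads read before either one writes.
tmp_a = value          # thread A reads 0
tmp_b = value          # thread B reads 0

# Step 2: both write back their (now stale) results.
value = tmp_a + 1      # thread A writes 1
value = tmp_b + 1      # thread B also writes 1: A's update is lost

print(value)  # 1, not the 2 you'd expect from two increments
```

A lock makes the read-modify-write a critical section so this interleaving cannot happen; a lock-free algorithm instead has to detect it (e.g. with a failed CAS) and retry.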


Thanks, I was wondering what Column Address Strobe had to do with interpreters


Funny, I was wondering how a Computer Algebra System could help.


They mentioned this at the recent SF Python Meetup. They had found an academic paper that they thought they could apply to Python to get rid of the GIL, but it turned out someone had already tried it unsuccessfully. I think that goal of unladen-swallow is definitely "beyond" for now.


"2009Q3 uses up to 930% less memory than the 2009Q2 release."

Does that math make sense in some way I'm not seeing?


From their benchmarks (rietveld):

  $ ./perf.py -r -b rietveld --args "-j always," --track_memory ../q2/python ../q3/python
  Mem max: 575116.000 -> 55952.000: 927.87% smaller

This 930% basically means that 2009Q3 can use as little as ~1/10 of the memory of 2009Q2.
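Working the quoted numbers through makes the terminology concrete:

```python
q2_mem, q3_mem = 575116.0, 55952.0  # "Mem max" figures from the benchmark

ratio = q2_mem / q3_mem                          # how many times more memory Q2 used
pct_smaller = (q2_mem - q3_mem) / q3_mem * 100   # the quoted "927.87% smaller"
pct_of_old = q3_mem / q2_mem * 100               # Q3 as a fraction of Q2

print(round(ratio, 2))        # 10.28
print(round(pct_smaller, 2))  # 927.87
print(round(pct_of_old, 1))   # 9.7
```

So the "930% smaller" figure is (old - new) / new, which is why it can exceed 100%; the new release uses roughly a tenth of the memory.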


"Uses 1/10 the memory" would have been much easier to parse.


For a human, maybe. We're programmers.


It's unconventional, but 9.3x less memory can be seen in some sense as 1/9.3=10.7 percent as much memory. Not defending it, just trying to interpret :P


Actually, it's 10.3x less memory (they've subtracted 1), so it's using 9.7% as much memory as the previous release. I would have said it's 90.3% less than before, but maybe that's just me.


You can't have 9.3x less memory, either. You can have (1/9.3)x as much memory, or they could just say 10% as much or 90% less.


I agree that it is technically unsound, but the meaning of "Nx less of something" is still relatively unambiguous when N>1.


Yes, it’s stupid (ambiguous, imprecise) terminology, but pretty common usage when describing computer benchmark results.


I never checked, but I assumed it meant that the memory savings of Q3, compared to the savings of Q2, increased by 930%.


There's no "savings" vs. CPython: Memory usage is still 2-3x that of Python 2.6.1.

(Edit: Or did you mean that the 2009Q2->Q3 savings was 930% greater than the Q1->Q2 savings?)


Right, they're comparing against themselves.


Is it noticeably faster than regular Python yet?


Their first release (2009Q1) consisted mostly of patches to mainline CPython, which improved performance for everybody by roughly 10%. They've added about 10-15% performance on each of the last 2 releases in Unladen Swallow-specific code, but it's unclear (to me) how CPython performance has evolved over the last 6 months. They haven't compared explicitly since 2009Q1, which may or may not be revealing.

From the release notes:

Lowlights:

    * LLVM's JIT and other infrastructure needed more work than was expected. As a result, we did not have time to improve performance as much as we would have liked.
    * Memory usage is still 2-3x that of Python 2.6.1. However, there is more overhead that can be eliminated for the 2009Q4 release. 
The last 2 releases have been mostly building up infrastructure; hopefully most of that is behind them and they can work on getting the 5x performance they're after.


Do you know whether those Q1 changes have actually been merged into CPython? Will they be coming in 2.7?


Yes, they have.


I don't see mention of them in the changelog: http://docs.python.org/dev/whatsnew/2.7.html


Not all of them have been backported, but basically anything that says it's been contributed by Collin Winter or Jeffrey Yasskin is something that's been backported.


Has anyone got this working on Windows?


I am considering giving it a try under Cygwin. If you get there first, please, tell us.


I did try under cygwin, and failed with errors in llvm. Will let you know if there's any more progress though.


What's the plan for Unladen Swallow to support py3k?



