My big python app (the robot end of a telepresence connection) suffers from GIL problems. Mostly, threads are used for C-only stuff like video compression so they can easily release the GIL while running. But I gotta say, the complexities of GIL correctness, thread performance, gtk/gobject, callbacks, boost.python, GC-correctness of both Python and C++ objects, and polling make my head hurt. Simplicity would be a fine goal for the new system.
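For the curious, releasing the GIL around C-only work looks roughly like this in an extension function. This is only a sketch; compress_frame is a made-up stand-in for whatever the real C-only routine is:

#include <Python.h>

void compress_frame(const char *buf, int len);   /* hypothetical C-only routine */

static PyObject *
compress(PyObject *self, PyObject *args)
{
    const char *buf;
    int len;
    if (!PyArg_ParseTuple(args, "s#", &buf, &len))
        return NULL;

    Py_BEGIN_ALLOW_THREADS        /* release the GIL: the C work below touches no Python objects */
    compress_frame(buf, len);
    Py_END_ALLOW_THREADS          /* re-acquire the GIL before returning to Python */

    Py_RETURN_NONE;
}

Everything between the two macros must not touch Python objects, which is exactly why this is easy for C-only work like compression and painful for anything else.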
If you head over to python-ideas there's actually some really good discussion about removing the GIL right now. Someone has proposed using CAS operations (which I know nothing about) to replace the fine-grained locking that the refcounting GC would need without the GIL. In addition, Antoine Pitrou has been working on a patch to make the GIL more performant on POSIX systems.
jfyi: CAS is the abbreviation for Compare-And-Swap, a dedicated native machine instruction that offers atomicity and therefore simplifies locking.
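Roughly, its semantics are the following, except that the hardware does the whole thing as one indivisible step (C-like pseudocode, not real code):

/* What a compare-and-swap does, spelled out.
   On real hardware (e.g. CMPXCHG on x86) this is a single atomic instruction. */
int compare_and_swap(int *addr, int expected, int new_value)
{
    int old = *addr;
    if (old == expected)
        *addr = new_value;   /* store only if nobody changed it in the meantime */
    return old;              /* caller checks old == expected to know if it succeeded */
}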
Hmm. It doesn't really "simplify" locking. It does in the simple case of:
lock();
++value;
unlock();
Where a lock is used to protect a single memory location, and the critical section is straight-line code. But when the critical section is more involved, CAS-based algorithms become considerably more complicated. CAS-based algorithms which do not require mutual exclusion are called "lock-free algorithms," and reasoning about them is hard.
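For that simple case, the CAS version is a retry loop along these lines (a sketch using GCC's __sync_val_compare_and_swap builtin, just to make it concrete):

/* Lock-free increment: keep retrying until our CAS wins the race. */
void increment(int *value)
{
    int old, seen;
    do {
        old = *value;                                      /* read the current value */
        seen = __sync_val_compare_and_swap(value, old, old + 1);
    } while (seen != old);                                 /* someone else got in first: retry */
}

Still manageable, but you already have to think about the retry; as soon as the update involves more than one word of state, it stops being this tidy.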
Hi, obviously you have more experience here--I was just providing details from memory and didn't want to spread misinformation; apologies if you or anyone else got that impression.
Many of the lock-free algorithms do use CAS machine instructions, however. What exactly do you mean by "reasoning about them is hard"? Do you mean in the formal-methods sense, or just difficult to grasp in general? I'm just glancing over your ISMM'06 paper, which looks very interesting--it seems you are using CAS instructions there too. Which makes me think I was probably too unspecific with my "simplifying locking", since in my sloppy thinking that included "lock-free" algorithms as well; sorry for that.
Reasoning about lock-free algorithms is hard because there is no mutual exclusion. Locks enforce a critical section; you guarantee that only a single thread will execute in a critical section at any given time. This allows you to change state local to that critical section without worrying about other threads interfering.
Lock-free algorithms have no critical sections. All threads can execute all instructions at all times. This means that when you update shared state, you have to think about the implications of what happens when another thread touches the same state this thread is trying to update. Keeping this level of concurrency in your head at all times is difficult, which means that reasoning about the correctness of an algorithm is difficult.
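A small example of what that looks like in practice (my own sketch, nothing to do with the unladen-swallow code): a lock-free stack push.

struct node {
    struct node *next;
    void *data;
};

struct node *top = NULL;   /* shared stack head */

void push(struct node *n)
{
    struct node *old_top;
    do {
        old_top = top;       /* snapshot the current head */
        n->next = old_top;   /* link our node in front of it */
        /* publish n as the new head, but only if the head is still old_top */
    } while (!__sync_bool_compare_and_swap(&top, old_top, n));
}

Four lines of logic, yet between the snapshot and the CAS any number of other pushes and pops may have happened, and the matching pop has to worry about the classic ABA problem (the head looks unchanged even though it was popped and re-pushed in between). That is the kind of interleaving you have to hold in your head for every line.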
They mentioned this at the recent SF Python Meetup. They had found an academic paper that they thought they could apply to Python to get rid of the GIL, but it turned out someone had already tried it unsuccessfully. I think that goal of unladen-swallow is definitely "beyond" for now.
It's unconventional, but 9.3x less memory can be seen in some sense as 1/9.3=10.7 percent as much memory. Not defending it, just trying to interpret :P
Actually, it's 10.3x less memory (they've subtracted 1), so it's using 9.7% as much memory as the previous release. I would have said it's using 90.3% less memory than before, but maybe that's just me.
Their first release (2009Q1) consisted mostly of patches to mainline CPython, which improved performance for everybody by roughly 10%. They've added roughly another 10-15% in Unladen Swallow-specific code in each of the last 2 releases, but it's unclear (to me) how CPython's own performance has evolved over the last 6 months. They haven't compared explicitly since 2009Q1, which may or may not be revealing.
From the release notes:
Lowlights:
* LLVM's JIT and other infrastructure needed more work than was expected. As a result, we did not have time to improve performance as much as we would have liked.
* Memory usage is still 2-3x that of Python 2.6.1. However, there is more overhead that can be eliminated for the 2009Q4 release.
The last 2 releases have been mostly building up infrastructure; hopefully most of that is behind them and they can work on getting the 5x performance they're after.
Not all of them have been backported, but basically anything that says it's been contributed by Collin Winter or Jeffrey Yasskin is something that's been backported.
Something I noticed though, in the Project Plan page:
http://code.google.com/p/unladen-swallow/wiki/ProjectPlan#20...
They state that one of the goals for "2009Q3 and beyond" is "to remove the GIL and fix the state of multithreading in Python", which I think is an awesome and ambitious goal, and one I would find very useful, as would many others.
So I had hoped the GIL work would fall into "2009Q3", but apparently it's in the "beyond" category. Anyway, they need to update that project plan.