This may work, this may not, but some nod to the fact that they've read things like http://queue.acm.org/detail.cfm?id=1454466 (as linked by codevine before me) and have some plan to address the problems would be nice. So far, attempts to bodge STM on to an imperative language that wasn't built with it in mind have not merely been failures, but radical failures. Immense amounts of effort by very smart people, followed by the total unusability and unsalvageability of the resulting code.
The fact that the only place it has been shown to work is basically Clojure and Haskell needs to be addressed; how are you going to address the apparent fundamental incompatibility between imperative languages and STM? And while you may be able to answer that at the VM level (very tiny transactions implemented between opcodes, basically), I would expect it is absolutely hopeless to allow users access to this. How are you going to address the endless-side-effect replay problem? (Hint: You aren't.)
This is a kill-PyPy level of dangerous decision, and I think everyone on that team ought to demand a very good answer to that question before buying into this.
Until we have a new conceptual breakthrough, multiversion concurrency control (MVCC) [1], which STM represents in this context, is the only solution they have.
Yes it will impose a tax on all mutable data structures, and push people towards a more functional style of programming. This is generally good. It may be a challenge for people who think solely in terms of OOP, but then OOP will evolve in response, and that is also good.
The way I read it, it's a PyPy-the-VM feature, used in RPython, but not a PyPy-the-interpreter feature callable from the Python code being interpreted, hence it's not available to users (i.e., Python devs).
From a PyPy as a practical tool point of view, I agree.
From a PyPy as a research project point of view, I'm excited. I'm skeptical of STM as a synchronization solution as well, but I still want us to explore it. If their approach truly is novel, then maybe there's something there.
That's kind of what I mean, some evidence that they've got a novel approach would be nice, because so far the answer is basically "STM doesn't work in code with unrestricted side effects", and personally, I'm pretty satisfied with the amount of due diligence that has been done on that result. Research should build on existing results, not retry the same things over and over.
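To make the side-effect replay problem concrete, here is a minimal sketch of an optimistic transaction that retries on conflict. Everything in it (`VersionedCell`, `run_atomically`, the simulated `other_thread`) is a hypothetical toy for illustration, not PyPy's design or any real STM library's API. The conflict is simulated deterministically rather than with real threads.

```python
# Toy optimistic transaction. All names are hypothetical, for illustration only.

class VersionedCell:
    """A shared value plus a version counter used to detect conflicts."""

    def __init__(self, value):
        self.value = value
        self.version = 0


def run_atomically(cell, body, interfere=None):
    """Re-run body until it commits with no conflicting write in between."""
    attempts = 0
    while True:
        attempts += 1
        seen_version = cell.version
        new_value = body(cell.value)
        # Simulate another thread committing during our first attempt.
        if interfere is not None and attempts == 1:
            interfere(cell)
        if cell.version == seen_version:
            cell.value = new_value
            cell.version += 1
            return attempts
        # Conflict: new_value is discarded and body runs again.


log = []  # stands in for an irreversible side effect (I/O, network, etc.)


def body(balance):
    log.append("sent payment")  # cannot be rolled back on retry
    return balance - 10


def other_thread(cell):
    # A competing commit that invalidates the first attempt.
    cell.value += 100
    cell.version += 1


account = VersionedCell(50)
attempts = run_atomically(account, body, interfere=other_thread)
```

The memory state ends up consistent (the balance reflects both updates), but the side effect in `body` has run once per attempt, so the log contains two payments for one committed transaction.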
I think the difference here is that there is not, in fact, unrestricted side-effects. Because the STM transaction scope is decided by the interpreter, not by the programmer, the interpreter can ensure the current STM transaction is committed immediately before any operation with side-effects, and a new transaction started immediately after. This loses some of the power of STM, certainly, but PyPy's only aiming to maintain internal consistency of interpreter data structures at this point, so this is good enough for them.
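A rough sketch of that idea, assuming a toy interpreter with made-up operation names (`"set"` for a buffered write, `"emit"` for a side effect). This is not how PyPy implements it; it only illustrates committing before a side effect so that an abort can never replay one.

```python
# Toy interpreter that owns the transaction boundaries. Hypothetical design.

class ToyInterpreter:
    def __init__(self):
        self.heap = {}     # committed state
        self.pending = {}  # writes buffered by the open transaction
        self.output = []   # irreversible side effects
        self.commits = 0

    def commit(self):
        """Publish buffered writes and start a fresh transaction."""
        self.heap.update(self.pending)
        self.pending = {}
        self.commits += 1

    def abort(self):
        """Discard buffered writes; nothing irreversible is ever in here."""
        self.pending = {}

    def run(self, ops):
        for op in ops:
            if op[0] == "set":
                _, name, value = op
                self.pending[name] = value  # transactional, can be retried
            elif op[0] == "emit":
                self.commit()               # close the transaction first
                self.output.append(op[1])   # side effect runs outside any transaction
            else:
                raise ValueError("unknown op: %r" % (op,))
        self.commit()


interp = ToyInterpreter()
interp.run([("set", "x", 1), ("set", "y", 2), ("emit", "hello"), ("set", "x", 3)])

# An aborted transaction only loses buffered writes; output is untouched.
interp.pending["x"] = 99
interp.abort()
```

Because the programmer never chooses where a transaction starts or ends, there is no way to wrap the `"emit"` in something that might be retried.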
My thinking was that most research on STM for imperative code has used the library approach. Since they will have full control over the generated code and how it interacts with the STM, they may have more opportunities to optimize.
We've been mashing out Python for the last few months at my job, and we are consistently struggling with the Python GIL performance penalty for threads.
If PyPy can break the GIL, I would anticipate that we'd seriously consider pushing our users toward it.
That's easy to say, but ignorant of the history of other open source projects. A lot of open source projects have killed themselves by overcommitment to a bad idea. PyPy is just getting to where it might be considered a practical answer for some problems, and if the core project wants to drop that property, committing to a path like this is a great way to do it.
Sure, if it's some guy's branch, hey, whatever, but I'm running on the assumption that this is being promoted not as SomeGuy'sBranch but as an argument about the way the official project should go, because otherwise, why proselytize when you can just do? An awful lot of projects have died this way.
The problem is, this won't die in week two. It'll only die after a lot of effort and several stalled or buggy releases. It's not an "it just doesn't work, crashes the system" problem, it's an infinite regress problem.
This will be implemented as an interpreter transformation; the entire idea behind this is that you don't have to care about it, just as you usually don't have to care about the JIT or the GC.
As Armin mentions in the blog post, the STM can be explored at the transformation/translation level. It seems to me the level of commitment here is more "interesting branch that through trial and measurement may lead to an important language-level breakthrough" rather than "stone albatross that drowns the project."