There are much more active projects that do the same thing as this.
By my estimate, this project seems to be old / stalled and not in active development. This impression is based on the lack of Python 3 support and the fact that the majority of the forum activity happened between 2006 and 2010.
EDIT: I was wrong; there is some activity in a different branch of the forum [1]. The last release was almost two years ago.
Most fellow Python people know this already, but natively compiled C/C++ code generally runs about 100x faster than interpreted CPython code for any kind of number crunching or computation. So, as sonium also mentioned, definitely consider the C route as well if you're at all familiar with it and you're having performance problems. Threading Python generally wouldn't get you as far (perhaps a 4-6x speedup on an 8-core machine). Python's C integration is very well thought out and lets you use the best of both languages, as well as gain a deep understanding of what's going on under the hood in your Python programs.
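To make the C route concrete, here's a minimal ctypes sketch; the shared library libfast.so and the sum_squares function are hypothetical stand-ins for whatever hot loop you'd move into C.

    # Sketch only: assumes a hypothetical fast.c containing
    #   double sum_squares(const double *xs, int n);
    # compiled with something like: gcc -O3 -shared -fPIC fast.c -o libfast.so
    import ctypes

    lib = ctypes.CDLL("./libfast.so")
    lib.sum_squares.argtypes = [ctypes.POINTER(ctypes.c_double), ctypes.c_int]
    lib.sum_squares.restype = ctypes.c_double

    n = 100000
    data = [0.5 * i for i in range(n)]
    arr = (ctypes.c_double * n)(*data)      # copy the Python list into a C array
    print(lib.sum_squares(arr, n))          # the hot loop runs as native code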
Forgive my ignorance, but how are you getting a 4-6x speedup with threading in the face of the GIL? Asking because I generally choose C/C++ over Python when performance matters, because of the GIL (and, to a lesser extent, the GC).
Apparently the PP module here overcomes the GIL limitations, so I'm assuming you'd see a typical multithreading perf increase with it. Haven't tested it myself though.
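For what PP usage looks like, here's a rough sketch modelled on its documented examples (call signatures from memory, so treat them as approximate); under the hood it spawns worker processes rather than threads, which is how it sidesteps the GIL.

    import math
    import pp

    def isprime(n):
        if n < 2:
            return False
        for d in range(2, int(math.sqrt(n)) + 1):
            if n % d == 0:
                return False
        return True

    def sum_primes(n):
        return sum(x for x in range(2, n) if isprime(x))

    job_server = pp.Server()  # defaults to one worker process per detected CPU
    # submit(func, args, depfuncs, modules): dependent functions and modules
    # are shipped to the worker processes along with the job
    jobs = [job_server.submit(sum_primes, (n,), (isprime,), ("math",))
            for n in (100000, 200000, 300000)]
    print([job() for job in jobs])  # calling a job blocks until its result is ready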
I also liked this book: http://www.parallel-algorithms-book.com/ (free draft copy). It identifies which algorithms are good candidates for a parallel / divide-and-conquer strategy, e.g. the analysis of MergeSort vs. Quicksort.
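As a toy illustration of the divide-and-conquer point (not taken from the book, just a sketch): merge sort's two halves are independent, so they can be sorted in separate processes and merged at the end.

    # Toy sketch: sort the two halves in separate processes, merge in the parent.
    # Real implementations stop spawning below a size cutoff; this just shows the shape.
    from concurrent.futures import ProcessPoolExecutor
    from heapq import merge

    def merge_sort(xs):
        if len(xs) <= 1:
            return xs
        mid = len(xs) // 2
        return list(merge(merge_sort(xs[:mid]), merge_sort(xs[mid:])))

    def parallel_merge_sort(xs, workers=2):
        mid = len(xs) // 2
        with ProcessPoolExecutor(max_workers=workers) as pool:
            left, right = pool.map(merge_sort, [xs[:mid], xs[mid:]])
        return list(merge(left, right))

    if __name__ == "__main__":
        print(parallel_merge_sort([5, 3, 8, 1, 9, 2, 7, 4]))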
I'm a daily user of Python but the parallelization efforts really make me wonder.
When things in Python are slow, it's most likely in pure Python code (as opposed to e.g. NumPy).
Wouldn't it then make more sense to rewrite that part in native C, which gives a speedup of two orders of magnitude, instead of getting into the mess of parallelization, which scales, at best, with the number of cores?
C can be a very difficult beast to tame. For some people, using parallel Python interpreters may actually be easier than giving up OOP, garbage collection, and platform independence, and taking on the maintenance of a whole other language stack, plus bugs you just won't encounter with Python.
Not saying it's a bad idea, but it's a totally different approach.
Because in April 2014 Python 2.7 was effectively declared an LTS release with support until 2020. No LTS release has been declared for Python 3.x.
This comes up over and over again in the Python community.
Allow me to paint in broad strokes; I'm going to lump a lot of things together for the sake of ease of understanding, and I'm not speaking for every Python user out there.
There seems to be a huge tug-of-war between companies/corporate users on one side and the PSF and more independent users on the other.
The smaller 'shops' (which, granted, are big names within the Python community), the library developers, and individual users have all migrated to Python 3 ages ago, or are doing so now. Django is going to be Python 3-only in the foreseeable future (I know they've EOL'd their Python 2-compatible releases). The Software Foundation itself wants Python 2 to be essentially toast by 2020. Projects like Pyramid, Flask, Bottle, and even NumPy and SciPy have started moving toward Python 3 being their 'first' target (in the case of Flask and Bottle, I think end-of-life for Python 2 support is coming as well; they haven't announced anything official, but they seem to be adding and promising features that will only make sense on Python 3).
Yet big 'corporate' users like Google (look at you, Grumpy: https://github.com/google/grumpy) and Apple (still shipping 2.7... and why?), plus large universities, all seem to be stuck on Python 2.
To me this really hurts the community, and it's holding back the full development of Python 3 and of Python generally. Why are companies holding back the state of a language (and then complaining about it) instead of diverting their, you know, massive resources to getting their own libraries and technologies onto the current stack?
Why do so many people want to stay on Python 2? I'll never understand.
For what it's worth, the latest version of Python 3 (3.6, well, 3.6.1, but the release notes are for 3.6) has a ton of forward-thinking, built-in concurrency improvements and technologies that these companies could easily take advantage of!
Then why hold everyone else back if you aren't willing to push forward? At least with new projects it'd be great if people used Python 3.
Google's choice with Grumpy particularly astounds me. Why 2.7? It's really unclear to me why that was the 'better' choice.
> Google's choice with Grumpy particularly astounds me. Why 2.7?
Probably because Google has a lot of legacy Python 2.7 and doesn't do new greenfield development in Python 3, preferring other languages (particularly Go), so...
Projects made to scratch the maker's own itch are good in that they get thoroughly dogfooded, but they also can carry a focus that reflects the creator's needs more than anyone else's.
I guess this is supposed to cure the GIL curse of threading in Python? I.e. the fact that only one thread can be running Python code at a time in an interpreter process.
There is still the multiprocessing module, which can spawn processes, so you can effectively run code in parallel. It's a pain to manage, though.
I think in the end I'm grateful there is no parallel threading in Python, as it forces me either to structure things so they can be run naively in parallel, or to use tools that solve the concurrency outside of Python.
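For reference, the stdlib route looks roughly like this; a lot of the pain is that anything crossing the process boundary has to be picklable and defined at top level.

    # Minimal multiprocessing sketch: arguments and results are pickled and sent
    # to worker processes, so the function has to be defined at module level.
    from multiprocessing import Pool

    def slow_square(x):
        return x * x  # stand-in for a CPU-bound task

    if __name__ == "__main__":   # required on platforms that spawn rather than fork
        with Pool(processes=4) as pool:
            print(pool.map(slow_square, range(10)))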
A little hacky, yes, but extremely effective. I wrote an image processing application for somewhat large time-series datasets (> 1TB) on Linux which took liberal advantage of these details to run very nicely on v2 Xeon processors. It also worked quite well for GUIs which interacted with the datasets.
Sometimes worth it, but often not. The ability to do shared-memory multi-threading is one of the things that tempts me away from Python. Message passing is great and all, but sometimes you want your messages to be passing around control of a shared 4GB data structure, instead of trying to copy it.
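One middle ground worth noting (a sketch, assuming NumPy is available): the stdlib can back a single large array with shared memory via multiprocessing.Array, so worker processes operate on it in place instead of receiving copies; control still travels by message passing, but the big payload stays put.

    # Sketch (assumes NumPy): one large array backed by shared memory, so child
    # processes operate on the same buffer instead of receiving copies.
    from multiprocessing import Array, Process
    import numpy as np

    def worker(shared_arr, n):
        arr = np.frombuffer(shared_arr.get_obj(), dtype=np.float64, count=n)
        arr *= 2.0                      # modifies the shared buffer in place

    if __name__ == "__main__":
        n = 1000000
        shared_arr = Array('d', n)      # 'd' = C double, allocated in shared memory
        arr = np.frombuffer(shared_arr.get_obj(), dtype=np.float64, count=n)
        arr[:] = np.arange(n)
        p = Process(target=worker, args=(shared_arr, n))
        p.start(); p.join()
        print(arr[:5])                  # the worker's changes are visible here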
Wouldn't it make a lot of sense to just use PySpark with RDDs? Latency would be relatively high, but it'd also bypass the GIL while being more modern.
In my experience PySpark is much more flaky and annoying than doing parallel computing with more 'Python-native' tools. It only really makes sense when you've outgrown small clusters and really need huge infrastructure.
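For a flavour of the RDD route suggested above, here's a rough local-mode sketch; the master string and partition count are just illustrative.

    # Rough PySpark sketch, local mode: each partition is handled by a separate
    # Python worker process, so CPU-bound work is not serialized by one GIL.
    from pyspark import SparkContext

    sc = SparkContext("local[4]", "gil-free-example")
    rdd = sc.parallelize(range(1000000), numSlices=4)
    print(rdd.map(lambda x: x * x).reduce(lambda a, b: a + b))
    sc.stop()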
It seems to be Python's multiprocessing module with some extra features.
Python's parallelism options are "share everything (but with only one thread computing at a time)" or "share nothing" in a separate process on the same machine. If you're parallelizing for extra crunch power, neither is efficient.
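A quick way to see that trade-off for yourself: a sketch comparing the same CPU-bound work run with threads (GIL-serialized) and with processes; actual timings will vary by machine.

    # Sketch: the same CPU-bound work run with threads (GIL-serialized) and
    # with processes (scales with cores); timings will vary by machine.
    import time
    from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

    def burn(n):
        s = 0
        for i in range(n):
            s += i * i
        return s

    def timed(executor_cls):
        start = time.time()
        with executor_cls(max_workers=4) as ex:
            list(ex.map(burn, [2000000] * 4))
        return time.time() - start

    if __name__ == "__main__":
        print("threads:   %.2fs" % timed(ThreadPoolExecutor))
        print("processes: %.2fs" % timed(ProcessPoolExecutor))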
The same story as with all the dynamic JVM languages: great so long as all your dependencies are pure Python; interop with native code, i.e. NumPy, is nonexistent. JRuby was making some headway here, but I haven't checked in a while.
- IPython parallel (https://ipyparallel.readthedocs.io/en/latest/) is pretty much the same idea but under active development; it supports local processes, small clusters, cloud computing, and HPC environments (SGE-style supercomputing schedulers).
- Joblib is a great tool for common embarrassingly parallel problems: run on many cores with a one-liner (https://pythonhosted.org/joblib/index.html).
- Dask provides a graph-based approach: lazily build up task graphs, then find an efficient way to run them in parallel (http://dask.pydata.org/). See the sketch of the last two below.
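To give a flavour of Joblib and Dask, a sketch from memory of their documented APIs; treat parameter names as approximate.

    # Sketches from memory of the documented APIs; treat as illustrative
    # rather than definitive.
    from joblib import Parallel, delayed
    from dask import compute, delayed as dask_delayed

    def square(x):
        return x * x

    if __name__ == "__main__":
        # joblib: an embarrassingly parallel loop across 4 workers, in one line
        print(Parallel(n_jobs=4)(delayed(square)(i) for i in range(10)))

        # dask: lazily build a task graph, then let the scheduler run it in parallel
        print(compute(*[dask_delayed(square)(i) for i in range(10)]))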