There are much more active projects that do the same thing as this.
By my estimate, this project seems to be old / stalled and not in active development. This impression is based on the lack of Python 3 support and the fact that the majority of the forum activity happened between 2006 and 2010.
EDIT: I was wrong; there is some activity in a different branch of the forum [1]. The last release was almost two years ago.
Most fellow Python people know this already, but natively compiled C/C++ code generally runs about 100x faster than interpreted CPython code for any kind of number crunching or computation. So, as sonium also mentioned, definitely consider the C route as well if you're at all familiar with it and you're having performance problems. Threading Python generally wouldn't get you as far (perhaps a 4-6x speedup on an 8-core machine). Python's C integration is very well thought out and lets you use the best of both languages, as well as gain a deep understanding of what's going on under the hood in your Python programs.
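To make the C route concrete, here's a minimal ctypes sketch; the shared library libfast.so and the sum_squares function are hypothetical stand-ins for whatever hot loop you'd move into C.

    # Sketch only: assumes a hypothetical fast.c containing
    #   double sum_squares(const double *xs, int n);
    # compiled with something like: gcc -O3 -shared -fPIC fast.c -o libfast.so
    import ctypes

    lib = ctypes.CDLL("./libfast.so")
    lib.sum_squares.argtypes = [ctypes.POINTER(ctypes.c_double), ctypes.c_int]
    lib.sum_squares.restype = ctypes.c_double

    n = 100000
    data = [0.5 * i for i in range(n)]
    arr = (ctypes.c_double * n)(*data)      # copy the Python list into a C array
    print(lib.sum_squares(arr, n))          # the hot loop runs as native code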
Forgive my ignorance, but how are you getting a 4-6x speedup with threading in the face of the GIL? Asking because I generally choose C/C++ over Python when performance matters, because of the GIL (and, to a lesser extent, the GC).
Apparently the PP module here overcomes the GIL limitations, so I'm assuming you'd see a typical multithreading perf increase with it. Haven't tested it myself though.
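For what PP usage looks like, here's a rough sketch modelled on its documented examples (call signatures from memory, so treat them as approximate); under the hood it spawns worker processes rather than threads, which is how it sidesteps the GIL.

    import math
    import pp

    def isprime(n):
        if n < 2:
            return False
        for d in range(2, int(math.sqrt(n)) + 1):
            if n % d == 0:
                return False
        return True

    def sum_primes(n):
        return sum(x for x in range(2, n) if isprime(x))

    job_server = pp.Server()  # defaults to one worker process per detected CPU
    # submit(func, args, depfuncs, modules): dependent functions and modules
    # are shipped to the worker processes along with the job
    jobs = [job_server.submit(sum_primes, (n,), (isprime,), ("math",))
            for n in (100000, 200000, 300000)]
    print([job() for job in jobs])  # calling a job blocks until its result is ready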
I also liked this book: http://www.parallel-algorithms-book.com/ (free draft copy). It identifies which algorithms are good candidates for a parallel / divide-and-conquer strategy, e.g. the analysis of MergeSort vs. Quicksort.
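As a toy illustration of the divide-and-conquer point (not taken from the book, just a sketch): merge sort's two halves are independent, so they can be sorted in separate processes and merged at the end.

    # Toy sketch: sort the two halves in separate processes, merge in the parent.
    # Real implementations stop spawning below a size cutoff; this just shows the shape.
    from concurrent.futures import ProcessPoolExecutor
    from heapq import merge

    def merge_sort(xs):
        if len(xs) <= 1:
            return xs
        mid = len(xs) // 2
        return list(merge(merge_sort(xs[:mid]), merge_sort(xs[mid:])))

    def parallel_merge_sort(xs, workers=2):
        mid = len(xs) // 2
        with ProcessPoolExecutor(max_workers=workers) as pool:
            left, right = pool.map(merge_sort, [xs[:mid], xs[mid:]])
        return list(merge(left, right))

    if __name__ == "__main__":
        print(parallel_merge_sort([5, 3, 8, 1, 9, 2, 7, 4]))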
I'm a daily user of Python but the parallelization efforts really make me wonder.
When things in Python are slow, it's most likely in pure Python code (as opposed to e.g. NumPy).
Wouldn't it then make more sense to rewrite that part in native C, which gives a speedup of two orders of magnitude, instead of getting into the mess of parallelization, which scales, at best, with the number of cores?
C can be a very difficult beast to tame. For some people, using parallel Python interpreters may actually be easier than giving up OOP, garbage collection, and platform independence, and taking on the maintenance of a whole other language stack, plus bugs you just won't encounter with Python.
Not saying it's a bad idea, but it's a totally different approach.
Because in April 2014 Python 2.7 was effectively declared an LTS release with support until 2020. No LTS release has been declared for Python 3.x.
This comes up over and over again in the Python community.
Allow me to paint in broad strokes; I'm going to lump a lot of things together for the sake of ease of understanding, and I'm not speaking for every Python user out there.
There seems to be a huge tug-of-war between companies/corporate users on one side and the PSF and more independent users on the other.
The smaller 'shops' (which, granted, are big names within the Python community), the library developers, and individual users have all migrated to Python 3 ages ago, or are doing so now. Django is going to be Python 3-only in the foreseeable future (I know they've EOL'd their Python 2-compatible releases). The Software Foundation itself wants Python 2 to be essentially toast by 2020. Projects like Pyramid, Flask, Bottle, and even NumPy and SciPy have started moving toward Python 3 being their 'first' target (in the case of Flask and Bottle, I think end-of-life for Python 2 support is coming as well; they haven't announced anything official, but they seem to be adding and promising features that will only make sense on Python 3).
Yet big 'corporate' users like Google (look at you, Grumpy: https://github.com/google/grumpy) and Apple (still shipping 2.7... and why?), plus large universities, all seem to be stuck on Python 2.
To me this really hurts the community, and it's holding back the full development of Python 3 and of Python generally. Why are companies holding back the state of a language (and then complaining about it) instead of diverting their, you know, massive resources to getting their own libraries and technologies onto the current stack?
Why do so many people want to stay on Python 2? I'll never understand.
For what it's worth, the latest version of Python 3 (3.6, well, 3.6.1, but the release notes are for 3.6) has a ton of forward-thinking, built-in concurrency improvements and technologies that these companies could easily take advantage of!
Then why hold everyone else back if you aren't willing to push forward? At least with new projects it'd be great if people used Python 3.
Google's choice with Grumpy particularly astounds me. Why 2.7? It's really unclear to me why that was the 'better' choice.
> Google's choice with Grumpy particularly astounds me. Why 2.7?
Probably because Google has a lot of legacy Python 2.7 and doesn't do new greenfield development in Python 3, preferring other languages (particularly Go), so...
Projects made to scratch the maker's own itch are good in that they get thoroughly dogfooded, but they also can carry a focus that reflects the creator's needs more than anyone else's.
I guess this is supposed to cure the GIL curse of threading in Python? I.e. the fact that only one thread can be running Python code at a time in an interpreter process.
There is still the multiprocessing module, which can spawn processes, so you can effectively run code in parallel. It's a pain to manage, though.
I think in the end I'm grateful there is no parallel threading in Python, as it forces me either to structure things so they can be run naively in parallel, or to use tools that solve the concurrency outside of Python.
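For reference, the stdlib route looks roughly like this; a lot of the pain is that anything crossing the process boundary has to be picklable and defined at top level.

    # Minimal multiprocessing sketch: arguments and results are pickled and sent
    # to worker processes, so the function has to be defined at module level.
    from multiprocessing import Pool

    def slow_square(x):
        return x * x  # stand-in for a CPU-bound task

    if __name__ == "__main__":   # required on platforms that spawn rather than fork
        with Pool(processes=4) as pool:
            print(pool.map(slow_square, range(10)))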
A little hacky, yes, but extremely effective. I wrote an image processing application for somewhat large time-series datasets (> 1TB) on Linux which took liberal advantage of these details to run very nicely on v2 Xeon processors. It also worked quite well for GUIs which interacted with the datasets.
Sometimes worth it, but often not. The ability to do shared-memory multi-threading is one of the things that tempts me away from Python. Message passing is great and all, but sometimes you want your messages to be passing around control of a shared 4GB data structure, instead of trying to copy it.
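One middle ground worth noting (a sketch, assuming NumPy is available): the stdlib can back a single large array with shared memory via multiprocessing.Array, so worker processes operate on it in place instead of receiving copies; control still travels by message passing, but the big payload stays put.

    # Sketch (assumes NumPy): one large array backed by shared memory, so child
    # processes operate on the same buffer instead of receiving copies.
    from multiprocessing import Array, Process
    import numpy as np

    def worker(shared_arr, n):
        arr = np.frombuffer(shared_arr.get_obj(), dtype=np.float64, count=n)
        arr *= 2.0                      # modifies the shared buffer in place

    if __name__ == "__main__":
        n = 1000000
        shared_arr = Array('d', n)      # 'd' = C double, allocated in shared memory
        arr = np.frombuffer(shared_arr.get_obj(), dtype=np.float64, count=n)
        arr[:] = np.arange(n)
        p = Process(target=worker, args=(shared_arr, n))
        p.start(); p.join()
        print(arr[:5])                  # the worker's changes are visible here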
Wouldn't it make a lot of sense to just use PySpark with RDDs? Latency would be relatively high, but it'd also bypass the GIL while being more modern.
In my experience PySpark is much more flaky and annoying than doing parallel computing with more 'Python-native' tools. It only really makes sense when you've outgrown small clusters and really need huge infrastructure.
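For a flavour of the RDD route suggested above, here's a rough local-mode sketch; the master string and partition count are just illustrative.

    # Rough PySpark sketch, local mode: each partition is handled by a separate
    # Python worker process, so CPU-bound work is not serialized by one GIL.
    from pyspark import SparkContext

    sc = SparkContext("local[4]", "gil-free-example")
    rdd = sc.parallelize(range(1000000), numSlices=4)
    print(rdd.map(lambda x: x * x).reduce(lambda a, b: a + b))
    sc.stop()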
It seems to be Python's multiprocessing module with some extra features.
Python's parallelism options are "share everything (but with only one thread computing at a time)" or "share nothing" in a separate process on the same machine. If you're parallelizing for extra crunch power, neither is efficient.
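A quick way to see that trade-off for yourself: a sketch comparing the same CPU-bound work run with threads (GIL-serialized) and with processes; actual timings will vary by machine.

    # Sketch: the same CPU-bound work run with threads (GIL-serialized) and
    # with processes (scales with cores); timings will vary by machine.
    import time
    from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

    def burn(n):
        s = 0
        for i in range(n):
            s += i * i
        return s

    def timed(executor_cls):
        start = time.time()
        with executor_cls(max_workers=4) as ex:
            list(ex.map(burn, [2000000] * 4))
        return time.time() - start

    if __name__ == "__main__":
        print("threads:   %.2fs" % timed(ThreadPoolExecutor))
        print("processes: %.2fs" % timed(ProcessPoolExecutor))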
The same story as with all the dynamic JVM languages: great so long as all your dependencies are pure Python; interop with native code, i.e. NumPy, is nonexistent. JRuby was making some headway here, but I haven't checked in a while.
- IPython parallel (https://ipyparallel.readthedocs.io/en/latest/) is pretty much the same idea but under active development; it supports local processes, small clusters, cloud computing, and HPC environments (SGE-style supercomputing schedulers).
- Joblib is a great tool for common embarrassingly parallel problems: run on many cores with a one-liner (https://pythonhosted.org/joblib/index.html).
- Dask provides a graph-based approach: lazily build up task graphs, then find an efficient way to run them in parallel (http://dask.pydata.org/). See the sketch of the last two below.
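To give a flavour of Joblib and Dask, a sketch from memory of their documented APIs; treat parameter names as approximate.

    # Sketches from memory of the documented APIs; treat as illustrative
    # rather than definitive.
    from joblib import Parallel, delayed
    from dask import compute, delayed as dask_delayed

    def square(x):
        return x * x

    if __name__ == "__main__":
        # joblib: an embarrassingly parallel loop across 4 workers, in one line
        print(Parallel(n_jobs=4)(delayed(square)(i) for i in range(10)))

        # dask: lazily build a task graph, then let the scheduler run it in parallel
        print(compute(*[dask_delayed(square)(i) for i in range(10)]))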