Parallel tasks in Python: concurrent.futures (vinta.ws)
177 points by gibuloto on April 2, 2018 | 46 comments



One common misconception (or should I say, overgeneralization) is repeated in the article: threads are always unsuited to CPU-intensive work.

For instance, most numpy operations release the GIL, meaning that you can perform heavy computation on multiple threads simultaneously. Certain other C extensions do the same, including some bits of the standard library. The usual caveats apply about threading bugs, of course.

Another detail is that numpy linked to e.g. Intel MKL will multithread some operations by default. Running your own threads on top of that is likely to cause a slowdown.
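
For example, a minimal sketch of the thread-pool case (matrix sizes and worker count are arbitrary):

    import numpy as np
    from concurrent.futures import ThreadPoolExecutor

    def multiply(_):
        # np.dot releases the GIL, so these calls can run
        # on multiple cores even from Python threads
        a = np.random.rand(1000, 1000)
        return np.dot(a, a)

    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(multiply, range(4)))

Per the MKL caveat above, you'd typically pin the library's own threading to one thread (e.g. by setting MKL_NUM_THREADS=1) before layering your own threads on top.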


I only see the article mentioning CPU (and the GIL) once, but in any case, the generalisation is correct for any pure Python code. You can only release the GIL in extension code (for CPython), and at that point you're not really dealing with Python threads (although you may use Python's wrapper API for threading), but with its underlying implementation (e.g. pthreads) instead. The framing of the statement is very important, and I wouldn't call it overgeneralising in the context of this particular article.


Only np.dot() has intrinsic multithreading. No other functions do. Bizarrely, np.dot() is the fastest way to do things other than dot products (like copy or multiply) in some cases.


well, also np.linalg routines that call LAPACK may be multithreaded.


That's misleading--you say linalg routines "may be" multithreaded, but the vast majority of them never have been. matmul and einsum, despite being candidates for intrinsic multithreading, are not multithreaded. You can read discussion about that here: https://jackkamm.github.io/blog/a-parallel-einsum/


I'm sorry, isn't "objects belonging to class C may have property P" a fair way to say there are some members which have it and some which do not? I don't see how that's misleading. I was correcting your statement about np.dot being the only parallel fn.


Can you name one function other than dot() and tensordot() which has intrinsic multithreading?



> posted in Jan. 2017

Now that we have asyncio and awesome libraries like aiohttp [1], you can get much, much higher throughput than you'd ever achieve with threads, with less code.

1. http://aiohttp.readthedocs.io/
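
For instance, a minimal aiohttp client sketch (the URL list is made up) that fetches many pages concurrently on a single thread:

    import asyncio
    import aiohttp

    async def fetch(session, url):
        async with session.get(url) as resp:
            return await resp.text()

    async def main(urls):
        async with aiohttp.ClientSession() as session:
            # all requests are in flight concurrently on one thread
            return await asyncio.gather(*(fetch(session, u) for u in urls))

    urls = ['http://example.com'] * 100  # made-up URL list
    pages = asyncio.get_event_loop().run_until_complete(main(urls))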


I found this article extremely persuasive, and it matches my own experiences with asyncio: The performance gain might be there if most of what you do is waiting for a network response, but even a small amount of data processing will make your program CPU bound pretty quickly.

http://techspot.zzzeek.org/2015/02/15/asynchronous-python-an...


Note that it's not either/or - you can dispatch work from an event loop to a thread pool (or a process pool) with loop.run_in_executor [0], while loop.call_soon_threadsafe [1] can be used by worker threads to add callbacks to the event loop.

This means that the "frontend" of a service can be asyncio, allowing it to support features like WebSockets that are non-trivial to support without aiohttp or a similar asyncio-native HTTP server [2], while the "backend" of the service can be multi-threaded or multi-process for CPU-bound work.

0: https://docs.python.org/3/library/asyncio-eventloop.html#exe...

1: https://docs.python.org/3/library/asyncio-eventloop.html#asy...

2: Flask-SocketIO, for example, requires that you use eventlet or gevent, which are the "legacy" ways of doing asynchronous IO: https://flask-socketio.readthedocs.io/en/latest/
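
A minimal sketch of that split, with a hypothetical cpu_bound_work function standing in for the "backend":

    import asyncio
    from concurrent.futures import ProcessPoolExecutor

    def cpu_bound_work(data):
        # runs in a worker process, so it never blocks the event loop
        return sum(x * x for x in data)

    async def handle_request(loop, pool, data):
        # await the pool's result from inside the event loop
        return await loop.run_in_executor(pool, cpu_bound_work, data)

    if __name__ == '__main__':  # guard required for process pools on some platforms
        loop = asyncio.get_event_loop()
        with ProcessPoolExecutor() as pool:
            print(loop.run_until_complete(handle_request(loop, pool, list(range(10)))))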


> "even a small amount of data processing will make your program CPU bound pretty quickly"

I don't know what "a small amount" means. You are bound by hardware (cores, hyper-threading). It simply makes no sense to spawn 32 threads executing computation-intensive code (no IO) on a machine with 2 cores.


Plus asyncio has futures too, and with run_in_executor(), you can await something in a thread/multiprocessing pool from inside the event loop transparently.


Yes, I use this extensively in a web crawler -- network I/O is done in a main thread that's CPU-bound doing networking stuff, and CPU-burning tasks like HTML parsing are done in child threads via run_in_executor. Without trying very hard I can get to a load of 3-4 during ordinary web crawling.


You can get the best of both worlds with PyParallel! Async I/O and multiple threads. (Experimental project, not intended for production use, so I say this somewhat facetiously.)

http://pyparallel.org/


Last commit Oct 2016. Not sure where this is going.


Will this finally let me write a parallel Python script that doesn't explode when I press control+C?


That's always been easy enough, just a little hidden:

    import signal
    # restore the default handler so Ctrl-C kills the process immediately
    signal.signal(signal.SIGINT, signal.SIG_DFL)
Now it will just die on Ctrl-C. For text filter programs it's a good idea to do the same for SIGPIPE too.
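
The SIGPIPE version is the same one-liner (Unix only, since Windows has no SIGPIPE):

    signal.signal(signal.SIGPIPE, signal.SIG_DFL)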


Depends. What do you want to happen when you press ctrl+C?


I want it to not hang forever, not spam the console with pages of exception tracebacks, and not require a ton of boilerplate process-management code to accomplish the above. Ideally it would also allow me to handle other exceptions (i.e. besides KeyboardInterrupt) that occur both in the main process and child processes. I've never figured out how to do this with multiprocessing, despite lots of attempts.


Don't know what you did wrong, but you should be able to catch exceptions at their appropriate level and shut down gracefully.


multiprocessing was designed to mirror the threading library... so the exceptions are specifically set up not to cross thread/process boundaries. But the same strategies to handle multithreaded exceptions and keyboardinterrupt should apply.

If you want the child processes to shut down when the main process goes down, you should be able to just set .daemon = True on the process objects before you start them.

If you want exceptions in the child processes to propagate up to the main process and handle them there, it looks like you'd just need to send the exception across a queue or something in multiprocessing. In the new futures library, your future (wrapping the task) has a result() method you can call. If the child process ran into an exception, that exception is re-raised in the main process when you call result().


I know it should be possible, but I've never figured out how to do it, nor have I ever seen any example code using multiprocessing that does it. If you know of some example code that properly catches and handles exceptions in any process, I'd love to see it.


The concurrent.futures module handles exceptions well, waiting to raise them until you call ``future.result()``.
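
A minimal sketch of that behaviour, with a made-up failing task:

    from concurrent.futures import ProcessPoolExecutor

    def task(n):
        if n == 3:
            raise ValueError('bad input: %d' % n)
        return n * n

    if __name__ == '__main__':
        with ProcessPoolExecutor() as pool:
            futures = [pool.submit(task, n) for n in range(5)]
            for fut in futures:
                try:
                    # the worker's exception is re-raised here, in the main process
                    print(fut.result())
                except ValueError as exc:
                    print('task failed:', exc)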

Or did I misunderstand the problem?


My complaints are about the multiprocessing module, and my hope is that concurrent.futures will solve some of them.


The latter library has a better design in some respects, but isn't quite as pleasant as ``Pool.imap_unordered`` when you're not worried about exceptions in the subprocesses/threads. However, I've been using ``concurrent.futures`` more in the last year.
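
For comparison, a rough sketch of the two idioms side by side (the work function is a stand-in); ``as_completed`` is the nearest ``concurrent.futures`` analogue to ``imap_unordered``:

    from multiprocessing import Pool
    from concurrent.futures import ProcessPoolExecutor, as_completed

    def work(n):
        return n * n

    if __name__ == '__main__':
        # multiprocessing: results stream back as they finish
        with Pool() as pool:
            for result in pool.imap_unordered(work, range(10)):
                print(result)

        # concurrent.futures: same effect, a little more ceremony
        with ProcessPoolExecutor() as pool:
            futures = [pool.submit(work, n) for n in range(10)]
            for fut in as_completed(futures):
                print(fut.result())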


Isn't this what the `with` keyword was made for:

    from concurrent.futures import ThreadPoolExecutor

    # foo is some callable defined elsewhere
    with ThreadPoolExecutor() as pool:
        fut = pool.submit(foo)
        print(fut.result())
My idea is that ctrl+c would now cancel child operations cleanly?


I used the Python 2 backport of concurrent.futures for a project recently (parallelizing calls to an external API) and it worked fantastically well. It's a really nice model for doing concurrent outbound I/O in a bunch of threads.


I tried using threads on multiple pipe reading, to centralise a logfile sorting problem (each discrete logfile is a gz which is itself only partially in order, and then between files a merge-sort has to be performed). It was enjoyable to try to fix, but ultimately I found the solution only marginally better than explicit processes feeding a single reader doing round-robin. I think the lesson I learned is that if the problem integrates back into a single context, there isn't much you can do to avoid that bottleneck once all the other parallelism opportunities have been overcome.
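
For reference, the single-reader merge step looks roughly like this (filenames are made up, and it assumes each stream has already been sorted, which as noted above the raw files aren't):

    import gzip
    import heapq

    # heapq.merge lazily merge-sorts the already-sorted streams;
    # this single consumer is the bottleneck described above
    files = [gzip.open(name, 'rt') for name in ('a.log.gz', 'b.log.gz')]
    for line in heapq.merge(*files):
        print(line, end='')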


What's the advantage of using a ProcessPoolExecutor over just using multiprocessing? Is it that there's a single interface that you can use for both threads and processes?


I don't really think there's an advantage. Just like how multiprocessing tries to mirror the threading interface, ProcessPoolExecutor just mirrors the threaded implementation of the new futures-based concurrency interface.

I think futures are nicer for certain types of interactions. For example, futures 'return' actual values, so it's nice for dispatching a task that you'll get a result back from. Futures also raise exceptions (when you try to inspect their results, if an exception occurred in the task). This might make for cleaner error handling code.


multiprocessing also has a single interface for both threads and processes, so it's not that.


I'm using concurrent.futures in production and its use of the multiprocessing module caused the Python grpc library to break in a really strange and hard-to-debug way:

https://github.com/grpc/grpc/issues/13873

I suspect it's not the only Python library that will see issues if you are running it in the Future context.


Cool, just wrote my first code with this module a week ago. A client needed to run background tasks under Flask without the ops complexity or dev time needed to set up a job queue. https://stackoverflow.com/a/39008301/450917
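
A minimal sketch of that approach (the route and task are hypothetical, not taken from the linked answer):

    from concurrent.futures import ThreadPoolExecutor
    from flask import Flask

    app = Flask(__name__)
    executor = ThreadPoolExecutor(max_workers=2)

    def send_report(email):
        ...  # the slow work runs on a background thread

    @app.route('/report/<email>')
    def report(email):
        executor.submit(send_report, email)
        return 'queued', 202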


I wrote a similar interface to run asyncio-compatible ProcessPool/ThreadPool executors a couple of days ago:

https://github.com/feluxe/aioexec


Sad to see that Python still suffers from the Global Interpreter Lock (GIL), and that the only way out is still to use multiple processes (which causes problems of its own, e.g. sharing of large data structures becomes expensive).


Only for computationally expensive operations done in interpreted Python.

C extensions, IO operations etc. always release the lock. In practice GIL is a problem only when it is profiled to be a problem.

Python is used a lot in the data analysis world and nobody cares about the lock, because only a fraction of the CPU time is spent holding the lock.


> Only for computationally expensive operations done in interpreted Python. C extensions, IO operations etc. always release the lock.

So you are saying it is a problem in Python but not in other languages? Which is exactly my point :)


I found a concurrent.futures.ThreadPoolExecutor useful for database seeding, where I invoke a whole lot of SQLAlchemy Core inserts.


Does anyone have a recommendation for what to use for a cache shared between processes? Would hdf5 work?


If you want a file-based cache, yes.


So how does this compare to something like Deco? https://github.com/alex-sherman/deco. I guess since this uses a single GIL it's good for IO-limited things?


As written, the code in the blog post is good for IO-limited things. But as it notes, if you replace 'ThreadPoolExecutor' with 'ProcessPoolExecutor' then you get actual multiprocessing, and you may be able to get a speedup on compute-bound tasks.

The linked repo looks like some nice wrappers/decorators around the 'old' multiprocessing library to make it really easy to parallelize a bunch of function calls within a blocking function.
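
The swap itself is a one-line change; a toy sketch with a made-up CPU-bound function:

    from concurrent.futures import ProcessPoolExecutor  # was ThreadPoolExecutor

    def fib(n):
        # stand-in for real CPU-bound work
        return n if n < 2 else fib(n - 1) + fib(n - 2)

    if __name__ == '__main__':
        with ProcessPoolExecutor() as pool:
            print(list(pool.map(fib, [30] * 8)))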


The last for loop of the last code example doesn't need to be under the with statement.


Would this allow separate GC'ing for each task?


Only ProcessPoolExecutor [1]. If you use a thread pool or asyncio, it will be a single Python process/GIL.

[1]: https://docs.python.org/3/library/concurrent.futures.html#pr...



