Working simultaneously vs. waiting simultaneously (yosefk.com)
83 points by drjohnson on May 2, 2014 | 20 comments



If you want performance, it's kinda funny to use pure Python and then mourn the fact that you can't run, on 8 cores, Python code that's 30-50x slower than C to begin with.

I really dislike this attitude. If Python is much easier, more pleasant and safer to write than C, then being able to get a ~8x (or even 4x) speedup by running on an 8 core machine is a perfectly reasonable thing to want to do.


Well, let's say you can't use numpy or similar - "C written by someone else" - and further, let's imagine a world where CPython has no GIL. Then there are 2 options:

1. You rewrite things in single-threaded C, getting a speed-up of 30-50x.

2. You rewrite things in multi-threaded Python, getting a speed-up of say 4-8x.

I argue that 1 is often actually safer than 2, because parallelism-related bugs are harder to hunt down than C's memory-overwriting bugs (this can be mitigated with good APIs and a good race detector working against these APIs - as in Cilk or https://github.com/yosefk/checkedthreads - but it still requires good testing coverage).

Now if 2 were more pleasant and safe than 1, I might agree with you even though 1 gives the better speed-up. But I believe 1 is faster and safer.
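
To illustrate the kind of bug I mean in 2, here's a minimal sketch (the counter and thread count are made up; it's the classic lost-update race, and since += is a read-modify-write it bites even under today's GIL):

    import threading

    counter = 0

    def bump():
        global counter
        for _ in range(100000):
            counter += 1    # load, add, store - another thread can run in between

    threads = [threading.Thread(target=bump) for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter)          # usually well below 800000, and different on every run

Nothing crashes and no memory is overwritten - the program just quietly computes the wrong answer, which is exactly why this class of bug is hard to hunt down.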


Well, why not both?

-Python has a lot of very good and relatively simple APIs that are easy to get started with.

-Python allows you to write with little concern for memory management

-Python allows for (nearly) type-free interaction, i.e. dynamic typing.

-C allows direct memory management.

-C allows for strongly typed variables for reflection-less execution (though 'auto' was just added to gcc!)

-C allows you to interact with kernel-level APIs almost directly.

I don't see how those same things couldn't, in theory, be achieved in a newer spinoff of C/C++, with twists on the rules of the environment that you yourself are comfortable writing in. Realistically there's no reason to neglect what either offers -- we should strive to make both achievable in the most comfortable and adaptable grammar.

Maybe one that allows you to set your own rules when you need them, to enforce them, and to configure those safeties in the modules or components that need them, at compile time!

We're free to change all of this, you know!


It really depends on the problem though. There are plenty of problems for which it's reasonably easy to get a parallel speedup (even if not close to the ideal speedup) by rewriting or restructuring only small parts of the program, but where rewriting in C would be a hell of a lot of complex work.

A lot of people write terrible C code with a plethora of hard to find memory bugs. A lot of other people don't even know C at all. The GIL means that people can't even try[1] to get some speedup from Python without having to reach for C.

[1] I know, I know, Python lets you use multiple processes for parallelism. I also dislike the attitude that this is an adequate enough solution that thread-based parallelism isn't needed, but that's a story for another day.
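
For concreteness, the multiprocessing route from [1] looks roughly like this - a sketch, with a made-up cpu_heavy function standing in for the hot loop:

    from multiprocessing import Pool

    def cpu_heavy(chunk):                  # hypothetical stand-in for the hot loop
        return sum(x * x for x in chunk)

    if __name__ == '__main__':             # needed because workers re-import this module
        chunks = [range(i * 10**6, (i + 1) * 10**6) for i in range(8)]
        pool = Pool(processes=8)
        results = pool.map(cpu_heavy, chunks)   # each worker process has its own GIL
        pool.close()
        pool.join()
        print(sum(results))

It works, but note the tax: inputs and outputs get pickled and shipped between processes, which is part of why I don't consider it an adequate substitute for threads.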


> "A lot of people write terrible C code with a plethora of hard to find memory bugs"

Well, it wouldn't need to be C. Ada, D, Fortran, Go, Java (Scala?), Lisp, OCaml, (Rust), or some such would give speed close to that of C.


Not all of those have easy interop with Python though (since we're talking about Python here), and it requires people to know one of those languages.


Quite a few Python programmers I know are physicists or other non-professional programmers who really don't know how to write C, use gdb, debug linking problems, etc. For them I think staying in pure Python is really a large plus, even if it means reading a bit about locks and threads.
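
(And the reading really is brief - here's the one idiom that covers most of their cases, sketched with a hypothetical shared counter:)

    import threading

    counts = {}
    lock = threading.Lock()

    def tally(key):
        # get-then-set is two steps, so it has to be guarded; without the
        # lock, two threads can both read the old count and one update is lost
        with lock:
            counts[key] = counts.get(key, 0) + 1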


For numerical code, Fortran would be both fast and a relatively easy language to learn.


Rob Pike's "Concurrency is not parallelism"[1] slides enlightened me to the distinction being made here. I think it's valuable and overdue for the industry[2] to define the boundaries of these concepts (which are often regarded as amorphous or synonymous).

[1] http://blog.golang.org/concurrency-is-not-parallelism

[2] or am I conflating "industry" with "me"?


I would guess most programmers understand the distinction between the two. However, understanding the difference in theory doesn't always mean one applies it successfully in practice. E.g. if you were unaware of the CPython GIL, you might think threading implies parallelism.


Not just you. I've seen plenty of 'architects' that don't know the difference.


> Indeed you can hardly do many things at once - not in a single pure Python process. One thread doing something takes the GIL and the other thread waits.

> You can however wait for many things at once just fine - for example, using the multiprocessing module (pool.map), or you could spawn your own thread pool to do the same.

This paragraph is a bit misleading. The multiprocessing module spawns subprocesses by default, so it can indeed be used to work around a GIL issue and do many things at once. It's not the same as a thread pool (although the multiprocessing module does offer a compatible interface which uses threads).
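
To make the two concrete - a sketch, with square as a placeholder workload:

    from multiprocessing import Pool                       # worker processes
    from multiprocessing.dummy import Pool as ThreadPool   # same interface, but threads

    def square(x):
        return x * x

    if __name__ == '__main__':
        procs, threads = Pool(4), ThreadPool(4)
        print(procs.map(square, range(10)))    # 4 interpreters, 4 GILs: real CPU parallelism
        print(threads.map(square, range(10)))  # one GIL: fine for waiting, not for working
        procs.close()
        threads.close()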


> spawns subprocesses by default, so it can indeed be used to work around a GIL issue and do many things at once

But that defeats the purpose of the distinction, because when you wait on a web request, I/O, etc., a machine or peripheral is also doing something on the other end.


This is a really good way to put the distinction. Most applications use threads because they are waiting on I/O, not because they are computing in parallel.


I think this is rather confusing and it introduces unnecessary abstractions -- "working" and "waiting".

It is easier to think of it as:

* concurrency -- a property of the algorithm or the business process where multiple things can happen at the same time. Think of mergesort, for example: you can sort the left and the right parts of the list at the same time. There is no guarantee that when the program executes, the concurrent things will actually execute at the same time. Or think of fetching web pages from multiple websites with no need to wait between them.

* parallelism -- a property of the execution environment. Based on the specific hardware and OS facilities, some of the concurrent things can actually happen at the same time. Can fetch the web pages from 100 sites at the same time. Will actually sort both sides of the array on two separate CPU cores. And so on.

It is as simple as that. There is some fuzziness around the two. Like asking: can one have parallelism without concurrency? Maybe your compiler noticed how it could transform your for loop into a SIMD operation. Ok, is that concurrency and parallelism, or is it just parallelism? I don't really bother being that pedantic at that point. Is it concurrency only if the user explicitly uses the language's concurrency primitives, or does "implicit" (let the compiler figure it out) concurrency also count - that kind of stuff.
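
To make the mergesort example concrete - a sketch in pure Python (under CPython the pool has to be processes, not threads, for the halves to actually run in parallel):

    from heapq import merge
    from multiprocessing import Pool

    def mergesort(xs):
        if len(xs) <= 1:
            return xs
        mid = len(xs) // 2
        # the two halves are independent - that's the concurrency in the algorithm
        return list(merge(mergesort(xs[:mid]), mergesort(xs[mid:])))

    if __name__ == '__main__':
        data = [5, 2, 9, 1, 7, 3, 8, 4]
        # whether the halves actually run at the same time is the execution
        # environment's business - here, two worker processes
        pool = Pool(2)
        left, right = pool.map(mergesort, [data[:4], data[4:]])
        print(list(merge(left, right)))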

But, wait, that's not all. There is another axis of comparison that is useful in practice, and that is CPU vs I/O.

* CPU : The mergesort example above is an example of mostly CPU concurrency and parallelism. You can spawn two threads to make it explicit, and if you have more than one core, you might get some CPU parallelism.

* I/O : You can fetch 2 pages at once using a thread per page, and if your socket layer works properly you might get good parallelism.

I/O is where the discussion of the GIL comes in. Python has great I/O parallelism; the GIL doesn't really matter much for it. I have written a spider like the one I described, and it was very performant, very nice, just using plain old GIL-ified threads.
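
The whole trick, sketched (the URLs are placeholders; the interpreter releases the GIL while a thread is blocked reading from the socket):

    import threading
    from urllib.request import urlopen

    urls = ['http://example.com', 'http://example.org']   # placeholder URLs
    pages = [None] * len(urls)

    def fetch(i, url):
        pages[i] = urlopen(url).read()   # blocked in the socket, not holding the GIL

    threads = [threading.Thread(target=fetch, args=(i, u)) for i, u in enumerate(urls)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()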

Anyway, it is important to keep these concepts clear and to be aware of them. They are important today, and I feel they are not taught well enough in academia (not in this explicit, comparative fashion, anyway).


An interesting property of CPU vs. I/O concurrency, to me, is that while I/O-heavy concurrency works fine with cooperative or reduction-counting scheduling, CPU-heavy concurrency basically requires a preemptive scheduling approach (i.e. multiple native threads.)

A lot of people would assume that on a single-core machine, you could get away with running, for example, Erlang with only a single native scheduler-thread for its tasks. This would be true until the first time you called a C-FFI function that took a full CPU-second to return. Then all the other tasks in your system would wake up wondering what year it is and how their beards grew so long. Erlang added "dirty schedulers" (basically, extra scheduler-threads that will get spawned above-and-beyond the default one-per-core-with-CPU-affinity set) precisely so the preemption of CPU-heavy tasks could be pushed off into the OS's capable hands.
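
The same failure mode is easy to reproduce in any cooperative scheduler - here's a sketch in Python's asyncio rather than Erlang, but the mechanics are identical: one task makes a blocking C-level call and every cooperative task stops until it returns:

    import asyncio
    import time

    async def ticker():
        while True:
            print('tick')
            await asyncio.sleep(0.1)   # cooperative yield point

    async def main():
        task = asyncio.create_task(ticker())
        await asyncio.sleep(0.3)       # ticks appear on schedule
        time.sleep(1.0)                # blocking call: the whole loop stalls, no ticks
        # the dirty-scheduler move: push the blocking call onto an OS thread
        # so the cooperative tasks keep running while it churns
        await asyncio.get_running_loop().run_in_executor(None, time.sleep, 1.0)
        task.cancel()

    asyncio.run(main())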


> This would be true until the first time you called a C-FFI function that took a full CPU-second to return.

As you said, up until the last release that would be true in general. Even if you had 16 CPUs, running C NIFs inside the VM that block for longer than 1ms at a time would probably be a bad idea.

A word of warning: think really hard before compiling native code (NIF) drivers into the middle of the Erlang VM. You lose reliability, fault tolerance, and predictable low-latency response. Good examples of NIFs would be computing a hash function, or something similar that is constant-time and easily tested - not something like getting a piece of data from a database.


I think the best case for dirty schedulers in Erlang isn't for running CPU-intensive tasks within the same VM that's doing your IO, though; it's for running a separate Erlang node, on a separate machine, and sending it CPU-heavy work to do over the distribution protocol. Effectively, an isolated Erlang node + dirty schedulers + NIFs is a souped-up, easier-to-code-for C node.


Well, "working vs waiting" is the same as "CPU vs I/O", so we're saying the same thing here.

As to your "concurrency vs parallelism" distinction - it's a valid one, and pretty widespread; I think it misses a lot of important points though, and I wrote about it here: http://www.yosefk.com/blog/parallelism-and-concurrency-need-...

One point from there is that a parallel mergesort is conflict-free, while, say, a concurrent DB-backed web app has a lot of opportunities for conflicts (2 users want to take the same username - who wins? etc. etc.) I argue that the many differences between such systems result in rather different tools being suitable for building them.
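
To sketch the difference with a toy in-memory stand-in for the DB: the mergesort halves never touch each other's data, while arbitrating moments like the one below is the web app's whole job:

    import threading

    taken = set()                  # hypothetical in-memory username table
    lock = threading.Lock()

    def claim(username):
        with lock:                 # the arbitration a parallel mergesort never needs
            if username in taken:
                return False       # the second user loses - and has to be told so
            taken.add(username)
            return True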


ZeroMQ implements this pattern really well.

It's just single-threaded code... all the threading happens in the background...
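
A minimal pyzmq sketch of that (the local endpoint and the b'job' payload are arbitrary):

    import zmq

    ctx = zmq.Context()                 # spawns its own background I/O thread(s)
    push = ctx.socket(zmq.PUSH)
    push.bind('tcp://127.0.0.1:5555')   # arbitrary local port
    pull = ctx.socket(zmq.PULL)
    pull.connect('tcp://127.0.0.1:5555')

    push.send(b'job')                   # your code stays single-threaded...
    print(pull.recv())                  # ...the I/O threads do the shuttling underneath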

There's also Apache Storm, which abstracts away all the gory details of spinning up workers and pools, tracking message failures, etc.



