
So, in the best-case scenario we can run a one-process, multi-core gevent app?



That is one goal amongst others, from what I understand.


STM is about parallelism of your computations, not about concurrency (or any kind of IO). You're mixing the two up. If you want gevent on multiple cores, just pre-fork N gevent workers.
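For illustration, a minimal pre-fork sketch (the port and worker count are arbitrary; this assumes gevent's StreamServer accepts an already-bound listening socket, which is what lets the forks share it):

    import os
    from gevent import socket
    from gevent.server import StreamServer

    def handle(sock, addr):
        sock.sendall(b'hello\n')
        sock.close()

    # Bind the listening socket once in the parent...
    listener = socket.socket()
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(('127.0.0.1', 8000))
    listener.listen(128)

    # ...then fork N-1 children; parent + children = one worker per core.
    for _ in range(3):
        if os.fork() == 0:
            break

    # Each process runs its own gevent hub, accepting from the shared socket.
    StreamServer(listener, handle).serve_forever()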


> STM is about parallelism of your computations, not about concurrency (or any kind of IO). You're mixing the two up.

No, STM is a concurrency mechanism, not a parallelization one. That's why Haskell's STM lives under Control.Concurrent, not Control.Parallel.

And one of the stated goals of pypy-STM is that it could allow parallelising event loops with minimal changes. Quoting http://pypy.org/tmdonate.html:

> However, as argued here[0], this may not be the most practical way to achieve real multithreading; it seems that better alternatives would offer good scalability too. Notably, TM could benefit any event-based system that is written to dispatch events serially (Twisted-based, most GUI toolkit, Stackless, gevent, and so on). The events would internally be processed in parallel, while maintaining the illusion of serial execution, with all the corresponding benefits of safety. This should be possible with minimal changes to the event dispatchers. This approach has been described by the Automatic Mutual Exclusion[1] work at Microsoft Research, but not been implemented anywhere (to the best of our knowledge).

It is not expected to be faster than pre-forking evented workers; however, it is expected to be much simpler to retrofit onto systems that weren't designed for multiprocessing, and it avoids the complexity overhead of IPC and multi-process architectures.
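To make that concrete, here is a hypothetical sketch of what such a retrofit could look like, loosely modeled on the `transaction` module from the pypy-stm proposal (treat the exact names as illustrative, not a shipped API):

    # Events are queued and then run in parallel under STM; conflicting
    # transactions are detected and re-executed, preserving the illusion
    # of serial dispatch.
    import transaction  # hypothetical pypy-stm module

    counts = {}

    def handle_event(key):
        # Ordinary-looking serial code, no locks. STM retries this
        # transaction if another one writes `counts` concurrently.
        counts[key] = counts.get(key, 0) + 1

    for key in ['a', 'b', 'a', 'c']:
        transaction.add(handle_event, key)  # enqueue, don't run yet
    transaction.run()                       # dispatch events in parallel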

[0] http://mail.python.org/pipermail/pypy-dev/2012-January/00904...

[1] http://research.microsoft.com/en-us/projects/ame/default.asp...


Thank you, I was wrong. What I initially wanted to say (now that I understand that STM is "CPU concurrency") is that STM is about pure computations. Haskell's STM doesn't let you do any kind of IO inside of it.

I think the PyPy guys had some crazy ideas about connecting STM transactions to SQL transactions, but I have no idea how that would work with network connectivity issues etc.


> STM is about pure computations.

Yes, side effects within transactions (not just I/O; mutating non-transactional bindings or objects is also broken) are fraught with peril, as a transaction may be retried time and time again until it succeeds. You can still do transactions in languages unable to segregate side effects, it's just (much) riskier. Clojure/ClojureScript has CAS-based semantics on atoms, usually driven through the higher-level `swap!` operation. It works quite well, but the language will not save you from the footgun of side effects within a transaction, and it will potentially retry hundreds of times if there's a lot of contention on the atom.
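Here's roughly what that swap!-style CAS retry loop looks like, sketched as a toy Python class (Clojure's real atom is a runtime primitive; this is just to show why side effects in the update function are a footgun):

    import threading

    class Atom:
        def __init__(self, value):
            self._value = value
            self._lock = threading.Lock()

        def compare_and_set(self, old, new):
            # Identity-based CAS, like Clojure's compare-and-set!
            with self._lock:
                if self._value is old:
                    self._value = new
                    return True
                return False

        def swap(self, f):
            while True:
                old = self._value
                new = f(old)  # under contention f runs repeatedly,
                              # so any side effect in f repeats too
                if self.compare_and_set(old, new):
                    return new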


> STM is about parallelism of your computations, not about concurrency (or any kind of IO).

I'm sorry, what? STM is entirely a concurrency construct. You wouldn't need it if you didn't have concurrency... The whole idea is that access to shared memory happens through database-like transactions.


Yes, I was wrong. What I was referring to is that STM is "pure" concurrency, while gevent is for IO concurrency, and that you don't do any IO inside STM.


I think the confusion here is that CPython and PyPy provide single-process concurrency (but not parallelism) without STM. The GIL allows multiple threads to execute concurrently within a single process, but only one thread to execute at a time.

PyPy is implementing an STM solution to provide single-process parallelism in Python, allowing multiple threads to execute at a time, and detecting and recovering (re-executing) when they conflict.

So while STM can be used in concurrent execution without parallelism, existing Python interpreters provide concurrent execution without STM, and PyPy only needs STM to provide parallelism.


> The GIL allows ... only one thread to execute at a time.

No, only one thread can hold the GIL at a time. Not everything requires holding the GIL. You can still get multithreading benefits for IO-bound work and for work done in C extensions like numpy.
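For example, a stock-CPython sketch of IO-bound threading (the URL is just a placeholder):

    import threading
    import urllib.request

    urls = ['http://example.com/'] * 4
    results = [None] * len(urls)

    def fetch(i, url):
        # The GIL is dropped while the thread blocks in the socket read,
        # so these four downloads overlap in time.
        results[i] = urllib.request.urlopen(url).read()

    threads = [threading.Thread(target=fetch, args=(i, u))
               for i, u in enumerate(urls)]
    for t in threads: t.start()
    for t in threads: t.join()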


The GIL allows ... only one thread to execute /Python bytecode/ at a time.


Concurrency:

* A property of your algorithm and your problem space. Some frameworks/libraries have multiple ways of handling concurrency: callback chains, threads, waiting on events, etc. This says nothing at all about whether you'll get a speed-up at runtime or not.

* Concurrency is usually split into CPU and IO.

+ CPU concurrency means your code computes things that are largely independent of each other (you divide an image into sub-blocks, compute some function over all of them, and those computations are concurrent while they run).

+ IO concurrency means your code does input/output operations that are independent. You might handle multiple client connections; a connection from one client is, by and large, not dependent on connections from another client.

Parallelism:

* Parallelism is a property of your runtime system (the particular library, language, framework, hardware, motherboard) that helps you run multiple concurrency units at the same time.

* You would often want to map your concurrency units onto the parallel execution resources your platform provides as well as possible.

* Could also split into CPU and IO parallelism:

+ CPU : Can execute multiple compute concurrency units at a time. Think of threads, multiple processes, multiple compute nodes. You actually spawn a thread for each of those sub-image blocks and run computations at the same time.

+ IO : You can spawn a thread or process for each client connection, and then reads and writes to sockets will effectively execute in parallel. Now, you only have one motherboard, one kernel, and maybe one network card, so deep down they are not 100% parallel; a large transfer from one thread could clog the network buffers or the switch downstream, but it is as good as it gets sometimes. Or you could use the select/epoll/kqueue kernel facilities to drive multiple IO concurrency units in parallel, as sketched just below.
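A minimal sketch of that kernel facility via Python's selectors module: a single thread multiplexing many client sockets (an echo server; a real one would buffer partial writes):

    import selectors
    import socket

    sel = selectors.DefaultSelector()
    listener = socket.socket()
    listener.bind(('127.0.0.1', 8000))
    listener.listen()
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ)

    while True:
        for key, _ in sel.select():
            if key.fileobj is listener:
                conn, _ = listener.accept()     # new client
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ)
            else:
                data = key.fileobj.recv(4096)
                if data:
                    key.fileobj.send(data)      # echo back
                else:
                    sel.unregister(key.fileobj)
                    key.fileobj.close()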

This means:

* You could have IO concurrency but no IO parallelism. In your code you actually read from client 1's socket, then client 2's socket, then, when a 3rd client arrives, from client 3's socket, in sequence in some for loop. Terribly inefficient.

* You could have IO concurrency and IO parallelism. You can spawn a thread (yes, even in Python this works great). You can spawn a green thread (something based on eventlet or gevent). You can start a callback chain. Underneath a lot of this is quite often that epoll/select/kqueue mechanism, just disguised with goodies on top (green threads, promises, callbacks, errbacks, actors, etc.).

* You could have CPU concurrency and CPU parallelism. You managed to map your 100 image sub-blocks to 100 threads; if you have 4 cores, maybe 4 of those will run at a time and you let the kernel distribute them. Or you can hand-spawn 4 execution queues with 4 threads, pin them to each CPU core, and map the 100 sub-blocks to the 4 worker threads (see the sketch after this list).

* You could have CPU concurrency but no CPU parallelism. This is if, say, you use Python's threads to compute those sub-blocks: underneath there is the GIL, and it will only run one of those units at a time. You could spawn a callback chain in Node.js to compute each of those 100 sub-blocks and you will also run only one at a time, and they might even get confused with each other because they'll block each other in non-obvious ways.
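A compact sketch of those last two bullets, using the image sub-block example (the block contents are dummies; on CPython the thread pool gives concurrency without CPU parallelism, the process pool gives both):

    from multiprocessing import Pool
    from multiprocessing.pool import ThreadPool

    def process_block(block):
        # Stand-in for the real per-sub-block computation.
        return sum(x * x for x in block)

    if __name__ == '__main__':
        blocks = [range(100000)] * 100  # 100 CPU concurrency units

        # CPU concurrency, no CPU parallelism: the GIL lets only one
        # thread execute bytecode at a time, so this runs serially.
        ThreadPool(4).map(process_block, blocks)

        # CPU concurrency + CPU parallelism: 4 worker processes,
        # so roughly 4 sub-blocks are being computed at any moment.
        Pool(4).map(process_block, blocks)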


You can also have parallelism without concurrency, actually. For example, if you use SIMD instructions you're using parallelism with no concurrency.
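E.g. with numpy, whose inner loops are typically compiled down to SIMD: one logical operation, no threads, no concurrent tasks, yet several elements get processed per instruction:

    import numpy as np

    a = np.arange(1000000, dtype=np.float32)
    b = np.ones_like(a)
    c = a + b  # data-parallel underneath, but nothing is concurrent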


GPGPU as well.


Thanks for the explanations, this cleared things up for me.



