A green threading library with true concurrency for Python

jacob019 · on Aug 16, 2013

I don't really get what advantage this gives me beyond using gevent. There's still no parallelism. The readme describes it as "A solution to concurrency," but I already get that using gevent. Is it just to improve communication between green threads?

jerf · on Aug 16, 2013

It appears to add some amount of Python-bytecode level preemption to gevent, which allows you to hopefully avoid some of the pathological cases of cooperative scheduling. Said pathological cases are only a matter of scale... if your program becomes large enough, you will hit them, eventually.

That said, with no offense intended to mirman, I'd really hesitate before using this for anything serious enough to reach that scale in the first place. Gevent, frankly, visibly pushes Python to the limits (and occasionally a bit beyond), trying to also tack on some preemption on an environment not fundamentally expecting it would scare me another notch.

mirman · on Aug 16, 2013

No offense taken - both of these are obviously visibly delicate.

There is a version in the history that used Greenlet instead of gevent which was potentially a bit less delicate, but it required wrapping of the main file and didn't work with time.sleep, and I didn't feel like it was worth writing my own locks, semaphores, mutexes, pipes and whatnot.

hogu · on Aug 16, 2013

what are the pathological cases one runs into with gevent?

jerf · on Aug 16, 2013

With a cooperative scheduler, you will, sooner or later, experience some form of starvation. The most obvious is the process that just infinitely loops, but less obvious situations will end up popping up too; calls that you thought were handled by the event loop but turn out to be blocking and add up when you start calling them at scale, strange behavior when you have a set of processes that turn out to yield far less often than you thought and the system starts behaving with much higher latency than it should if you get too many of them, and all kinds of such manifestations. There's also the inability to create things that watch other things; if something does go spinning off into infinity, nothing else gets to run to kill it.

You can hack around many of them, but you eventually hit a wall, and the effort of the hack increases rapidly.

Note I didn't really use gevent in this reply, this is just about cooperative scheduling. There's a reason why we've all but completely abandoned it at the OS level for things we'd call "computers" (as opposed to "embedded systems" or "microcontrollers", etc). I tend to consider cooperative scheduling an enormous red flag in any system that uses it... and yes, that completely and fully includes the currently-popular runtimes that use it.

anonymoushn · on Aug 16, 2013

It sounds like this sort of problem comes from using a cooperative scheduler to implement concurrency of arbitrary routines rather than control flow. I haven't been in a situation in which it would even be possible for something to yield less often than I expect, because I expect it to run until it yields. Similarly I don't often find that subroutines return too infrequently because I expect them to run until they return.

This library is probably nice for the places I would otherwise use threads.

jerf · on Aug 17, 2013

You will eventually, at scale, be wrong about that. To have full and correct knowledge of exactly how long your code takes to run sufficient to do this sort of scheduling correctly, by hand, in advance of running it, is basically equivalent to claiming that you never need to profile code because you already know exactly how long it takes. And it is well known and established to my satisfaction that even absolute, total experts in a field will still often be surprised about what actually comes out of a profiler, even in code strictly in their domains. You may well be right most of the time... but that is all you can hope for.

anonymoushn · on Aug 17, 2013

If it takes more than 16.67ms to run a frame's worth of update-and-draw, then it does, and replacing "wake up every in-game entity that asked to wake up this frame" with "let a preemptive scheduler manage ~10,000 threads that want to wake up, do almost nothing, and then sleep for k frames, while some master thread waits on a latch until they're all done" seems unlikely to make it any faster. If the logic my server must perform to handle a request is expensive, then it is, and replacing an event loop with a single-threaded preemptive scheduler will not increase throughput.

I'm not sure why it is difficult to do this sort of thing correctly. The scheduler does next to nothing in the "server with connections managed in coroutines" case and probably makes matters worse in the "storing game state in execution states" case. It could have a positive impact in the server application if one routine is secretly going to crash or run forever, in the sense that the other routines will continue running while the problematic feature is fenced off or fixed.

mirman · on Aug 16, 2013

Fewer explicit yields & no monkey patching necessary for IO performing libraries.

MetaCosm · on Aug 16, 2013

As a former python user, I wish this library had existed 18 months ago -- would have helped with some of the nasty cases you can get caught on with gevent.

erikb · on Aug 16, 2013

It basically is 50 lines of code that implement some stuff from gevent... So yes, it's more of a joke I guess.

mirman · on Aug 17, 2013

So number of lines of code makes a library a joke? One of the most commonly used Haskell libraries is only 25.

http://hackage.haskell.org/packages/archive/forkable-monad/0...

Also, this doesn't implement some stuff from gevent, it implements some stuff over and using gevent.

erikb · on Aug 17, 2013

Yes that's exactly what I'm saying. If external code is less then some hundred lines of code, it would be better suited as a tutorial and people would be better at implementing those lines themselves. And yes, that's only my opinion.

mirman · on Aug 20, 2013

Do you have any reasoning behind that opinion, or is it just unfounded religious zeal?

Why should somebody who wants to use threading have to know how to write a preemptive scheduler? If you want a tutorial in addition to the library thats fine, but many good tutorials also release their code as libraries. In open source world, releasing code as a library does not mean having to keep people from understanding the workings of that code. Why should people HAVE to learn how the code works to use it though?

pekk · on Aug 16, 2013

Since this is a very new project, it is a good time to suggest that you abide by PEP8 (e.g., no 'waitAll') since this would be widely appreciated, and is not easy to fix later on.

mirman · on Aug 16, 2013

waitAll gone. Before I put it in any package managers it will get style guided. This is currently on version -1.0.0.

derleth · on Aug 16, 2013

> version -1.0.0.

Version negative one? I don't think I've ever seen that before. Usually, the very earliest versions of software are numbered like 0.0.1 or something like that.

mirman · on Aug 16, 2013

- here actually negates the ordering of the numbers in the list. -1.0.0 === 0.0.1.

derleth · on Aug 16, 2013

> - here actually negates the ordering of the numbers in the list. -1.0.0 === 0.0.1.

Well, I've certainly never seen that before.

ceol · on Aug 16, 2013

It should be written 0.0.1[::-1]

fusiongyro · on Aug 16, 2013

> Conpig threads still can only run on one core of a processor.

The disillusionment caused by having so many options for non-parallel "concurrency" in Python is, I believe, feeding the high defection rate from Python to Go.

mirman · on Aug 16, 2013

The lack of options for parallel (and non-parallel) concurrency is, along with other things, feeding the high defection rate to many other languages.

pekk · on Aug 16, 2013

We've had processes, threads and greenlets for a while now... if anything the problem isn't that there are no options, but too many options that require understanding to choose and apply.

Many of the people complaining about this issue don't have a demonstrated problem and could try any simple approach first (if the point is not just to slam Python in favor of something else, from the beginning).

MetaCosm · on Aug 16, 2013

This type of response is why I gave up on Python entirely. Not to pile on you pekk, it isn't your fault, but it is a tone... defensive apologist... "first of all there is no problem, and if there was a problem... it is that Python is too awesome"

As someone who has had to ship stuff using multiprocess & gevent to actually meet real world scaling needs -- and integrating them with C code and communicating to a C++ application via ZMQ (inprocess by sharing the object) ... the sad fact is once we started to tackle really hard problems in Python that aren't pre-solved via a nice library all those early advantages fell away and we craved the blessed simplicity of C++ (note: sarcasm, C++ isn't simple, but it was far simpler than the Frankenstein's monster we built).

fusiongyro · on Aug 16, 2013

MetaCosm made the point quite eloquently, but let me juxtapose "too many options that require understanding to choose and apply" with Go, which has exactly one option, which requires no special understanding to choose and apply, and gives one exactly what one wants in basically any situation.

I am not really a Go proponent. I'm a Haskell user, personally, and Haskell, like Python, has three or four options that require understanding to choose and apply. The difference there being that in Haskell, each one of them actually gets you real parallelism, no fine print necessary. I bring it up to point out that the situation with Python is not a good example of what you might call "intrinsic complexity" (as you seem to be implying) or the Go solution would not be so much simpler, nor is it really an example of there being many better higher-level abstractions, or more of them would resemble Haskell's many high-level options. It is simply a bad situation that produces many poor kludges, and the mentality that everything is fine is (in my opinion) feeding a substantial defection rate to Go.

MetaCosm · on Aug 16, 2013

As one of those defectors... it isn't so many options, it is that they all have piles of gotchas and corner cases that can take weeks (months!) to debug in complex environments.

Yes, you absolutely can use all your cores by combining multiprocess, gevent and custom C code. But, debugging that stack of a level of hell I will never return too, ever.

gtani · on Aug 16, 2013

And the numpy/scipy/ipython notebook/scikit-learn axis is driving a lot of python adoption these days. Look in any blog about beginning data mining/machine learning, odds are it'll say learn python or R.

Different kind of user, but... just sayin

fusiongyro · on Aug 17, 2013

That's true, and not incompatible with what I'm saying. I'm saying that the majority of defection from Python I hear about is defection to Go. A lot of that seems to be taking the form of loud and proud blogging, so we hear about it here. The defection rate from Python is almost certainly much lower than its overall growth rate, or it would be dying (Perl). I'm just saying, it's not an accident that people are leaving Python for Go. They want cheap and easy parallelism and concurrency. Python is great at many things, maybe even most things, but the state of affairs with parallelism is very weak. Python can expect to continue to lose people to Go until this is addressed in a real, tangible way that doesn't sound like excuse-making. I don't really expect the situation to change, because Python doesn't need it to, but the excuse-making is annoying.

rektide · on Aug 16, 2013

The resounding question in my mind is why this, when there is Stackless Python? What's better about this greenthreading impl? http://www.stackless.com/

mirman · on Aug 16, 2013

This is a library that can be used to supplement any implementation.

If I'd had the option to switch us to stackless easily, and I could guarantee it was as fast, worked with all the libraries, and was as stable, I probably wouldn't have written this. I imagine that there are a lot of people in the same boat, where switching interpreters isn't really an option.

mietek · on Aug 16, 2013

> Conpig threads still can only run on one core of a processor.

This isn't true concurrency. Scaling to 20 million requests per second over 40 cores on a single machine is true concurrency.

functional_test · on Aug 16, 2013

You're confusing concurrency [1] and parallelism [2] -- this is addressed on the bullet point before the one you quoted.

[1] http://en.wikipedia.org/wiki/Concurrency_(computer_science)

[2] http://en.wikipedia.org/wiki/Parallel_computing

mercurial · on Aug 16, 2013

Single-core concurrency is concurrency is concurrency (you have different computations occurring concurrently, even if they are not executing at the same time). What it's not is parallelism.

stonemetal · on Aug 16, 2013

Concurrent means at the same time. You can't have things happening concurrently but not at the same time.

mirman · on Aug 16, 2013

Technically no, you can't have two things happening concurrently and but not finishing in the same time period.

But what a "thing" and how long a period is are up for grabs. If we choose period to be anything longer than 1 millisecond, then this library will finish executing both things in that time period.

http://stackoverflow.com/questions/1050222/concurrency-vs-pa...

silverlake · on Aug 16, 2013

This may help explain the difference: http://blog.golang.org/concurrency-is-not-parallelism

stonemetal · on Aug 16, 2013

Should I have thrown a literally in there? Literally Concurrent means at the same time. Parallel literally means non intersecting. I understand some dumbass has\is trying to co opt the language it doesn't mean I have to like it.

If one of them should mean one thing and the other the other why not have the one that literally means at the same time for the term that means at the same time. And have the one that means non intersecting mean threads of execution that are beside each other but not touching without lots of pain and suffering. Sorry, rant bit off.

mercurial · on Aug 17, 2013

> I understand some dumbass has\is trying to co opt the language it doesn't mean I have to like it.

That's called a lingo or a jargon (in this case, programmer lingo), and it involves making up words or having a different definition for a given word in the correct context.

erikb · on Aug 16, 2013

Where's the library? I see about 50 lines of code that don't go far beyond a hello world of Gevent and some strangely written nose tests.