Hacker News

The general sentiment in this thread seems to be that Guido et al "blessed" the Twisted way, that it's a pity, and that asyncio and friends are now the only way to suspend a Python stack in a concurrent program besides using OS-level threads.

This is not correct. Green threads, as a programming paradigm, are just a sub-optimal (but cheaper) way of doing preemptive multitasking. Yes, switching is explicit, but user code can't know what will switch. So you are not supposed to treat green threads any differently from OS threads, i.e. you still need your mutexes, semaphores etc. if you want to avoid race conditions. Again, that's because the caller doesn't know whether a function will yield execution to the next queued task at some point. Guido explains this point pretty well in one of his PyCon keynotes.
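To make that concrete, here is a minimal sketch of the classic shared-counter race, written with OS threads (which, per the point above, you should treat green threads exactly like). Without the lock, the read-modify-write on `counter` can lose updates; with it, the result is deterministic:

```python
import threading

# Shared state touched by several concurrency units.
counter = 0
lock = threading.Lock()

def add(n):
    global counter
    for _ in range(n):
        # Without this lock, "counter += 1" is a read-modify-write
        # that another thread can interleave with, losing updates.
        with lock:
            counter += 1

threads = [threading.Thread(target=add, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # prints 40000
```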

So green threads may be great but they don't bring anything new to the table.

However, Twisted-style concurrency (aka cooperative multitasking) is a different paradigm. In Twisted, you know that you only have one thread running at a time, so you actually don't need any thread synchronization primitives when accessing non-local state. This simplifies things a great deal. Yes, not having to spawn a thread for every single concurrent IO operation has other great benefits, but that's not the reason why CPython now has a blessed event loop -- it's cooperative-multitasking-the-paradigm.
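A small sketch of why the cooperative model needs no locks for this kind of thing: in asyncio, switches happen only at await points, so a check-then-update on shared state with no await in between is atomic by construction (the account/amount names are illustrative only):

```python
import asyncio

# Shared state, touched by several tasks -- but only one runs at a time.
balance = 100

async def spend(amount):
    global balance
    # No await between the check and the update, so no other task
    # can run in between: this is atomic without any lock.
    if balance >= amount:
        balance -= amount
        return True
    return False

async def main():
    results = await asyncio.gather(*(spend(30) for _ in range(5)))
    print(balance, results.count(True))

asyncio.run(main())
# prints: 10 3
```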

Before asyncio, there was no standard way of doing cooperative multitasking. Now there is and it's baked right into the language. Use it if you like it. If not, the old ways of doing things work just fine in Python 3.

I'll admit that the concurrency model in Python 3.4 was not perfect. However, what we have with Python 3.5 and up looks quite polished.




Another issue with green threads is that they usually do not work very well (or at all) across API, much less ABI, boundaries.

Basically, the moment your control flow is calling into some shared library, you probably want a C API at that boundary for the sake of portability and stability. Exposing what is, essentially, a promise-based system that way is not hard. Better yet, if the other side has some analogous construct, you can map to that. But how do you do it with green threads?

Even platforms that have an OS-wide unified green thread primitive, like Win32 fibers, which would presumably solve this problem, see very lackluster use of it, because many languages and frameworks just plain don't support fibers correctly. Even the CLR tried to do it once and dropped the feature; forget about Python etc.


I disagree that the code can't know what will switch with green threads. They switch on I/O and when explicitly requested. An example of using this to get rid of concurrency control: http://www.underengineering.com/2014/05/22/DIY-NoSql/

If you want to make sure that there are no context switches in certain parts of your code, you can do it ASSERT-style, something like:

    # Enable this only if you use atomic(); put it in a new module that
    # is imported before gevent so the patched class is picked up.
    from contextlib import contextmanager

    in_transaction = False

    if __debug__:
        import greenlet
        old_switch = greenlet.greenlet.switch

        class _greenlet(greenlet.greenlet):
            def switch(self, *args, **kwargs):
                if in_transaction:
                    raise RuntimeError('Context switch inside atomic block')
                return old_switch(self, *args, **kwargs)

        greenlet.greenlet = _greenlet

    @contextmanager
    def atomic():
        """
        Ensure that a function or a block of code is atomic;
        raise an exception if a context switch happens inside it.

        Usage:

            @atomic()
            def myTransaction(...):
                ...

            or

            with atomic():
                ...
        """
        global in_transaction
        in_transaction = True
        try:
            yield
        finally:
            in_transaction = False


And how is this not a mutex?


This only runs in the debug version of the code. I hope your mutexes run in production, too.


So you rely on hoping your tests exercise all possible paths, because if not you get silent race conditions in prod?


If you want, you can run it in production too (it would give you a warning rather than prevent the race condition, and there is almost no performance penalty). The point is that this is very different from a mutex.


Have you seen libmill[0]? Single-threaded coroutines - no mutexes, semaphores, etc. We're using it in Zewo[1] with great success.

[0]: http://libmill.org [1]: https://github.com/Zewo/Zewo


Maybe the real issue with Twisted/asyncio is that it requires that all your code is "asyncio-ready". A bit like lock-free programming and only using linked lists.

That said, it's great that Python has something like this in its stdlib now, a bit like Node, Akka, and Go. But maybe asyncio needs a great big warning sign: "Use this only if you know why you need it and understand the impact it will have on your application's architecture."


> Maybe the real issue with Twisted/asyncio is that it requires that all your code is "asyncio-ready".

I don't think this is true. It's easy enough to ask for something that isn't "asyncio-ready" to run in a real thread: hand the event loop a function call, and it will run it in a thread and give you a Future for when it's done. See https://docs.python.org/3/library/asyncio-eventloop.html#exe... for details.

Sure, it's a pain because you don't get the benefits asyncio gives you for that part of the code, but it isn't at all an "all or nothing" proposition.
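A short sketch of that escape hatch: `loop.run_in_executor` runs a plain blocking function in a thread pool and hands back an awaitable Future (the function and values here are made up for illustration):

```python
import asyncio
import time

def blocking_work(n):
    # A plain blocking function with no knowledge of asyncio.
    time.sleep(0.1)
    return n * 2

async def main():
    loop = asyncio.get_running_loop()
    # Run the blocking call in the default thread-pool executor and
    # await the resulting Future without blocking the event loop.
    return await loop.run_in_executor(None, blocking_work, 21)

result = asyncio.run(main())
print(result)  # prints 42
```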


> ie you still need your mutexes, semaphores etc. if you want to avoid race conditions [...] In Twisted, you know that you only have one thread running at a time, so you actually don't need any thread synchronization primitives when accessing non-local state.

I disagree (unless you have a toy example or demo). As the software grows, it becomes peppered with yields at the top level. Everything starts to yield -- authentication is a yield, launching a background job is a yield, so is writing to the database. At that point you might find that some shared data has to be protected from concurrent IO requests as well, so you still need semaphores and locks.

Heck, Twisted has http://twistedmatrix.com/documents/9.0.0/api/twisted.interne... I had to use it too, because multiple callback chains modifying and accessing the same state had a race condition. Yeah, I knew I could multiply a matrix quickly without having to acquire a lock, but I wasn't multiplying matrices, I was doing IO-bound things. With concurrent IO requests there is still potential for a data race.

> So green threads may be great but they don't bring anything new to the table.

Green threads bring:

1. Lighter weight concurrency units than native threads.

2. Green threads don't fragment the library ecosystem. (For a language with batteries included this is rough.) If you have been using Twisted you know what I am talking about: "Oh, I found a library that does this protocol. Ah, but it is not Twisted, can't work with it. Start writing a parallel implementation."

3. They provide a better abstraction without extra code bloat. When you want to put an item in a shopping cart, do you really care that anything underneath yields? You want to write: authenticate(); get_price(); get_availability(); update_cart(); respond_to_user(); and such. That code should not know about select loops and reactors and awaits. Lower-level frameworks should handle that, and top-level code should be clean and obvious.
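A sketch of point 3: with (green or OS) threads the business logic reads as straight-line blocking calls. The helper functions below are hypothetical stand-ins, not any real framework's API:

```python
import threading

# Hypothetical blocking helpers standing in for real service calls.
def authenticate(user):
    return user == "alice"

def get_price(item):
    return 9.99

def update_cart(user, item, price):
    return {"user": user, "item": item, "price": price}

def handle_request(user, item):
    # No awaits, no reactor, no callbacks: the IO layer (green-thread
    # hub or OS scheduler) handles any switching underneath.
    if not authenticate(user):
        return None
    price = get_price(item)
    return update_cart(user, item, price)

# Each request would run in its own thread; the logic above is
# identical whether that thread is green or native.
t = threading.Thread(target=handle_request, args=("alice", "book"))
t.start()
t.join()
print(handle_request("alice", "book"))
```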

After switching from Twisted (even with inlineCallbacks) I cut the total lines of code in a large code base by half by using eventlet (that was before gevent), because it cut all the callbacks and handlers and all that stuff. Those are lines of code cluttering the business logic, they need maintenance, they need people to read them when bugs happen.

Are gevent and eventlet ideal? No, they have always been a hack. But in practice I'll take the monkey-patching over awaits, yields or Deferreds and having to hunt for or rewrite libraries which speak that particular IO "language". I understand that on paper and in small examples those look neat and clean; in practice it turns into a mess.


My point was that green threads sort of emulate OS threads, so they should not be treated any differently. As Python already has nice library-level support for dealing with such thread-based concurrency, there's just nothing for the core language team to do to support this use case.

> 2. Green threads don't fragment the library ecosystem.

They do. That's why you need to monkeypatch everything.

Thing is -- you've got two ways to do IO in an async world: use the async system calls the kernel provides, or use threads with blocking system calls to emulate async IO. There's no escaping that reality, irrespective of the async paradigm you are using, green threads included.

> 3. Provide a better abstraction without extra code bloat.

From what I understand, your problem has always been the GIL, not Twisted. If your business logic is not better expressed in Twisted, you should not use Twisted, period.

For some of the code I need to deal with, Twisted's callback logic fits perfectly. It makes my code more testable and easier to reason about. So that's what I'm using. For anything else, I just deferToThread and use blocking code just like normal.

This said, I'd still like to emphasize one very important point:

Here's the secret sauce of gevent: https://github.com/python-greenlet/greenlet/blob/master/plat...

A sibling comment to yours explains briefly how Windows folks have given up trying to get green threads to work even with kernel support.

I do realize the average Python programmer couldn't care less about such low-level stuff. However, those of us who have peeked under the hood of gevent and realized how many basic assumptions it violates stay far away from it.

Green threads are the GOTO of cooperative multitasking. In case you want to switch to "structured programming" from using GOTO-based code, you need to switch to the Twisted mindset.


> Green threads are the GOTO of cooperative multitasking.

You know, in GOTO-based vs. structured code, one is a mess where nobody can get things correct on the first several tries, while the other is an organized piece of code, built with programmers' limitations in mind.

The same does apply to bare async IO vs. green threads, but you've got it mixed up there.


> > 2. Green threads don't fragment the library ecosystem.

> They do. That's why you need to monkeypatch everything.

Not in the same way Twisted or async + yields does. Monkeypatching is not done in the library; that's the whole point. It is done once, in the start-up phase of the process. If I get an IRC library which uses sockets and spawns threads, it could work with gevent, eventlet, or just regular threads.

If I get a Twisted one, it returns Deferreds and my main program doesn't know how to handle Deferreds. Or, alternatively, if I am using Twisted, I have to find libraries which return Deferreds. That's what I meant by fragmenting.

> Use the async system calls the kernel provides or use threads with blocking system calls to emulate async IO.

It's the other way around, though. Green threads use the async versions of socket calls with a select/poll/epoll/kqueue hub (or reactor, in the Twisted world), but then present a blocking, synchronous API to the higher-level code.
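A toy version of that hub pattern: here tasks are generators that yield a (socket, event) pair when they would block, and a select-based loop resumes them. Real green-thread libraries (gevent, eventlet) do the switching at the stack level so user code contains no yields at all; this sketch only illustrates the "async underneath, blocking on top" mechanism:

```python
import selectors
import socket

sel = selectors.DefaultSelector()

def echo_task(conn):
    while True:
        yield (conn, selectors.EVENT_READ)   # "block" until readable
        data = conn.recv(1024)
        if not data:                         # peer closed: we're done
            conn.close()
            return
        conn.sendall(data)

def run(tasks):
    waiting = {}
    for task in tasks:
        sock, event = next(task)             # run until first "block"
        waiting[sock] = task
        sel.register(sock, event)
    while waiting:
        for key, _ in sel.select():
            task = waiting.pop(key.fileobj)
            sel.unregister(key.fileobj)
            try:
                sock, event = task.send(None)  # resume the task
                waiting[sock] = task
                sel.register(sock, event)
            except StopIteration:
                pass

# Drive the echo task with one end of a socketpair.
a, b = socket.socketpair()
a.sendall(b"hi")
a.shutdown(socket.SHUT_WR)                   # EOF, so the task finishes
run([echo_task(b)])
received = a.recv(1024)
print(received)  # prints b'hi'
```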

That is usually the sanest abstraction. The only time I've seen callbacks work well is when they are very short; think something like a simple web proxy.

In general, callbacks in a complex program end up a mess from what I see. inlineCallbacks or coroutines with yields help there; I've used those. But it is still suboptimal, as the library ecosystem is still fragmented and the code is still cluttered with yields and awaits and so on.

> Green threads are the GOTO of cooperative multitasking. In case you want to switch to "structured programming" from using GOTO-based code, you need to switch to the Twisted mindset.

I think it is the opposite. A callback chain is an ad hoc, poorly implemented and obfuscated model of a blocking concurrency unit. That is, a socket event starting a callback chain cb1->cb2->cb3... is usually much better represented as a set of nicely blocking function calls fun1->fun2->fun3. Except callbacks are scattered all over. And just because they are callbacks doesn't mean you don't need locks and semaphores; you can still have data races with another callback chain started from a different socket which also calls cb1->cb2->cb3 before the first one finishes.
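A tiny illustration of that contrast, with made-up step names: the same three-step computation written as a nested callback chain and as plain straight-line calls:

```python
# Callback style: control flow is scattered across handlers.
def cb1(data, done):
    def cb2(step1):
        def cb3(step2):
            done(step2 + 1)
        cb3(step1 * 2)
    cb2(data + 1)

# Blocking style: the same logic reads top to bottom.
def fun1(data):
    return fun2(data + 1)

def fun2(x):
    return fun3(x * 2)

def fun3(x):
    return x + 1

out = []
cb1(3, out.append)
print(out[0], fun1(3))  # both compute ((3 + 1) * 2) + 1 = 9
```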

I've also noticed that languages used in highly concurrent environments follow the same paradigm, namely Erlang. It is not a sequence of callbacks but rather isolated blocking concurrency units. Inside each unit, calls are blocking, but there can be many concurrent (and parallel) such units. Go does the same.



