
I feel like I'm too dumb to understand any of this. And I've been writing python for 12 years.

Just give me greenlets or whatever and let me run synchronous code concurrently.

  async def proxy(dest_host, dest_port, main_task, source_sock, addr):
    await main_task.cancel()
    dest_sock = await curio.open_connection(dest_host, dest_port)
    async with dest_sock:
      await copy_all(source_sock, dest_sock)
Are you kidding me? Simplified, that is:

  async def func():
    await f()
    dest_sock = await f()
    async with dest_sock:
      await f()
Every other token is async or await. No thank you.



Are you saying greenlets are any simpler than this? IMO that mechanism looks way more complex compared to this. And will probably be less efficient.

The point is this: threads are still expensive in bulk (the CPU has to shuffle a lot of data every time you switch). So all kernels have mechanisms to support parallel IO operations. An async library will use the best available kernel mechanism for IO: epoll on Linux, kqueue on the BSDs, maybe IO Completion Ports on Windows (not sure). It turns out that doing this requires some help from the language itself, or the code turns into a pyramidal mess. The async keyword addresses the readability aspect of the code.

So:

a) It's more complex than synchronous code

b) But it solves the performance problem without too much cognitive overhead (once you get used to it).
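
To make "the best available kernel mechanism" concrete: Python's stdlib exposes exactly this selection logic through the selectors module. A minimal sketch (port number arbitrary):

  import selectors, socket

  # DefaultSelector picks the best mechanism for the platform:
  # epoll on Linux, kqueue on the BSDs, etc.
  sel = selectors.DefaultSelector()

  server = socket.socket()
  server.bind(("localhost", 8080))
  server.listen(100)
  server.setblocking(False)
  sel.register(server, selectors.EVENT_READ)

  while True:
      # One blocking call watches many sockets at once; this is
      # the primitive every async library is built on top of.
      for key, _ in sel.select():
          conn, addr = key.fileobj.accept()
          conn.close()  # a real server would register conn too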


> threads are still expensive in bulk

They don't have to be. First of all, even ordinary threads are more efficient than you might think. On a really awful low-end Android 4.1 device, I can pthread_create and pthread_join over 5,000 threads per second. On a real computer, my X1 Carbon Gen4, I can create and join over 110,000 threads per second. (And keep in mind that each create-join pair also forces two full context switches.)
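For what it's worth, here's a quick way to reproduce a create+join measurement in pure Python (a hedged sketch; interpreter overhead means the numbers will come out well below raw pthread_create/pthread_join in C):

  import threading, time

  def nop():
      pass

  N = 10000
  start = time.perf_counter()
  for _ in range(N):
      t = threading.Thread(target=nop)
      t.start()   # each start/join pair forces two context switches
      t.join()
  elapsed = time.perf_counter() - start
  print("%.0f create+join pairs per second" % (N / elapsed))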

For most applications, the performance of regular threads is perfectly adequate. In these environments, the maintainability and debuggability advantages of plain old boring threads make it really hard to justify using something exotic.

But suppose you do have big performance requirements: you can still use normal-looking threaded code. There's a difference between how we represent threads in source code and how we implement them. It's possible to provide green, userspace-switched threads without requiring "await" and "async" keywords everywhere. GNU Pth did it a long time ago, and there are lots of other fiber implementations.

> the CPU has to shuffle a lot of data every time you switch

Any green-threaded system (with or without explicit preemption points) also does context switches! Such a system maintains, in user space, a queue of things to work on: as it switches from one of these work items to another, it's switching contexts! You get the same register reloading and cache coldness problems that switching thread contexts has. There's no particular reason you can do it much better than the kernel can, especially since switching threads in the same address space is pretty efficient.


The problem with all green thread implementations that I know of is that they're language and/or framework-specific. So the moment you start using them, you get the same set of problems as using setjmp/longjmp in C across the boundaries of foreign code - it either just blows up spectacularly, or at the very least violates invariants because the interleaving code is not aware that someone's pulling the rug from under it.

This can only be solved by standardizing a fiber API and (per platform) ABI, and by forcing all libraries in the ecosystem to be aware of fibers if their behavior differs with threads in any way (e.g. if TLS and FLS are distinct).

Callbacks (and hence promises), on the other hand, work with what we already have, and are trivially passed across component boundaries as a simple function pointer + context pointer, or some suitable equivalent expressible in C FFI. For example, I can take an asynchronous WinRT API (which returns a future-like COM object), and wrap it in a Python library that returns awaitable futures; with neither WinRT being aware of the specifics of Python async, nor with Python aware of how WinRT callbacks are implemented under the hood. On the other hand, if WinRT used Win32 fibers for asynchrony, Python would have to be aware of them as well.
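The Python half of that wrapping is mechanical. A hedged sketch of the general pattern, where start_native_op stands in for any callback-based foreign API (it's hypothetical, not a real binding):

  import asyncio

  def wrap(start_native_op):
      # Adapt a callback-based API to an awaitable future.
      # start_native_op(on_done) must call on_done(result) when
      # the underlying operation completes.
      loop = asyncio.get_event_loop()
      fut = loop.create_future()
      # call_soon_threadsafe, because the foreign code may fire the
      # callback on a thread the event loop knows nothing about.
      start_native_op(
          lambda result: loop.call_soon_threadsafe(fut.set_result, result))
      return fut  # caller: result = await wrap(op)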


I expect you can also use a callback that switches greenlets, or one that passes the values it got to Lua's coroutine.resume.


How can it switch green threads without breaking any foreign code currently on the stack? Consider what happens when said code holds an OS mutex, for example.

The only way I see this working is if your green threads roll their own stack on the heap, and switch that, without touching the OS stack. But then how is the result fundamentally different from promise chains? Their callbacks and captured state essentially form that very same green stack.


To start a fiber, you allocate some memory, set RSP to the end of that memory, set your other registers to some arbitrary initial state, and jump to your fiber routine. To switch fibers, you set RSP to some other block of memory, restore your registers, and set PC to whatever it was when you last switched away from that fiber. There's nothing magical, and it works with almost all existing code. If you hold a mutex and switch to a different fiber, the mutex stays held. How could it be otherwise?
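That's essentially what the greenlet library does for CPython; a minimal sketch:

  from greenlet import greenlet  # pip install greenlet

  def task_a():
      print("A: start")
      gb.switch()        # save A's stack pointer/registers, restore B's
      print("A: resumed")

  def task_b():
      print("B: running")
      ga.switch()        # jump back to wherever A last switched away

  ga = greenlet(task_a)
  gb = greenlet(task_b)
  ga.switch()            # prints A: start, B: running, A: resumed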


I was thinking of a situation where thread-aware but not fiber-aware code uses a mutex to synchronize with itself, which breaks with fibers because they reuse the same thread, and the mutex is bound to that thread (so if another fiber tries to acquire it, it's told it already holds it, and proceeds to stomp over shared data with impunity).

But upon further consideration, I realize that in this narrow scenario - where fibers are used in conjunction with callback-based APIs - this shouldn't apply, because you can't synchronize concurrent callback chains with plain mutexes, either.

Having said all that, are there any actual implementations that seamlessly marry fibers with callbacks? I don't recall seeing any real world code that pulled that off. Which seems to imply that there are other problems here.

Of note is that CLR tried to support fibers, and found it to be something that was actually fairly expensive. By extension, this also applies to any code running on top of that VM:

"If you call into managed code on a thread that was converted to a fiber, and then later switch fibers without involvement w/ the CLR, things will break badly. Our stack walks and exception propagation will rely on the wrong fiber’s stack, the GC will fail to find roots for stacks that aren’t live on threads, among many, many other things." (http://joeduffyblog.com/2006/11/09/fibers-and-the-clr/)

GC is a sticking point here, it seems - clearly it needs to be fiber-aware to properly handle roots in switched-out fibers.


That all sounds correct to me. I'm not familiar with greenlet internals, but Lua's stacks live on the heap, and the whole situation ends up being similar to promise chains in terms of where your state is at runtime.


Short replies coz on phone.

1. Async/await is almost the same as normal-looking threaded code. Just add await before a normal-looking call.

2. A language could have chosen to make it "exactly the same" by auto-inserting awaits, but then you don't get to say when you don't actually want to wait. Many times you don't (see the sketch after this list).

3. I agree native threads are cheap. But you still have a) thread stacks and additional control structures, and b) wouldn't you have to deal with things like processor affinity? I mean, either you/your library or the kernel has to, and the kernel already does it for you.
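
Re: point 2, withholding the await is what lets two operations overlap in asyncio; a hedged sketch:

  import asyncio

  async def fetch_user():       # stand-ins for real I/O
      await asyncio.sleep(1)
      return "user"

  async def fetch_posts():
      await asyncio.sleep(1)
      return "posts"

  async def handler():
      # Start both without awaiting: they now run concurrently.
      t1 = asyncio.ensure_future(fetch_user())
      t2 = asyncio.ensure_future(fetch_posts())
      # Auto-inserted awaits at the call sites would have
      # serialized them: ~2s total instead of ~1s.
      return await t1, await t2

  asyncio.get_event_loop().run_until_complete(handler())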


Excellent discussion, and your points are all very much spot-on. I just wanted to add this re: Windows and fibers/threads because it is very much relevant to the conversation:

https://blogs.msdn.microsoft.com/larryosterman/2005/01/05/wh...


I don't believe you. Show me your code. I think you just completely made your numbers up.



Thanks!


> On a real computer, my X1 Carbon Gen4, I can create and join over 110,000 threads per second.

Is that C, or Python's multithreading?


> And will probably be less efficient.

they're not. gevent (and threads) are way faster than explicit asyncio, as all of asyncio's keywords/yields each have their own overhead. Here are my benchmarks (disclaimer: for the "yield from" version of asyncio). http://techspot.zzzeek.org/2015/02/15/asynchronous-python-an...


Is that still true? Some uvloop benchmarking has shown it to be equivalent to gevent when using streams: https://magic.io/blog/uvloop-blazing-fast-python-networking/ . Plus Python 3.6 has a bunch of optimizations for asyncio where all of these numbers are going to have to be re-evaluated.


not sure. I'm hoping the more native support for asyncio in 3.6 has improved matters. Certainly though, it's never going to be faster than gevent. Or threads for most tasks.


People keep inventing funky new ways of representing threads.

With or without async, we're writing threads. (Promise chains are _also_ threads, very awkwardly spelled.) Really, we're arguing over whether we want our preemption points to be explicit or implicit. I prefer implicit myself, because the implicit style leads to much clearer code.

I understand how the JavaScript people might be excited that they can finally have threads, even if ugly ones, but there's no reason to get the rest of the world to switch to explicit-preemption-point threads.


> Really, we're arguing over whether we want our preemption points to be explicit or implicit.

It's not even that!

It's not like you actually get to decide where to await in async/await code - you have to await on any call that is async, if you expect to get the result.

Now, if the underlying framework uses hot tasks - meaning the async operation starts executing as soon as it's invoked, not when the returned task is awaited (as in e.g. .NET/C#) - you can choose to omit await to, effectively, fork your async "thread". So NOT awaiting something is just a fork operation. It's the reverse of regular sync code, where thread forks are explicit and sequential flow on a single thread is implicit.

One other case where you wouldn't await is when you need to await on a combination of any or all tasks at the same time (i.e., wait until all tasks complete, or wait until one of the tasks completes). But the first one is equivalent to a thread join in sync code, and the second to a condition variable. So, again, you get a case where something more explicit in sync code is more implicit in async code, and vice versa.
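
In asyncio, those two combinators look something like this (a hedged sketch):

  import asyncio

  async def main():
      tasks = [asyncio.ensure_future(asyncio.sleep(i)) for i in (1, 2, 3)]

      # "any": resume as soon as one task finishes (condition-variable-like)
      done, pending = await asyncio.wait(
          tasks, return_when=asyncio.FIRST_COMPLETED)

      # "all": resume when everything is done (join-like)
      await asyncio.wait(pending)

  asyncio.get_event_loop().run_until_complete(main())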

Now note that all this is solely about syntax! You can take the C# compiler, and change it so that every awaitable statement is automatically awaited, except when the newly introduced operator "taskof" is applied, in which case you get the raw future instead. Voila! Cooperative future-based multitasking with implicit preemption points. Yet it works exactly the same, and will even be able to call into and be called from any existing C# code compiled by the original compiler.

I suspect that this will be the next step after async/await, once enough people notice that the default (non-await) behavior is something that they need very rarely, and figure out that it's better to rather change the syntax so that the much more common thing (await) is implicit. Similar to how the use of =/== for assignment and comparison has won out over :=/= in imperative languages.


After watching (Curio creator) David Beazley's presentation from earlier this year on async/await[0], I feel I finally get it. Recommended watching.

[0] https://www.youtube.com/watch?v=E-1Y4kSsAFc


The number of times Beazley says "insane", "nightmare", etc. in this talk makes me wary.


Welcome to the wonderful world of writing anything in JavaScript.

Imagine the same thing using Promises:

   def proxy(dest_host, dest_port, main_task, source_sock, addr):
      main_task.cancel()\
          .then(lambda _: curio.open_connection(dest_host, dest_port))\
          .then(lambda dest_sock: copy_all(source_sock, dest_sock))


It's statements like this that have kept me from ever learning _anything_ in Javascript...


Well it is only useful when you really rely on asynchronous programming. Nobody states that every piece of code is supposed to be written like this. You should only use async/await when a thorough performance analysis shows that it is your bottleneck.

Think of handling a web request, where you have to do parallel I/O requests to subsystems like a database, a webservice, redis, and so on. I think async/await gives us a nice standard way of describing "hit me back once X is done".


I don't think most code will be this dense with await.


And Rust's developers think that 'unsafe' in third-party crates will be well-vetted and therefore actually "safe", most C developers don't think somebody will incorrectly free or screw with memory they've allocated and passed back to the caller, most C++ developers don't think anybody will (ab)use 'const_cast', and so on.

A lot of terrible bugs are caused by people making assumptions like yours.


He didn't make an 'assumption' like the ones you described.

This is an artificial example of a function copying unmodified data from source to destination. There are async and await tokens on every line because every line is doing an IO operation. In a real-world app this data would be processed somehow in between, using synchronous function calls, and therefore without async/await tokens.
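
Something closer to this shape, say (transform and add_headers are made-up synchronous helpers):

  async def proxy(source_sock, dest_sock):
      while True:
          data = await source_sock.recv(65536)   # I/O: await
          if not data:
              break
          data = transform(data)                 # plain sync call
          data = add_headers(data)               # ditto, no await needed
          await dest_sock.sendall(data)          # I/O: await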


>most C++ developers don't think anybody will (ab)use 'const_cast', and so on.

These constructs are opt-in. If you don't want them in your codebase, you can find their locations with a simple text-based search and remove them. In C everything is "unsafe". You can't opt out.


I don't think anyone is saying that you'll never see crates with bad usage of unsafe. What you will hear them say is that by having the ability to share code, more people are looking at and using the same codebase, so it's more likely issues will be found - and when they're fixed, the fix helps everyone using the package, rather than just those who found it.


I've been writing async/await code for the past 2.5 years, and no, it actually is typically this dense, if you count tokens (real identifiers are obviously longer, so it's not as bad character-wise, and awaits are not quite as prominent).


Interesting, thanks for sharing that insight. Do you feel that your work is representative, or is there some reason the code you write would have a higher than usual density? It seems like a lot of code, which is just business logic, would not use these constructs other than at the I/O boundaries.


Don't forget that async is "viral": if you call an async function and need to do something with its result, the calling function must in turn be async for await to work inside it. So the moment you start doing some async I/O at the bottom of some call stack, the entire stack becomes "infected" by async, and needs to have awaits inserted in every frame.
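
A tiny illustration (db.query and db.query_async are hypothetical):

  # Before: a plain synchronous call stack.
  def get_name(db):
      return db.query("select name from users where id = 1")

  # After the bottom layer goes async, every caller up the stack
  # has to become async too, or it can't await the result:
  async def get_name_async(db):
      return await db.query_async("select name from users where id = 1")

  async def render_page(db):          # infected
      return "<h1>%s</h1>" % await get_name_async(db)

  async def handle_request(db):       # infected
      return await render_page(db)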

And it so happens that I work on the kind of products where a lot of useful work revolves around I/O: IDEs.


It doesn't have to be viral. C#'s tasks have a blocking Wait[0] method which allows you to use an asynchronous Task without changing the signature of your synchronous function. The tradeoff is more verbosity.

[0] - https://msdn.microsoft.com/en-us/library/dd235635(v=vs.110)....
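
Python can approximate this, with the same verbosity tradeoff (a hedged sketch; fetch is a made-up coroutine):

  import asyncio

  async def fetch():
      await asyncio.sleep(0.1)
      return 42

  def sync_caller():
      # Blocks this thread until the coroutine finishes, keeping the
      # caller's signature synchronous -- analogous to Task.Wait().
      loop = asyncio.new_event_loop()
      try:
          return loop.run_until_complete(fetch())
      finally:
          loop.close()

  print(sync_caller())  # 42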


As noted in another comment, Wait is extremely prone to deadlocks - if you happen to Wait on a thread that's running the event loop, then no task that's scheduled on that loop can execute until Wait returns. So if you're waiting on such a task, or on a task that depends (no matter how indirectly) on such a task, you get a deadlock.

Now, if you're writing a library, you pretty much cannot assume anything about the event loop and what's scheduled on it. If your library invokes a callback at any point, all bets are off, because you don't know which tasks that callback may have scheduled, or which tasks it's waiting on. Similarly, if you provide a callback to a library, you also don't know which tasks you might block by waiting.

So, in effect, the only safe place to wait is on a background thread that was specifically spawned for that purpose, and that is guaranteed to have no event loop running on it.
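
A minimal reproduction of that deadlock in asyncio terms (this program hangs by design):

  import asyncio

  async def inner():
      return 1

  async def outer(loop):
      # Schedule inner() on this same loop, then block the loop's
      # only thread waiting for it. inner() can never run: deadlock.
      fut = asyncio.run_coroutine_threadsafe(inner(), loop)
      return fut.result()   # blocks forever

  loop = asyncio.get_event_loop()
  loop.run_until_complete(outer(loop))  # never returns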


That's not the only tradeoff. It completely negates the benefit of asynchrony and can be a source of deadlocks.


I still use gevent any time I need async code. It's also easy to tack onto existing projects with its monkey patching. I've never seen a need to migrate away from gevent, even if it's inarguably a language hack.


It's reasonable compared to the old way of having three layers of callbacks in Node.js.



