The fact that they went for stackless is a testament to how bad committee-driven...

colejohnson66 · on Feb 22, 2023

Can you ELI5 why stackless is bad?

jlokier · on Feb 23, 2023

There are pros and cons, but here are a few things which favour stackful.

Function coloring is a disadvantage of stackless (assuming it's mixed with regular, non-async C++ code). Async and non-async libraries can't interoperate the way they can with stackful. Arbitrary functions cannot block the calling coroutine with stackless, but they can with stackful. Some people consider the distinction a feature (a bit like checked exceptions, it's debated as to whether it's helpful or adds brittleness), but it's often a problem in large codebases that weren't written to be entirely async (including libraries) from the start. Anyway, with stackful, if you want coloring as a distinction in your type system you can still have it. But with stackless, you have no choice.
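To make the coloring point concrete, here is a rough sketch in plain C++. It uses continuation-passing as a stand-in for an async function rather than real C++20 coroutines, and every name in it (`EventLoop`, `fetch_sync`, `fetch_async`, `sum_sync`, `sum_async`, `run_sum_async`) is made up for illustration. The thing to notice is that `sum_sync` cannot be reused with `fetch_async`, so the same loop has to be written a second time in the other color.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical single-threaded event loop: runs posted callbacks in order.
struct EventLoop {
    std::queue<std::function<void()>> ready;
    void post(std::function<void()> fn) { ready.push(std::move(fn)); }
    void run() {
        while (!ready.empty()) {
            auto fn = std::move(ready.front());
            ready.pop();
            fn();
        }
    }
};

// "Plain" color: the result is returned directly.
int fetch_sync(int key) { return key * 10; }

// "Async" color: the result arrives later through a continuation.
void fetch_async(EventLoop& loop, int key, std::function<void(int)> done) {
    loop.post([key, done] { done(key * 10); });
}

// Library code written for plain callables. fetch_async does not fit here:
// it returns nothing, and this loop has no way to suspend halfway through.
int sum_sync(const std::vector<int>& keys, const std::function<int(int)>& fetch) {
    int total = 0;
    for (int k : keys) total += fetch(k);
    return total;
}

// The async-colored rewrite of the same algorithm. The loop index and the
// running total can no longer live on the stack, so they move to the heap.
struct SumOp {
    std::vector<int> keys;
    std::function<void(int)> done;
    std::size_t next = 0;
    int total = 0;
};

void sum_step(EventLoop& loop, std::shared_ptr<SumOp> op) {
    if (op->next == op->keys.size()) { op->done(op->total); return; }
    int key = op->keys[op->next++];
    fetch_async(loop, key, [&loop, op](int value) {
        op->total += value;
        sum_step(loop, op);  // called from the event loop, so no deep recursion
    });
}

void sum_async(EventLoop& loop, std::vector<int> keys, std::function<void(int)> done) {
    auto op = std::make_shared<SumOp>();
    op->keys = std::move(keys);
    op->done = std::move(done);
    sum_step(loop, op);
}

// Helper that drives the loop until the async sum completes.
int run_sum_async(std::vector<int> keys) {
    EventLoop loop;
    int result = -1;
    sum_async(loop, std::move(keys), [&result](int total) { result = total; });
    loop.run();
    return result;
}
```

With a stackful design, `fetch` could simply suspend the calling coroutine from inside `sum_sync`, and the second version would not be needed.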

Memory allocation patterns are different with stackless, sometimes worse. While a stackful coroutine system requires stacks to be allocated for each new coroutine, obviously, in a stackless async/await system there is typically a higher rate of memory allocation, and with varying sizes, to hold the temporary states of each coroutine which can occur at each await site. There are usually many more of those sites than coroutines.

In addition, those temporary states being stored in heap-allocated memory are likely to have a lower CPU cache hit ratio than stack memory during the run of a particular coroutine.

Something like the Linux kernel is very difficult to write in an explicit stackless style. The Linux kernel design uses stackful coroutines pervasively. This is not theoretical: It has come up in practice. Years ago there were a number of attempts to change the Linux filesystem and block I/O code to have async code paths (i.e. stackless style, state machines), so that a proper async I/O ("AIO") could be offered to userspace. Every attempt failed because it was too much work or too difficult to make all the filesystems code and everything they call, every path, fully async. Some of those changes went in, but the result was Linux AIO requests were not reliably async (and still are not), as they could sometimes block when they hit some less common paths in filesystem code, e.g. doing things like updating a block extent map, b-tree, or LVM/RAID corner case. In the end, the async I/O designs that worked well and reliably didn't block on request submission all ended up delegating I/O requests to scheduled stackful coroutines, i.e. kernel threads. These also turned out to perform well, which is not a coincidence, as stackful context switching is efficient. Of these designs, the one which ended up being adopted and well known is called io_uring.
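For readers who haven't written the "stackless style, state machines" form by hand, here is a small sketch of what it tends to look like. This is not kernel code; `Source` and `MessageReader` are hypothetical, and the message format (one digit giving the length, then that many payload bytes) is invented for the example. Every point where the code might have to wait becomes an explicit state, and every local that must survive the wait becomes a struct member.

```cpp
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical non-blocking input: try_read returns false if nothing is ready.
struct Source {
    std::deque<char> pending;
    bool try_read(char& out) {
        if (pending.empty()) return false;
        out = pending.front();
        pending.pop_front();
        return true;
    }
};

// Reads one message: a length digit '0'..'9' followed by that many bytes.
struct MessageReader {
    enum class State { ReadLength, ReadPayload, Done };
    State state = State::ReadLength;
    std::size_t remaining = 0;  // would be a plain local in blocking code
    std::string payload;

    // Advances as far as the input allows. Returns true once the message is
    // complete, false if it had to stop because the read would block.
    bool resume(Source& src) {
        char c = 0;
        for (;;) {
            switch (state) {
            case State::ReadLength:
                if (!src.try_read(c)) return false;  // would block: unwind to caller
                remaining = static_cast<std::size_t>(c - '0');
                state = (remaining == 0) ? State::Done : State::ReadPayload;
                break;
            case State::ReadPayload:
                if (!src.try_read(c)) return false;  // would block: unwind to caller
                payload.push_back(c);
                if (--remaining == 0) state = State::Done;
                break;
            case State::Done:
                return true;
            }
        }
    }
};
```

The stackful equivalent would be a few straight-line statements calling a blocking read. The difficulty described above comes from having to apply this transformation to every function on every path that might wait, not just the top-level one.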

For an example of less common paths that still need to be async, some operations have to allocate memory in all sorts of ways, including temporary memory when calling library functions or traversing data structures. In a kernel, memory allocation (i.e. malloc/new) has to be able to block the calling task temporarily, so that when there isn't enough free memory immediately available, the allocating task will wait for another task to free some, as that is preferable to failing an entire operation. Try doing that with C++ coroutines and idiomatic object allocation, and you will hit the function color problem: You can't async allocate a new object. You could of course write in a non-idiomatic style, not using new or constructors or anything which calls those, doing your own "await my_async_new" everywhere, but having to do that utterly consistently throughout a large codebase, and requiring every little function (including all libraries) to work that way as well, would be not really using C++ as it is meant to be used, and comes with its own risks. Alternatively you can block the thread the executor is running on, but that defeats the point of async coroutines.

With stackful coroutines, those kinds of operations Just Work(tm).

You can achieve the same thing with stackless by allowing such operations to block the executor thread, and spawn new executor threads which do work-stealing to ensure other coroutines are able to make progress while the first one is blocked. I believe that is what Go and Rust's Tokio do. The effect is to allow coroutines to be a mixture of stackless and stackful as needed, optimising for both worlds at the same time. It has the particular benefit of improving performance when the executor needs to call an operation which blocks in a library function or in the kernel. However, as with stackless, to ensure every coroutine can progress in an async manner without being stalled by coroutines that are blocked, this also needs every code path and library function to buy into doing that (if only by annotating "I may do something that will block now" regions). So it's also not suited for retrofitting to all codebases.

artemonster · on Feb 22, 2023

For coroutines to be really useful, they have to be stackful. The guy that originally did the proposal for C++ is also an author of multiple coro libraries and did research on this. You can emulate stackful coros with stackless via trampoline, but heck, you can emulate stackless coros with closures and trampoline too! The requirement of this extra trampoline makes their use extra convoluted and negates the very slight advantages that such functionality may really bring. Making stackful "right" was hard, so a useless compromise was made, which is basically like adding a syntactic sugar to the language.
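As a rough illustration of the "closures and trampoline" remark, here is one way it could look in plain C++. This is only a sketch of the general idea, not anything taken from the proposal or from a particular library, and `Coroutine`, `make_counter` and `trampoline` are invented names. Each "coroutine" is a closure that does one slice of work per call and returns whether it has more to do; the trampoline is the loop that keeps calling them.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// A closure standing in for a stackless coroutine. Each call runs until the
// next "yield" and returns true if there is more work left.
using Coroutine = std::function<bool()>;

// Builds a coroutine that appends name0, name1, ... to the log, one entry
// per resume. The counter i survives between calls by living in the closure.
Coroutine make_counter(std::string name, int limit, std::vector<std::string>& log) {
    int i = 0;
    return [name, limit, i, &log]() mutable {
        if (i == limit) return false;  // finished
        log.push_back(name + std::to_string(i++));
        return true;                   // "yield": ask to be resumed later
    };
}

// The trampoline: resumes every live coroutine in round-robin order and
// drops the ones that report completion.
void trampoline(std::vector<Coroutine> tasks) {
    while (!tasks.empty()) {
        for (std::size_t k = 0; k < tasks.size();) {
            if (tasks[k]()) {
                ++k;
            } else {
                tasks.erase(tasks.begin() + static_cast<std::ptrdiff_t>(k));
            }
        }
    }
}
```

Note that a closure like this can only yield from its own body. It cannot yield from inside a function it calls unless that function is also written as a resumable closure, which is the limitation the stackful side of this thread keeps pointing at.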

int_19h · on Feb 22, 2023

It may be a compromise, but what exactly makes it "useless"? A similar style of async has been in use in e.g. C# for over a decade now, and it has been very successful there. Yes, it is syntactic sugar - but so are e.g. lambdas. What matters in practice is that it makes some kinds of code much shorter and easier to read.

artemonster · on Feb 22, 2023

Stackful coroutines are a feature on par with the power of delimited continuations (a proven fact in academia; you can easily find fairly easy-to-follow papers on this topic). Stackless coroutines are a stupid gimmick. You see the "compromise"? You argued the case for adding a car, but after long debate, politics and "compromise" you got a TOY car instead.

andrekandre · on Feb 23, 2023

what kind of things does a stackful coroutine practically enable that would be impossible with stackless design?

artemonster · on Feb 23, 2023

Theoretically everything is possible with a NAND gate, so the question is badly formulated from the beginning. There is a sibling comment from @jlokier who did a great job of providing a sane summary, and not just some incoherent rambling like I did.