C++ coroutines do not spark joy (2021) (probablydance.com)
63 points by signa11 5 months ago | 51 comments



Regarding new/delete elision: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p09...

You can also define custom operator new/delete for your co-routine (on its promise type) to control allocation. I am not sure whether you are allowed to delete them to guarantee that elision either happens or the program fails to compile.

Much like lambdas and range-based for loops, co-routines are pretty much defined as a code transformation into lower-level C++ rather than being black magic.

Regarding the "inline" keyword being used to move a function's body into the caller: that is a compiler hint and not mandatory. The actual purpose of the "inline" keyword is to allow the definition of a function to appear in multiple translation units without causing linker errors (provided the definitions are identical).
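A small sketch of that second meaning, assuming C++17 (for the inline variable). The names `next_id_seed`, `next_id`, and the header name in the comment are hypothetical. A single file cannot show the multi-translation-unit case directly, so the comments describe it.

```cpp
#include <cassert>

// Imagine this part lives in a header (say, a hypothetical "ids.hpp") that
// several .cpp files include. Without `inline`, each translation unit would
// emit its own definition, which typically ends in a duplicate-symbol link error.
inline int next_id_seed = 100;   // C++17 inline variable: one shared object

inline int next_id() {           // one definition program-wide, despite many inclusions
    return ++next_id_seed;
}

void inline_usage() {
    int a = next_id();
    int b = next_id();
    assert(b == a + 1);          // both calls touched the same variable
}
```

Whether the compiler also inlines the call at the call site is a separate decision it makes on its own.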

In terms of "I’m curious if anyone actually finds something useful to do with these.": the modern C++ Windows Runtime API is built upon async operations and co-routines.


> In terms of "I’m curious if anyone actually finds something useful to do with these.": the modern C++ Windows Runtime API is built upon async operations and co-routines.

And this answers another question from the article:

> in C++ I don’t know who [coroutines] are for, or who asked for this…

And the answer is Microsoft. They already had an extension for coroutines in their compiler before the standard, the standard is not very different from their original implementation (albeit there are differences), and you can almost always see at least one @ms address in all the discussions.


My understanding is that the extension in MSVC was done specifically as a proof of concept for the standard proposal. In fact, the authors implemented it in clang as well.

This is quite common. Especially for complex proposals, the committee wants field experience or at least proof that it is implementable before standardization.


Yes, they are incredibly useful and performant in real time state machines. One of the best features of the language in certain situations.


Weird that the yield statement is conflated with coroutines instead of iterators/generators. Even weirder to say it's a "C# coroutine" feature when there's no coroutine in the C# language. It does look pretty similar to the yield keyword for C# iterators.

Unity (and probably others) use yielding iterators as a somewhat hacky way to make coroutines, though. I wouldn't say it's a good role model but it gets the job done.

> Give me access to the generated struct. Allow me to put it on the stack of another function. Or as a member of another struct. Then I can store it on the heap if I want, but don’t force me.

As for why they're allocated on the heap...seems like the stack would be a foot gun, no? To write a safe coroutine, you need to hope your caller doesn't blow out the stack between iterations or just defensively put everything on the heap yourself anyway. Enforcing safety seems like the right call but that's a matter of taste, I suppose.

This does remind me of a recent C# change to the Task<T> promise type returned by async/await functions. Previously they were only heap allocated, because you need that to be able to properly await them multiple times. This is a major reason you can't use them in games and have to use things like Unity's Coroutines instead. As it turns out, a lot of small heap allocs suck and stack allocation does come in handy for perf. Now we have stack allocated Task<T> types as well. It would be a nightmare if the unsafe option was the default though... Why is life so complicated?


> Weird that the yield statement is conflated with coroutines instead of iterators/generators.

The concept of coroutines (Melvin E. Conway in 1963, who was also the source of FORK/JOIN) predates by a decade that of iterators/generators (which come from Alphard and CLU).

The term "yield" had already been used for many years for coroutines before being used for iterators/generators.

Usually the use of "yield" is not ambiguous, because the context is clear.


To go full circle, before Python had built-in async/await, it was emulated (in Twisted applications for example) via yield.


A C# iterator is a coroutine. A coroutine is simply a function that can be suspended at a given point and resumed later. Iterators do exactly that at the point where you use "yield return".

A C# async method is also a coroutine.


My point is mostly that neither is called a coroutine in the language and if you Google C# coroutine you'll see mostly Unity results which are not official and may or may not be what the author was referring to.


C# 8 introduced async generators with IAsyncEnumerable, so you can combine yield and await in C# as well. Those async generators are consumed using the "await foreach" statement (which of course can itself be used from within another async generator).

https://learn.microsoft.com/en-us/archive/msdn-magazine/2019...


Not only are there coroutines in C#, the structural-typing machinery that Microsoft proposed for the C++ coroutine design is largely based on how they work.


Just to comment on this point:

> Enforcing safety seems like the right call

It certainly is, however it is hard to argue this point while looking at other more or less recent features:

string_view is basically a fancy "char *", with all the same use-after-free issues (same with other pointer range types and iterators)

Lambdas with capture-by-reference can be freely copied and type erased into function<>, so obviously they did not care about safety when that feature got included
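To make the two hazards concrete, here is a hedged sketch that only executes the safe variants and describes the dangling ones in comments. `first_word`, `make_counter`, and `lifetime_usage` are hypothetical helper names.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <string_view>

// Returns a view into `text`; it is only valid while the underlying buffer lives.
std::string_view first_word(std::string_view text) {
    return text.substr(0, text.find(' '));
}

// Capturing by value: the returned callable owns its own copy of `n`.
std::function<int()> make_counter() {
    int n = 0;
    return [n]() mutable { return ++n; };
    // Capturing `[&n]` here instead would compile just as well, but the
    // std::function would hold a reference to a local that is gone on return.
}

void lifetime_usage() {
    std::string owner = "hello world";           // outlives the view below
    std::string_view word = first_word(owner);
    assert(word == "hello");
    // By contrast, storing first_word(std::string("hello world")) in a
    // string_view leaves it pointing into a temporary that is destroyed
    // at the end of that full expression.

    auto counter = make_counter();
    assert(counter() == 1);
    assert(counter() == 2);
}
```

Nothing in the type system separates the safe calls from the dangling ones, which is the point being made above.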


To really make string_view sane and safe, you'd have to make std::string reference counted (with each string view holding a ref), but I can't see that being popular.


string_view (like span) is meant for spatial memory safety, C++ doesn't help much with temporal memory safety.

That said, span (unlike string_view) fails at its intended purpose, because C++ just loves to be a total clown show.


Where are you getting this information that tasks and IEnumerator can’t be used for games? I’ve been using Unity for 10 years and tasks and IEnumerators are used as coroutines all the time. Recently worked on Hello Kitty Island Adventure and almost all game logic goes through an async coroutine library. Most memory worries are in graphics and object overhead, not coroutine stack allocations.


I mean you don't want to use a Task for per-frame work; you use a coroutine instead, because tasks are pretty heap-allocation heavy. When did I ever mention IEnumerator?


IEnumerator is how Unity implements coroutines.


> Weird that the yield statement is conflated with coroutines instead of iterators/generators.

I think the conflict is between "yielding another value" and "yielding control back to the scheduler". Both of them use "yield" but they're different types of yield, and people use the same keyword for both, and either one can be implemented in terms of the other, and so on.


They're not different though, right? Just two ways to think about what it means to return, no? How can you return a value without returning control to read that value? They're the same.

Still, I don't think iterators and coroutines are necessarily the same. You can imagine a coroutine that doesn't return a value and only happens to share a thread or something.


Regardless of the underlying mechanisms that make them both work, programmers generally have a vastly different conception of yielding (that is, 'producing') a generated value as opposed to yielding (that is, 'ceding') execution time to the OS or a virtual machine.


Right, difference between yield as in "crop yield" and yield as in "yield right of way".


That's the thing, since either one can be implemented in terms of the other, people conflate iterators/generators with coroutines as you've mentioned because they're easily interchangeable.


This is functionally identical to yielding a dummy value. If the language has a unit type (e.g. a 0-ary tuple) then you just use a value of that type.


> This is a major reason you can't use them in games

Because of the performance? Or for another reason?


For anything that requires consistent low latency, you must be careful never to do anything that allocates in the render loop. Even today, on powerful PCs, allocating just a few bytes risks degrading your performance massively, especially when your users have contention (e.g. not just the game or media app running but also a stream, a music player, a web browser, random services...), as any call that could end up calling into the OS increases the chances of your thread being context-switched.


I often hear this, but in my experience of game design it’s simply not true — no modern game engine aims to do no allocations per frame. Lots of games are written in the scripting languages of various engines, which are all GCed.

The days of avoiding malloc in games are long past. Of course, fewer allocations, like reducing any work, is always better.


In a case like Unity, low latency stuff can be moved to the job/burst system. Coroutines are generally for gameplay logic: things that need to be done over time in a single context.


The yield keyword was also used in green-thread coroutine / cooperative multitasking systems, simply meaning 'yield control back to the task scheduler'.

> Even weirder to say it's a "C# coroutine" feature when there's no coroutine in the C# language.

Unity has a coroutine system based on iterators, maybe that's where the confusion is coming from:

https://docs.unity3d.com/Manual/Coroutines.html

...or simply because async/await, coroutines or any other multitasking system serves the same purpose for the user: to kick off background tasks.


Discussed at the time:

C++ Coroutines Do Not Spark Joy - https://news.ycombinator.com/item?id=29064233 - Nov 2021 (250 comments)


Coroutines are low-level building blocks for various asynchronous models.

C++23 introduces std::generator which is roughly similar to Python and JavaScript generators, implemented on top of coroutines. If that model fits your application, you don’t need to mess with the coro API directly.


Also plenty of other libs provide a neat API for c++20 coroutines: boost.asio, boost.cobalt, qcoro...


Wow, that API looks pretty awkward and unpleasant, with the co_yield, co_return, etc. Golang has spoiled me.

Edit: @gloryjulio I'm sure it works fine, it's just so unergonomic to express compared to Golang. But this is nothing new with C++, it can do anything.. might not be pretty, but it can do it.


We only use co_await/co_return in the production code. It's basically like JavaScript promise and it's very easy to work with.

Calling a coroutine from non-coroutine code is also easy, just do blockingWait(co_routineFunc());

We almost never use multithreading code in the business layer now unless a coroutine is actually causing a performance issue. I haven't seen that happen yet.


With Golang you pay the price of having to go through cgo whenever you want to invoke into C. Which is why Go devs are so stubbornly resisting linking to libc even on platforms where direct syscalls are unstable, for one.

The C++ approach is fully compatible with the C ABI, since in the end it's all just callbacks once you strip all the abstraction layers. Which also means that other languages that use this approach (among them C#, Python, and JavaScript) can easily perform async calls across the interop boundary; this is actually used on Windows, with the OS projecting a common set of async APIs to various languages.

With Go, if you want to do async with some code that's not yours, it has to also be written in Go.


same. after using golang there is no going back for me. the new added features to c++ just make it more unappealing for me. less is more.


co_<whatever> is now a recurring meme in the c++ community:).


I mean, goroutines are not exactly coroutines, nor are they threads; they are somewhere in between.

And it's pretty easy to use the same workflow as goroutines in C++ using std::thread and a concurrent queue implementation; TBB has a decent one and moodycamel's is also very good.
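A rough sketch of that workflow, assuming C++17. The `Channel` class here is a hypothetical, deliberately naive mutex-and-condition-variable queue standing in for a real concurrent queue such as the TBB or moodycamel ones; it is not their API.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>

template <typename T>
class Channel {
public:
    void send(T value) {
        {
            std::lock_guard<std::mutex> lock(m_);
            assert(!closed_);              // sending on a closed channel is a bug
            q_.push(std::move(value));
        }
        cv_.notify_one();
    }

    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until a value arrives; empty once the channel is closed and drained.
    std::optional<T> receive() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return std::nullopt;
        T value = std::move(q_.front());
        q_.pop();
        return value;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
    bool closed_ = false;
};

// One worker thread squares incoming jobs, roughly like a goroutine reading
// from one channel and writing to another.
int sum_squares_concurrently(int n) {
    Channel<int> jobs;
    Channel<int> results;
    std::thread worker([&] {
        while (auto job = jobs.receive()) results.send(*job * *job);
        results.close();
    });
    for (int i = 1; i <= n; ++i) jobs.send(i);
    jobs.close();

    int total = 0;
    while (auto r = results.receive()) total += *r;
    worker.join();
    return total;
}
```

Unlike goroutines, each std::thread here is a full OS thread, so this copies the programming model rather than the cost model.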

It's actual async/await that is a touch annoying, but those are also fundamental building blocks; there are currently few abstractions on top of them.

I also strongly believe async/await is an antipattern. There are better models for handling concurrency, imho, that don't use function colouring. Go's goroutines and channels (effectively actors but that's a can of worms) are a great solution and probably what you should be reaching for instead of async/await.


Coroutines can be divided into two categories, stackless and stackful. goroutines are stackful coroutines.


Except goroutines will absolutely spawn a thread, which a coroutine by definition won't.

Goroutines are parallel, coroutines are concurrent. Those are not the same thing.

Coroutines co-operatively yield to other coroutines. Parallel constructs run in parallel at the same time. Goroutines can run in parallel at the same time and thus are not coroutines.


My understanding is that goroutines don't spawn threads per se. They can be executed on parallel event loops, but that can be true for stackless coroutines in C++, for example.


Yeah goroutines are a pretty high level abstraction but they are definitely closer in concept to green threads than to coroutines.

You can run a C++ coroutine on a separate thread, but you effectively need to build all the task handling logic yourself or use a library that has built that out. With goroutines you don't need a specialised sleep function that actually changes coroutine state, sets a timer, and yields. You just sleep, as an example.


My understanding taken from someone smarter than me is that the issues with coroutines in C++ can be worked around for a good and efficient implementation.



Thank you for this pointer. I have been studying coroutines off and on for a while and my google/ddg fu is weak and did not find this. Or maybe google/ddg are getting weak. Sadness. I was using David Mazières' tutorial[1] but was still puzzled about the seemingly magical promise_type/coroutine_handle conversions that are nicely explained here.

[1] https://www.scs.stanford.edu/~dm/blog/c++-coroutines.html


I wouldn’t be surprised if the C++ coroutine push was for a perf. packet in some big tech company. Promo is hard.

Meanwhile, the same optimizations one could use for efficient coroutines could just as well be applied to fibers, which are much nicer.


The semantics of fibers are difficult to define precisely. For example using thread local storage inside fibers can be problematic if a fiber sleeps in one thread and restarts in another, because the compiler is free to store something like &errno and use it across function calls. (And this is not theoretical—compilers will do this).


You see this exact problem in task-based parallel libraries such as Taskflow. It's easy to get burned using thread local variables if you are not careful, because a task might yield to the scheduler, which runs another task (of the same type) on the same thread, and then your thread local is corrupted.

You end up requiring more sophisticated object pools where you can check out and check in objects.
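A hedged sketch of what such a check-out/check-in pool can look like. `ObjectPool` and `Lease` are hypothetical names, not Taskflow API. Each task holds its own leased object for as long as it needs it instead of relying on a thread_local slot that another task on the same thread might overwrite.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

template <typename T>
class ObjectPool {
public:
    // RAII handle: the object goes back to the pool when the lease is destroyed.
    class Lease {
    public:
        Lease(ObjectPool& pool, std::unique_ptr<T> obj)
            : pool_(&pool), obj_(std::move(obj)) {}
        Lease(Lease&&) noexcept = default;
        Lease(const Lease&) = delete;
        Lease& operator=(const Lease&) = delete;
        ~Lease() { if (obj_) pool_->check_in(std::move(obj_)); }

        T& operator*() { assert(obj_); return *obj_; }
        T* operator->() { assert(obj_); return obj_.get(); }

    private:
        ObjectPool* pool_;
        std::unique_ptr<T> obj_;
    };

    // Hands out an idle object if one exists, otherwise creates a new one.
    Lease check_out() {
        std::unique_ptr<T> obj;
        {
            std::lock_guard<std::mutex> lock(m_);
            if (!idle_.empty()) {
                obj = std::move(idle_.back());
                idle_.pop_back();
            }
        }
        if (!obj) obj = std::make_unique<T>();
        return Lease(*this, std::move(obj));
    }

    std::size_t idle_count() const {
        std::lock_guard<std::mutex> lock(m_);
        return idle_.size();
    }

private:
    void check_in(std::unique_ptr<T> obj) {
        std::lock_guard<std::mutex> lock(m_);
        idle_.push_back(std::move(obj));
    }

    mutable std::mutex m_;
    std::vector<std::unique_ptr<T>> idle_;
};
```

Returned objects keep whatever state the previous user left in them, so a real pool would likely reset them on check-in or check-out.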


What bonzini is referring to is actually a more subtle issue: the compiler will CSE the (hidden) thread_local address calculation even across function calls. So even if you are careful and do not assume that thread_local state is preserved across function calls, your code can still be wrong as suddenly it will be accessing a thread_local owned by another thread. That might be as simple as dereferencing the wrong errno.

It is very hard to work around that in your code. The only practical solution is to never migrate coroutines to other threads.


Well, it would be nice if, optionally, GCC didn't do that. I believe that MSVC has a flag to prevent exactly this optimization.


Stackless coroutines are useful in domains where stackful coroutines are simply not an option. Not every application runs on a server. There are tradeoffs associated with stackless coroutines, but I don't think it's questionable whether it was the right choice for a language like C++. If you want stackful coroutines, there are likely better suited languages for your domain that are not C++.


They were initially proposed by Microsoft, based on the way C# coroutines work on top of async/await, and the asynchronous runtime used in .NET Native and C++/CX for WinRT.

Search for talks from Gor Nishanov.



