Coroutines in C (2000)

DriftRegion · 2024-02-25T17:10:59.000000Z

I've found myself at this webpage multiple times while trying to minimize the complexity of APIs in my C projects. I think it does a lovely job explaining control flow and it has helped me to think more explicitly about storage of state on and off the stack as well as the readability consequences of different approaches.

My conclusion for now is that the choice to use C coroutines is best left to the library user. For example: Mongoose (https://github.com/cesanta/mongoose) uses event callbacks to deal with asynchronousness. It is much more pleasant to wrap a library like this in whatever thread/task primitives your system has rather than try to port the mythical cross-platform C couroutine or worse, std::thread.

chongli · 2024-02-25T18:26:41.000000Z

It’s Simon Tatham’s website. He’s well known for being the author of PuTTY [1] and his puzzle collection [2]!

[1] https://www.chiark.greenend.org.uk/~sgtatham/putty/

[2] https://www.chiark.greenend.org.uk/~sgtatham/puzzles/

hnfong · 2024-02-25T18:50:33.000000Z

I've known about the two projects for literally 20+ years, but wow I never knew it was the same person behind them....

smrq · 2024-02-27T16:53:17.000000Z

Slight divergence of topic here, but is there a word for this phenomenon? Namely, when you know of a person through two very different channels and find out they are the same person. It happens to me frequently enough that I feel like it deserves at least a neologism of some sort, but I don't have anything catchy.

(I find it especially happens with bands; I had listened to both Failure and Autolux for years before finding out they shared a guitarist, for instance.)

abhgh · 2024-02-25T18:56:51.000000Z

Oh wow... I have had the Android port of his puzzles (your second reference links to it) on my phone for a while. Had no idea the developer of Putty had anything to do with it!

utopcell · 2024-02-25T18:40:39.000000Z

Coroutines. What a lovely concept! It's a joy to watch all the CppCon videos about C++ coroutines, primarily by Microsoft folks. "Negative-cost abstraction" is such a nice hook phrase.

Friends at Meta mentioned to me a couple years ago that they started using c++ coroutines, which ended up being a big mistake because they had to face compiler implementation bugs, which must have been nasty to track down. At Google, we are eagerly waiting for the brilliant folks that are working on properly integrating them in google3/ to tell us when the time has come to use them.

This article uses Duff's device [1] to motivate structured gotos via macros as an implementation strategy for C coroutines. Duff wanted to loop-unroll this:

    do {
        *to = *from++;
    } while (--count > 0);

which he did in this way (shortened for brevity) :

    int n = (count + 3) / 4;
    switch (count % 4) {
    case 0: do { *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
            } while (--n > 0);
    }

That is to say, he realized that he could use `case` statements (almost) anywhere in a `switch` block. The connection with coroutines is simple: One can wrap the whole function body with a switch statement, use a static variable for holding the location of the latest coroutine return, and label all co-returns with a `case` statement:

  #define coBegin static int state = 0; switch (state) { case 0:
  #define coReturn(x) do { state = __LINE_; return x; case __LINE:; } while (0)
  #define coFinish }

  int function(void) {
      static int i;  // function state can't be local anymore.
      coBegin;
      for (i = 0; i < 10; ++i)
          coReturn(i);
      coFinish;
  }

Sustrik's take on C coroutines might also be an interesting read [2].

[1] https://en.wikipedia.org/wiki/Duff%27s_device

[2] https://250bpm.com/blog:48/index.html

codemac · 2024-02-25T21:56:25.000000Z

As someone who moved from google3 -> fbcode in the last few years, I think there are weird upsides AND downsides to having async code littered through your C++ (aka co_yield, co_return, co_await, etc).

The advantage, compared to the internal stuff google3 was using, was that as you read code, the async nature of various parts was obvious. Some programmers at G would spend entire quarters+ not knowing what the threading model was, and cause serious bugs in retrospect.

The disadvantage is actually much dumber - a lot of code "could" be async, and over time becomes entirely async because that's the mode the programmer is in when writing the program.

The choice to use a spinlock vs. a mutex w/yields should be one based on the size of the critical section and the threading going on at the time. Unfortunately to make code more readable/uniform/etc you end up with entire projects doing one or the other.

I'd love to learn more about language implementations of threading that do not default either way, but instead could take a profile of the previous run, and make the next run more optimal, without having to change the code or causing bugs.

dividuum · 2024-02-25T19:00:40.000000Z

The alternative is to use the „labels as values“ feature of GCC. You can take the address of a label and later jump to it. I contributed the code that’s now in lc-addrlabels.h back in 2005 :-)

I also used the GCC local labels feature to completely avoid using __LINE__ anywhere, so you could have multiple coReturns in a single code line:

#define LC_SET(s) do { ({ __label__ resume; resume: (s) = &&resume; }); }while(0)

bxparks · 2024-02-25T20:16:16.000000Z

Definitely, "labels as values" (aka "computed gotos", https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html) is so much better than Duff's device.

Unfortunately, computed-gotos is not a C language standard. I don't understand why. I think FORTRAN had it in the 60s. It is so useful in some situations, like a coroutine, or a byte-code interpreter. Is it because some obscure DSP chip with a sizeof(char)==32 using 1's complement arithmetic can't support it? Then maybe make it implementation-defined and allow the rest of the world get nice things.

mananaysiempre · 2024-02-26T02:05:12.000000Z

(For ease of reference—Fortran calls this an assigned GOTO: jump to label stored in a variable—as an integer number, as is Fortran’s way, not an address. A computed GOTO in Fortran is more like a switch statement in C: jump to the first label listed in the statement if the specified variable is one, to the second if it is two, ..., fall through to the next statement otherwise.)

weinzierl · 2024-02-26T09:07:53.000000Z

Steve Wozniak's Integer BASIC (aka Apple BASIC) had computed-gotos too. Not sure about Microsoft BASIC, but Commodore BASIC had definitely lost this feature.

mananaysiempre · 2024-02-26T01:58:19.000000Z

> [Duff] realized that he could use `case` statements (almost) anywhere in a `switch` block.

That’s likely true, in that it probably was a moment for realization for Duff (and many others reading him, including me); yet it’s almost certainly a completely intentional feature.

(As mentioned at the bottom of TFA, Duff also realized you could build coroutines on top of it but thought the idea “revolting”.)

There’s a temptation to think of C’s `switch` as a very inexpressive pattern match, and then the “fallthrough” seems like a bug and so on. It’s not. It’s a computed GOTO, in the vein of the one in Fortran but more convenient in that the values don’t have to be sequential, and also in that you don’t have to list all the labels at the top. (In fact, now that I’m writing this out, it’s more of a computed COMEFROM, then, isn’t it? However insane that sounds.)

Scubabear68 · 2024-02-25T20:55:08.000000Z

Ah the C pre-processor, the gift that keeps on giving after all these years :-(

agumonkey · 2024-02-26T01:53:16.000000Z

I don't mind macro heavy C code, but this one made me freeze

benlivengood · 2024-02-25T20:47:14.000000Z

Oh come on, just rewrite it all in Go! It should only be a few billion line CR. Your SREs will thank you (eventually).

omoikane · 2024-02-25T17:07:46.000000Z

> no commonly used high level language supports the coroutine

This might have been the case back in 2000, but these days many languages do support it, including C++20, Lua, Python, Ruby, etc.

esfandia · 2024-02-25T23:17:02.000000Z

Python was created in 1991; I imagine the "yield" keyword appeared either right then or not much later!

Also, the refinement at the end of the article: "We arrange an extra function parameter, which is a pointer to a context structure; we declare all our local state, and our coroutine state variable, as elements of that structure." sounds like implementing a closure to me. You make the callee a lambda which would use an outside var/context/state to determine what to do or with what value. Am I understanding this correctly?

kragen · 2024-02-26T03:09:20.000000Z

your note about closures is correct, yes

as lmm pointed out, python didn't have generators and yield until 2.2. icon, which tim peters adapted the idea from, had them quite a bit earlier than that, but i think it's reasonable to describe icon as not being a commonly used language, then or now

(python's generators are closer syntactically to icon's generators than they are semantically)

lmm · 2024-02-25T23:50:22.000000Z

> Python was created in 1991; I imagine the "yield" keyword appeared either right then or not much later!

Nope. It was introduced 10 years later, as part of PEP 255, released in Python 2.2.

perbu · 2024-02-25T19:47:49.000000Z

fwiw, Simula67 had coroutines. Not the first to do so, but IIRC it was the first major language to do so.

mid-kid · 2024-02-25T19:59:39.000000Z

The "switch" method isn't too uncommon, but usually people have an init function and "state" pointer that's passed into the coroutine function. I've used this method a lot in embedded projects, where one coroutine was handling motor acceleration/deceleration while the other would simply tell it what direction to go, but I've also used it for networked libraries[1]. Even the standard library has a coroutine function like this in "strtok()"[2]

You don't really need to introduce macro hell for it to be manageable, though I've never found reading switch/case flow to be very enjoyable.

[1]: https://github.com/REONTeam/libmobile/blob/master/relay.c#L3...

[2]: https://manpages.debian.org/bookworm/manpages-dev/strtok.3.e...

pieterr · 2024-02-25T16:59:44.000000Z

From the same author: Simon Tatham's Portable Puzzle Collection

https://www.chiark.greenend.org.uk/~sgtatham/puzzles/

mtlmtlmtlmtl · 2024-02-25T19:35:58.000000Z

If you think this is some C black magic, try reading this by the same author on creating arbitrary control structures with macros: https://www.chiark.greenend.org.uk/~sgtatham/mp/

o11c · 2024-02-25T19:59:51.000000Z

Note that the underscore prefix thing often is still prone to shadowing. You need pretty ugly mangled names to avoid that, and for external-block macros (unlike expression-ish/statement-ish macros) it can't be avoided with GNU's/C23's hygienic macro hack.

lifthrasiir · 2024-02-26T05:50:24.000000Z

Wait, was there any recent change to C23 that enabled a different solution than `__COUNTER__`? I realized you have mentioned but didn't fully define `CLEANSE_MACRO_VARS` in recent comments, is there any other pointer?

o11c · 2024-02-27T00:52:18.000000Z

Yes, implementing `CLEANSE_MACRO_VARS` was left as an exercise for the reader. The real key improvement here is that C23 standardized C++-style `auto` (or GNU C `__auto_type`), though its improved support for variadic macros also helps a few corners.

The key observation is that, for expression-like (only with GNU statement expressions) or statement-like (using do-while, possibly emulating expression-like macros by specifying an output variable) macros, shadowing is perfectly okay as long as it doesn't happen before the evaluation of the macro argument expressions.

So what you do is define all the variables twice - first, with long and ugly names (which name is only generated/used deep inside the cleansing macro), to capture the expression, and then (after all the ugly definitions are done, not just some), as a simple copy of that with a nice name for use in the user's macro. So the resultant macro works like:

    #define MAX(a_, b_) // omitting backslashes
    ({
        auto _ugly_a = (a_);
        auto _ugly_b = (b_);
        // it's safe if the above arguments expand to contain unrelated `a` and `b`.
        auto a = _ugly_a;
        auto b = _ugly_b;
        // in a statement expression, the last statement is the resulting value
        a > b ? a : b;
    })

There's no need to `__LINE__` for this particular problem, since all you need is a sufficiently long unique prefix to namespace it (which all C libraries assume anyway).

=====

Note also that the old `__typeof__` is not useless just because `__auto_type` exists. Besides things like `_Generic` and function definitions where there's no initializer allowed, it's also useful for safely forming pointers to a passed-in type which might be (a pointer to) an array or function, since, like `sizeof`, types are also valid for its argument. This is in fact in the documentation if anybody reads that.

lifthrasiir · 2024-02-27T00:55:53.000000Z

Ah, I see. I thought the cleaning involved something like gensym, but you just meant that a new scope can introduce a variable of the same name without interfering with the original. You don't really forbid something like `MAX(&_ugly_a, &_ugly_b)` for example, that was why I was confused.

mtlmtlmtlmtl · 2024-02-26T00:34:35.000000Z

I wouldn't recommend doing any of this stuff at all, personally. It's just always amazed me though how much you can do with with just basic string substitution and no homoiconicity/no AST access.

rwmj · 2024-02-25T17:04:34.000000Z

Coroutines are fun, but in real code please consider using actual threads. Modern processors have many cores, but coroutines will (often) only use a single core.

Edit to add: This is a real world problem too. Until recently qemu, which extensively uses coroutines, would put a lot of its block device I/O through a single thread. This caused some performance issues. Kevin Wolf and others have spent years of effort fixing this so modern qemu will use many threads for I/O (this work will appear in RHEL 9.4).

samatman · 2024-02-25T21:16:45.000000Z

The only connection between threads and coroutines is that some single-threaded language runtimes only have coroutines, so you might occasionally use them where threads would be a better choice.

Coroutines are a way of structuring single-threaded execution, and a useful one. The example in the Fine Article of a producer-consumer pattern is a good one, attaching a stream to a parser isn't a parallel algorithm so threads are useless for writing it.

Naturally, using a single-threaded paradigm for work which could be performed in parallel is inefficient, but coroutines aren't a poor man's parallelism, they're a control structure which functions on its own terms. They can be combined productively with threads, such as using an event loop in a web server to thread (as in needle) coroutines through various blocking events with a dispatcher, and the runtime can spin up a thread per core to parallelize this, which reduces per-thread coordination to checking the depth of each thread's work queue and farming the request to the least congested one.

mananaysiempre · 2024-02-26T01:48:21.000000Z

Bob Nystrom makes this argument best, I think, in his two-parter on loops and iteration[1,2]. Looping over data structures is of course only one example of how one can apply coroutines, but a very established one. The canonical problem requiring coroutines[3] is also essentially about doing that.

Or for those who want something different there’s the elevator (and elevator-userbase) simulation from TAoCP volume 1, also an essentially concurrent problem with little to no parallelism or I/O to it.

[1] https://journal.stuffwithstuff.com/2013/01/13/iteration-insi...

[2] https://journal.stuffwithstuff.com/2013/02/24/iteration-insi...

[3] https://wiki.c2.com/?SameFringeProblem

Thorrez · 2024-02-25T22:18:39.000000Z

> attaching a stream to a parser isn't a parallel algorithm so threads are useless for writing it.

Couldn't it be done in 2 threads? The output of the decompressor thread feeds to the input of the parser thread.

mananaysiempre · 2024-02-26T01:36:26.000000Z

It could be, but given the sometimes astonishing costs of the—effectively—network protocol we know as cache coherency (thousands of cycles if you’re not careful), it’d be a giant waste in many of the cases where stackless coroutines would be perfectly appropriate.

ajross · 2024-02-25T17:29:04.000000Z

> coroutines will (often) only use a single core

That's generally the desired behavior. If you have decoupled, parallel workloads they're going to naturally be working on disjoint data. The idea behind coroutines is that you have some kind of local workload with synchronous data that, for whatever reason, is easiest to express "inside out" with a function that gets to loop over something and "push" the results to its abstracted consumer whose code lives somewhere else, vs. the natural functional paradigm where the inner loop is a conceptual "pull" controlled by the caller.

repelsteeltje · 2024-02-25T18:24:10.000000Z

Thank you for eloquently expressing an observation I probably should have learned years ago.

pengaru · 2024-02-25T18:07:26.000000Z

There's often a sweet spot to be had in mixing threads and coroutines, where you have a coroutine scheduler instance per thread, and a thread created per core.

Then rarely, if ever, migrate coroutines across schedulers, and rarely, if ever, share data between coroutines on different schedulers.

Coroutines can enable an ergonomic concurrent programming style while avoiding the need for any locking at all via cooperative scheduling. You generally end up with higher scheduling latencies, but potentially quite high throughput by removing any need for atomics/locking overheads, and no timer constantly interrupting execution for preemptive scheduling.

rwmj · 2024-02-25T20:11:53.000000Z

Right, that's what qemu has ended up with.

c-smile · 2024-02-25T18:19:08.000000Z

> please consider using actual threads.

Bad advice in general.

Why would you run separate thread if you only want is to iterate over nodes in a tree (as an example of non flat collection).

cylon13 · 2024-02-25T18:29:01.000000Z

It’s never bad advice to consider something.

klyrs · 2024-02-25T18:49:07.000000Z

To the contrary, consideration takes time, and rules of thumb are valuable to mitigate overthinking.

rstuart4133 · 2024-02-26T21:53:28.000000Z

"Consider using threads" is only safe advice if the person doing the considering knows how to dealt with the (usually unwanted) non-determinism threads introduce.

Only a small fraction do, but threads look so simple on the surface the rest don't realise they are walking into a mine field.

a1369209993 · 2024-02-25T19:24:13.000000Z

No, it's frequently bad advice to consider something. See eg https://www.xkcd.com/1445/.

rwmj · 2024-02-25T20:13:02.000000Z

Real world and toy examples are very different. The example isn't like what people are using coroutines for in the real world. I'd urge you to look at how coroutines are used for inversion of control (quite correctly) in qemu.

hgs3 · 2024-02-25T19:39:13.000000Z

> Coroutines are fun, but in real code please consider using actual threads.

Coroutines are lightweight and trivial to synchronize. They are perfect for small bits of incremental computation, like iterators and tokenizers. Maybe you're thinking of green threads?

flohofwoe · 2024-02-26T10:12:07.000000Z

I agree for typical async IO code which needs to wait for external events (like an IO operation to finish), but sometimes in other situations threads are not an option because there would be too much synchronization required.

For instance in my emulators, the CPU emulation is a switch-case state machine which is very similar to the coroutine approach described in the article, trying to move this idea to threading would require a synchronization between multiple threads on each emulator clock cycle which is somewhere between a few dozen and a few hundred host CPU clock cycles. That's not realistic, at least for emulating typical 8- and 16-bit home computers. For emulating 'modern systems' where the hardware components are not as tightly coupled as in old-school 8- and 16-bit machines, threading makes more sense though.

See here to get an idea how that CPU emulation works (only the first few sections are needed to understand the concept): https://floooh.github.io/2021/12/17/cycle-stepped-z80.html

narag · 2024-02-25T21:06:35.000000Z

Coroutines are fun, but in real code please consider using actual threads. Modern processors have many cores, but coroutines will (often) only use a single core.

Threads and coroutines have different purposes. Coroutines are more about logical structure.

DinaCoder99 · 2024-02-25T19:28:38.000000Z

That seems like an orthogonal concern to structuring control flow, though it is much more difficult if you intend to use coroutines across multiple threads. There's nothing stopping you from using both threading and coroutines.

tiberius_p · 2024-02-25T22:37:28.000000Z

Coroutines are good for modelling concurrency which is different from parallelism. Concurrency is useful for abstraction and expressiveness. Parallelism is useful for making your code run faster by running parts of it in parallel on multiple cores. You could make concurrent programs run faster on multiple cores by distributing the coroutines which don't share state on multiple working threads in a thread pool, thus mixing concurrency and parallelism...but they are still two different things with different purposes.

Quekid5 · 2024-02-25T17:09:34.000000Z

Not just that, but the scaling problems with threads are usually massively overstated. It's true that thread switching has quite a bit more overhead, but it's been optimized a lot since the bad old days of 15+ years ago. (Plus, unless you're using a massive number of threads it's very unlikely that thread switching is going to be your bottleneck.)

jeffreygoesto · 2024-02-25T17:58:20.000000Z

Unless you're on QNX 7 of course...

Quekid5 · 2024-02-28T23:01:09.000000Z

Something something exception that proves the rule?

I'm not familiar with QNX other than knowing that it is a an RTOS... which I imagine imposes some constraints that complicate things quite a lot?

jeffreygoesto · 2024-03-02T13:10:53.000000Z

It runs in millions of cars and other embedded systems, not that niche tbh. Your "exception" is the rule for a lot of engineers.

But yes, they took pride that process switching is as efficient as thread switching which is a slick way of selling that they never optimized threads to be better, unlike Linux. Until QNX8 there is also the "big kernel lock"...

Quekid5 · 2024-03-05T22:14:10.000000Z

> It runs in millions of cars and other embedded systems, not that niche tbh.

Linux (in some mutilated form) runs on billions of devices. Hell, even JavaME does; or at least did, not sure what current status is. That's not really a worthwhile comparison.

My 'niche' mention was in relation to number of devs who'd be familiar with it.

> Your "exception" is the rule for a lot of engineers.

Now, that's a fair comparison... but I wager "a lot" is still not nearly the number of engineers/devs that work with e.g. Linux. That's not to cast any aspersions, of course. After all it's often not really a choice of the engineer's part.

Not sure how we got to this tangent, tbh.

ot1138 · 2024-02-25T20:59:13.000000Z

This is out of the question for real time apps. Co-routines are an elegant solution to implement cooperative multitasking in such cases.

lmm · 2024-02-26T00:01:44.000000Z

Threads with implicitly shared memory are more or less impossible to use safely, particularly in a language like C. Coroutines for concurrency, and multiprocessing with explicitly shared memory for parallelism, is a better approach.

c-smile · 2024-02-25T17:25:01.000000Z

C++ version of the approach: https://www.codeproject.com/Tips/29524/Generators-in-C

I am using this in my Sciter, just in case. Works quite well and convenient.

xjm · 2024-02-27T23:11:41.000000Z

A modular and safe way to achieve this is probably effect handlers. It's like python's yield but can return a value and is scoped like an exception, it's not local to a function call. If you're unfamiliar with it, this article is a good motivation.

Each function, written in direct style, can perform an "effect" when the function wants control to go somewhere else (for c=getchar() and emit(c) here).

Control then goes to the effect handler, in this case probably the caller of the two functions, which decides what to do next: decompressor emits a char? Let's resume the parser's code with the char until it asks for more, then resume decompressor again, etc.

Effects can be efficiently implemented, especially if the continuation is only allowed to be called once (which is the case in OCaml), and allow writing code in direct style, together with type/memory safety. They are also very helpful in a concurrent setting.

An example here : https://effekt-lang.org/docs/casestudies/lexer

kccqzy · 2024-02-26T00:30:45.000000Z

> Of course, this trick violates every coding standard in the book. […] I would claim that the coding standards are at fault here.

Thoroughly disagree here. The coding standards for not at fault for rejecting this code, but rather the code is merely a cute trick. Software engineering in the large is all about removing surprises and making code readable even to the sleep-deprived on caller waking up at 3am to debug this. You can't rely on programmers remembering the ground rules all the time (and there are four of them!)

> Coding standards aim for clarity. By hiding vital things like switch, return and case statements inside "obfuscating" macros, the coding standards would claim you have obscured the syntactic structure of the program, and violated the requirement for clarity. But you have done so in the cause of revealing the algorithmic structure of the program, which is far more likely to be what the reader wants to know!

It takes skill to write programs that see clear in both their syntactic structure and their algorithmic structure. This isn't it. (I am a fan of Rust creating implicit state machines from async functions and I think that should be the model here.)

userbinator · 2024-02-26T00:41:22.000000Z

Endlessly "dumbing down" lowest-common-denominator crap is what's responsible for the quality, or lack thereof, of most software today. Shunning knowledge and education will come back to bite you.

bdjsiqoocwk · 2024-02-26T00:47:04.000000Z

Seconded. Not everything is about the sleep deprived on call guy, not everything is about reducing surprises, not everything is about operations.

I had thought before that this "what about the guy at 3am arguments" push in the direction of mediocrity - happy to see I'm not the only one have these thoughts.

kccqzy · 2024-02-26T01:16:40.000000Z

Feel free to write your artisanal code in your personal projects. But they don't belong in most companies' code base. You are a cog in the machine at these companies. Your role is to produce code that's easily understood by the next programmer. That's why coding guidelines exist.

kragen · 2024-02-26T03:11:14.000000Z

that's why all those companies are using putty instead of their own ssh client; they're organizationally incapable of writing software of putty's quality

chongli · 2024-02-26T04:30:26.000000Z

A lot of these companies don’t ship software at all, they write it only for internal use. It needs to be easily fixable by junior interns, not dependent on Bob the 100X programmer who decided to retire last week.

kragen · 2024-02-26T05:08:10.000000Z

yeah, and of course you normally want as much as possible of your software to be easily fixable by junior interns in any case; that's always better when there's no compensating drawback

the implicit premise of your comment, however, seems to be that no such compensating drawback is possible, presumably because internal-use-only software isn't a competitive advantage. there are a lot of companies that think that way, but i think it's shortsighted; see https://news.ycombinator.com/item?id=39402299 for some examples of companies that discovered that it mattered a lot how good their internal-use-only software was

furyofantares · 2024-02-26T01:12:40.000000Z

> quality, or lack thereof, of most software today.

Also the incredible quantity software today, and how dang cheap it is.

offices · 2024-02-26T11:24:10.000000Z

Having gone from C to C++, there's a stark difference between the communities in what is considered 'readable' i.e. what the future reader is expected to grok.

In C-world, ternary-ifs are too spicy and C99 is newfangled. In C++ world, the only reason you'd be pushed away from template metaprogramming is because the standard you're using lets you do it with constexpr.

DenisM · 2024-02-25T17:18:57.000000Z

Setjmp/longjump are the built-in coroutines in C, no?

fweimer · 2024-02-25T17:52:33.000000Z

Some longjmp implementations unwind the stack, so they can't be used for coroutine switching. Even if it works (it's technically undefined), you need to get a suitable stack from somewhere.

The next issue is that usually, applications want to resume coroutines on a thread different from the one it on which it was suspended. That runs into trouble because on some systems, compilers cache the address of thread-local variables in the local stack frame, assuming that the thread does not switch in a function mid-execution.

DenisM · 2024-02-25T18:26:04.000000Z

The only platform I’ve seen stack unwind was VAX/VMS :)

But yes, you do need to allocate the stack which could take up a lot of ram.

It’s odd not to mention it in the article though.

fweimer · 2024-02-25T19:13:56.000000Z

Current glibc unwinds the shadow stack if it is active: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86...

It makes longjmp useless for coroutine switching, although it does not result in other effects of stack unwinding (such as invoking C++ destructors).

On Windows, longjmp really unwinds the stack (and maybe this is something influenced by VMS): https://learn.microsoft.com/en-us/cpp/c-runtime-library/refe... “In Microsoft C++ code on Windows, longjmp uses the same stack-unwinding semantics as exception-handling code. It's safe to use in the same places that C++ exceptions can be raised.”

DenisM · 2024-02-25T19:57:13.000000Z

Well, things have changed since I looked last. Thanks for explaining.

FWIW, back in the nineties we just wrote our own setjmp/longjmp for VMS to avoid stack unwind - save registers / restore registers. We used it to implement coroutines in Modula 2, iirc.

josephcsible · 2024-02-25T20:29:25.000000Z

No. The C standard says this about longjmp: "if the function containing the invocation of the setjmp macro has terminated execution in the interim [...] the behavior is undefined". So while you can longjmp out of functions, you can't longjmp back into them.

ajross · 2024-02-25T17:34:59.000000Z

You can absolutely build coroutines out of a generalized context switch. So yes, in some sense. But note that the linked article doesn't use setjmp/longjmp, which is what makes it so clever.

FWIW: would I personally actually use this trick? Almost certainly not. C APIs aren't well suited to that level of abstraction IMHO, if you have an app that needs it leave the C stuff to the stuff C is good at and wrap a C++ or Rust or whatever layer on top for the subtleties.

dkjaudyeqooe · 2024-02-25T17:33:29.000000Z

These are stackless coroutines, if you use longjump you have to create a stack for the coroutine.

There are pros and cons for each style.

trealira · 2024-02-26T02:14:57.000000Z

In theory (but only possible in assembly right now), there could be coroutines that shared the stack of their caller. As long as the caller (who's calling from a normal function) finishes calling the coroutine and doesn't expect to be able to call it after they return, then you could use it to implement iterators, e.g. over a binary tree or a hash table, like generators in Python. It could work as long as the caller used the stack frame base pointer to refer to their saved local variables, since the stack pointer could be changed between yields to the coroutine. I'm genuinely surprised there hasn't been a compiled programming language to do that other than Sather and CLU[0] (both of which are long dead by now). Graydon Hoare originally wanted them in Rust [1], but LLVM didn't support it, so it was scrapped.

[0]: https://dl.acm.org/doi/pdf/10.1145/800127.804079 (the third PDF page, page 125)

[1]: https://graydon2.dreamwidth.org/307291.html (search "non-escaping coroutine")

crabmusket · 2024-02-26T03:02:40.000000Z

I was so sure this was about protothreads til I remembered its name. https://dunkels.com/adam/pt/

vinay_ys · 2024-02-25T18:12:24.000000Z

Ah, this page again! it's been more than two decades? since I saw this page last? It was fun to learn about coroutines from the author of PuttY the ssh client of choice on Windows those days.

Levitating · 2024-02-27T15:23:39.000000Z

There is an RFC[1] for coroutines in rust: https://doc.rust-lang.org/beta/unstable-book/language-featur...

[1]: https://github.com/rust-lang/rfcs/pull/2033

jonhohle · 2024-02-26T01:36:35.000000Z

I’ve used libaco in the past for coroutines in C. I found zlib a pain to use when using curl scheduled with libuv to fetch data. zlib expects a read loop to extract data, but libuv provides an evented push model. Saving all of the zlib state and building a state machine seemed tedious, but a coroutine made the zlib code look like the standard, blocking loop.

This was just code for my own amusement, and maybe used by a few people, for non-production work. I’d do it again, however, if I needed to.

rurban · 2024-02-25T20:38:43.000000Z

Also related, the C++ lambda fuckup: https://news.ycombinator.com/item?id=33084431

dang · 2024-02-25T19:18:49.000000Z

Coroutines in C (2000) - https://news.ycombinator.com/item?id=36639879 - July 2023 (2 comments)

Coroutines in C - https://news.ycombinator.com/item?id=23293835 - May 2020 (1 comment)

Coroutines in C (2000) - https://news.ycombinator.com/item?id=19106796 - Feb 2019 (59 comments)

Coroutines in C, revisited - https://news.ycombinator.com/item?id=13199245 - Dec 2016 (36 comments)

Coroutines in C - https://news.ycombinator.com/item?id=13138673 - Dec 2016 (1 comment)

Coroutines in C (2000) - https://news.ycombinator.com/item?id=11051004 - Feb 2016 (11 comments)

Show HN: Libconcurrent – Coroutines in C - https://news.ycombinator.com/item?id=10887071 - Jan 2016 (24 comments)

Coroutines in C with Arbitrary Arguments - https://news.ycombinator.com/item?id=9402314 - April 2015 (22 comments)

Coroutines in C (2000) - https://news.ycombinator.com/item?id=8615501 - Nov 2014 (27 comments)

Coroutines in C (2000) - https://news.ycombinator.com/item?id=6244994 - Aug 2013 (1 comment)

Coroutines in one page of C - https://news.ycombinator.com/item?id=6243946 - Aug 2013 (60 comments)

Coroutines in C (Simon Tatham, 2000) - https://news.ycombinator.com/item?id=1380044 - May 2010 (16 comments)

Coroutines in C - https://news.ycombinator.com/item?id=835849 - Sept 2009 (16 comments)

Co-routines in C - https://news.ycombinator.com/item?id=794157 - Aug 2009 (1 comment)

anfilt · 2024-02-25T22:28:01.000000Z

Bunki, a C Coroutine library https://news.ycombinator.com/item?id=35133095

paulddraper · 2024-02-25T17:09:33.000000Z

(2000)

gkbrk · 2024-02-25T17:21:15.000000Z

I've used this for some embedded/IoT projects before. They work really well.

anfilt · 2024-02-25T22:24:04.000000Z

I honestly like stackful coroutines if you don’t mind allocating memory for a stack.

https://github.com/Keith-Cancel/Bunki

whiterknight · 2024-02-25T21:40:15.000000Z

UNIX pipes solve this problem. Both reader and writer are driving their respective process.

esfandia · 2024-02-25T23:03:35.000000Z

The article says: "In many modern operating systems, you could do this using pipes between two processes or two threads. emit() in the decompressor writes to a pipe, and getchar() in the parser reads from the other end of the same pipe. Simple and robust, but also heavyweight and not portable. Typically you don't want to have to divide your program into threads for a task this simple."

whiterknight · 2024-02-25T23:06:14.000000Z

Thanks. I have read this article a few times and somehow missed that was acknowledged.

“Heavyweight” is where I disagree. It’s exactly whats needed to be able to write sequential code on each side.

jijji · 2024-02-26T05:15:17.000000Z

how does this compare to using Go goroutines?

anymouse123456 · 2024-02-25T17:00:02.000000Z

I assume I'm missing that this a joke, it's honestly hard for me to tell.

But in the conclusion, the author talks about actually making this work by providing a context object to hold all of the intermediate state and providing this context object to the callee.

Once this is required, how does this approach compare to simply using an external iterator?

Seems to me like an iterator solves the lion's share of the problem here. It moves the state into the caller's stack (or above them), it's easy to understand, simple to implement and doesn't involve unenclosed and context-dependent macros.

btilly · 2024-02-25T18:20:18.000000Z

Why would you assume that this is a joke?

C (particularly back when this was written) was a low level language. You could not simply use an external iterator - they didn't exist. And if you try to roll your own, you'll wind up dealing with a lot of complications around resource management in a language which lacks basic memory management.

But the proof is in the pudding. Back then it was common to want to telnet into a Unix machine from Windows. And the only two solutions that worked well enough to consider were installing Cygwin, or installing PuTTY. Cygwin was better if you needed a Unix environment on your Windows machine. Otherwise PuTTY was your answer. As the article comments, PuTTY was written with this technique.

When you've solved a problem that a lot of people had, and your solution is widely acknowledged as the best one out there, people get interested in how you think it should be solved. Which is why this article interested me when I first saw in many years ago on Slashdot.

So absolutely not a joke.

ajross · 2024-02-25T17:32:35.000000Z

> Seems to me like an iterator solves the lion's share of the problem here.

Iterator APIs are indeed aimed at the same kind of problem, but they're not the same solution. And often they're harder to write. If you have a component with a big list of stuff, it's generally easier to write and understand the idea of "iterate over my big list of stuff and emit one item at a time to my consumer" than it is "what state do I need to remember such that when I get called again I can emit the correct next item given the one I just emitted?".

Coroutines are a way of expressing the former. Iterators are the latter. If all you do is write the outer loop, iterators are absolutely just as good. If you need to write the iterator itself, it's more of a discussion.

syncurrent · 2024-02-25T17:09:16.000000Z

Proto-Activities have this context to store the state in the caller.

https://github.com/frameworklabs/proto_activities