Why choose async/await over threads? (notgull.net)
430 points by thunderbong 7 months ago | 458 comments



Async/await with one thread is simple and well-understood. That's the Javascript model. Threads let you get all those CPUs working on the problem, and Rust helps you manage the locking. Plus, you can have threads at different priorities, which may be necessary if you're compute-bound.

Multi-threaded async/await gets ugly. If you have serious compute-bound sections, the model tends to break down, because you're effectively blocking a thread that you share with others.

Compute-bound multi-threaded does not work as well in Rust as it should. Problems include:

- Futex congestion collapse. This tends to be a problem with some storage allocators. Many threads are hitting the same locks. In particular, growing a buffer can get very expensive in allocators where the recopying takes place with the entire storage allocator locked. I've mentioned before that Wine's library allocator, in a .DLL that's emulating a Microsoft library, is badly prone to this problem. Performance drops by two orders of magnitude with all the CPU time going into spinlocks. Microsoft's own implementation does not have this problem.

- Starvation due to unfair mutexes. Both the standard Mutex and crossbeam-channel channels are unfair. If you have multiple threads locking a resource, doing something, unlocking the resource, and repeating that cycle, one thread will win repeatedly and the others will get locked out.[1] (A minimal sketch of this pattern is at the end of this comment.) If you need fair mutexes, there's "parking_lot". But you don't get the poisoning safety on thread panic that the standard mutexes give you.

If you're not I/O bound, this gets much more complicated.

[1] https://users.rust-lang.org/t/mutex-starvation/89080
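
(Not from the linked thread, just a minimal sketch of the starvation pattern from the second bullet, using std::sync::Mutex; how lopsided the numbers get depends on the platform's lock implementation.)

    // Several threads repeatedly lock, do a little work, unlock, and loop.
    // With an unfair mutex, the thread that just released the lock often
    // re-acquires it before any woken waiter gets a chance to run.
    use std::sync::{Arc, Mutex};
    use std::thread;
    use std::time::{Duration, Instant};

    fn main() {
        let counts = Arc::new(Mutex::new(vec![0u64; 4]));
        let mut handles = Vec::new();
        for id in 0..4 {
            let counts = Arc::clone(&counts);
            handles.push(thread::spawn(move || {
                let deadline = Instant::now() + Duration::from_secs(1);
                while Instant::now() < deadline {
                    let mut c = counts.lock().unwrap();
                    c[id] += 1; // "do something" while holding the lock
                    drop(c);    // unlock, then immediately loop and re-lock
                }
            }));
        }
        for h in handles {
            h.join().unwrap();
        }
        // With fair locking these would be roughly equal; with an unfair
        // mutex one thread frequently dominates.
        println!("{:?}", counts.lock().unwrap());
    }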


Yes, 100%.

I've mostly only dealt with IO-bound computations, but the contention issues arise there as well. What's the point of having a million coroutines when the IO throughput is bounded again? How will coroutines save me when I immediately exhaust my size-10 DB connection pool? They won't; they just make debugging and working around the issues harder and more difficult to reason about.


The debugging issue is bigger than it seems.

Use of the async/await model in particular ends up with randomly hung micro-tasks in some random place in the code that are very hard to trace back to their cause, because they can be dispersed potentially anywhere.

Concurrency is also rather undefined, as are priorities most of the time.

This can be partly fixed by labelling, which adds more complexity, but at least it's explicit. Then the programmer needs to know what to label... which they won't do, and Rust has no training wheels to help with concurrency.

Threads, well, you have well-defined ingress and egress. Priorities are handled by the OS, and some degree of fairness is usually ensured.


> What's the point of having a million coroutines when the IO throughput is bounded again?

Because if you did the same thing with a million threads, you’d have all those same problems and more, because (a) threads take up more RAM, (b) threads are more expensive to spawn and tear down, and (c) threads have more registers to save/restore when context switching than coroutines.

The real answer is that you shouldn't have a million coroutines, and you shouldn't have a million threads. Doing such a thing is only useful if you have other useful work you can do while you wait on I/O. This is true for web servers (which want to serve other requests, so... maybe one coroutine per active request) and UIs (which should have a dedicated thread for the UI event loop), but for other applications there's shockingly little need to actually do async programming in the first place. Parallelism is a well-understood problem and can be used in key places (e.g. doing a lot of math that you can easily split across multiple cores), but concurrency is IMO a net negative for most applications that don't look like a web server.


Just morning bathroom musings based on your posts (yep /g), and this got me thinking that maybe the robust solution (once and for all, for all languages) may require a rethink at the hardware level. The CPU-bound issue comes down to systemic interrupt/resume, I think; if this can be done fairly for n in-progress threads-of-execution with efficient queued context swaps (say maybe a cpu with n wip contexts), then the problem becomes a resource allocation issue. Your thoughts?


> a cpu with N wip contexts

That's what "hyper-threading" is. There's enough duplicated hardware that beyond 2 hyper-threads, it seems to be more effective to add another CPU. If anybody ever built a 4-hyperthread CPU, it didn't become a major product.

It's been tried a few times in the past, back when CPUs were slow relative to memory. There was a National Semiconductor microprocessor where the state of the CPU was stored in main memory, and, by changing one register, control switched to another thread. Going way back, the CDC 6600, which was said to have 10 peripheral processors for I/O, really had only one, with ten copies of the state hardware.

Today, memory is more of a bottleneck than the CPU, so this is not a win.


The UltraSPARC T1 had 4-way SMT, and its successors bumped that to 8-way. Modern GPU compute is also highly based on hardware multi-threading as a way of compensating for memory latency, while also having wide execution units that can extract fine-grained parallelism within individual threads.


Also, IBM POWER has SMT at levels above 2; at least POWER7 had 4-way SMT ("hyperthreading").


Missed that. That's part of IBM mainframe technology, where you can have "logical partitions", a cluster on a chip, and assign various resources to each. IBM POWER10 apparently allows up to 8-way hyperthreading if configured that way.


Thanks, very informative.


What you said sounded in my head more like you’re describing a cooperatively scheduled OS rather than a novel hardware architecture.


(This has been a very low priority background thread in my head this morning so cut me some slack on hand waving.)

Historically, the H/W folks addressed (pun intended) memory-related architectural changes, such as when multicore came around and we got cache levels. Imagine if we had to deal at the software level with memory coherence across different cores [down to the fundamental level of invalidating Lx bytes]. There would be NUMA-like libraries and various hacks to make it happen.

Arguably you could say "all that is in principle OS responsibility, even memory coherence across cores" and we're done. Or you would agree that "thank God the H/W people took care of this" and ask whether they can do the same for processing.

The CPU model afaik hasn't changed that much in terms of granularity of execution steps, whereas the H/W people could realize that, d'oh, an execution granularity in conjunction with a hot context-switching mechanism could really help the poor unwashed coders in efficiently executing multiple competing sequences of code (which is all they know about at the H/W level).

If your CPU's architecture specs n +/- e clock ticks per context iteration, then you compile for that and you design languages around that. CPU bound now becomes heavy CPU usage but is not a disaster for any other process sharing the machine with you. It becomes a matter of provisioning instead of programming ad-hoc provisioning.


If our implementations are bad because of preemption, then I’m not sure why the natural conclusion isn’t “maybe there should be less preemption” instead of “[even] more of the operating system should be moved into the hardware”.


If you have fewer threads ready to run than CPU cores, you never have any good reason to interrupt one of them.


I don't know why the challenges with cooperative (non-preemptive) multitasking keep needing to get rediscovered. Even golang, which I consider a very responsibly designed language, went with cooperative scheduling at first until they were forced to switch to preemptive. Not saying cooperative multitasking doesn't have its place, just that it's gotta have a warning sticker, or even better, statically disallow certain types of code from executing.

Also great time to plug a related post, What color is your function:

https://journal.stuffwithstuff.com/2015/02/01/what-color-is-...


I keep having to repeat it on this unfortunate website: it's an implementation detail. A multi-threaded executor of async/await can cope with starvation perfectly well, as demonstrated by .NET's implementation, which can shrug off some really badly written code that interleaves blocking calls with asynchronous ones.

https://news.ycombinator.com/item?id=39530435

https://news.ycombinator.com/item?id=39786142

https://news.ycombinator.com/item?id=39721626


Said badly written code will still execute with poor performance, and the mix can be actively hard to spot.


Badly written threaded code will have the same problem, unfortunately


You would be surprised. It ultimately regresses to "thread per request but with extra steps". I remember truly atrocious codebases that were spamming task.Result everywhere, and yet they were performing tolerably even back on .NET Framework 4.6.1. The performance has improved (often literally) ten-fold since then, with the threadpool being rewritten, its hill-climbing algorithm receiving further tuning, and it getting proactive blocked-worker detection that can inject threads immediately without going through hill-climbing.


Also there is the extra software development and maintenance cost due to coloured functions that async/await causes

https://journal.stuffwithstuff.com/2015/02/01/what-color-is-...

Unless you are doing high-scalability software, async might not be worth the trade-offs.


If you expect to color your functions async by default, it's really easy to turn a sync function into a near-zero-cost async one: a Future that has already been resolved at construction time by calling the sync function.

This way, JS / TS becomes pretty comfortable. Except for stacktraces, of course.


Except when your sync function calls a function that makes the async runtime deadlock. No, there's no compiler error or lint for that. Good luck!


Useful stacktraces is one of the reasons why I use Go instead of JS/TS on the server-side.


JS has proper async stacktraces as well, or do you mean something else?


You're right. But it was not the case years ago when my team evaluated Go and Node.js. I should have clarified.


> Useful stacktraces is one of the reasons why I use Go

I'm not sure if this is sarcasm or not.


Fair enough. The context is that years ago, my team was evaluating Go and Node.js as options for a server requiring concurrency, and back then Node.js didn't provide any stacktrace for async callbacks. I know it has been improved since then, but I don't use Node.js.

What do you miss in Go's stack traces?


> Async/await with one thread is simple and well-understood. That's the Javascript model.

Bit of nuance (I'm not an authority on this and I don't know the current up-to-date answer across 'all' of the Javascript runtimes these days):

Isn't it technically "the developer is exposed to the concept of just one thread without having to worry about order of execution (other than callbacks/that sort of pre-emption)", but under the hood it can actually be using as many threads as it wants (and often does)? It's just "abstracted" away from the user.


I don't think so. Two threads updating the same variable would be observable in code. There's no way around it.


I'm referring to something like this: https://stackoverflow.com/questions/7018093/is-nodejs-really...

It's like a pedantic technical behind the scenes point I think, just trying to learn "what's true"


Looks like the spec refers to the thing-that-has-the-thread as an "agent". https://tc39.es/ecma262/#sec-agents

I don't know the details about how any implementation of a javascript execution environment allows for the creation of new agents.


I mean, yeah, if you go deep enough your OS may decide to schedule your browser thread to a different core as well. I don’t think it has any relevance here - semantically, it is executed on a single thread, which is very different from multi-threading.


It is executed by your runtime which may or may not behind the scenes be using a single thread for your execution and/or the underlying I/O/eventing going on underneath, no?


Most JS runtimes are multi-threaded behind the scenes. If you start a node process:

  node -e "setTimeout(()=>{}, 10_000)" &
Then wait a second and run:

  ps -o thcount $!
Or on macOS:

  ps -M $!
You'll see that there are multiple threads running, but like others have said, it's completely opaque to you as a programmer. It's basically just an implementation detail.


If you need true parallelism of course you can opt-in to Web Workers (called Worker Threads in Node), or the Node-specific child_process.fork, or cluster module.


I'm not sure what you mean by "multi-threaded async/wait"... Isn't the article considering async/await as an alternative to threads (i.e. coroutines vs threads)?

I'm a C++ programmer, and still using C++17 at work, so no coroutines, but don't futures provide a similar API? Useful for writing async code in serialized fashion that may be easier (vs threads) to think about and debug.

Of course there are still all the potential pitfalls that you enumerate, so it's no magic bullet for sure, but still a useful style of programming on occasion.


They mean async/await running over multiple OS threads compared to over one OS thread.

You can also have threads running on one OS thread (Python) or running on multiple OS threads (everything else).

Every language’s concurrency model is determined by both a concurrency interface (callbacks, promises, async await, threads, etc), and an implementation (single-threaded, multiple OS threads, multiple OS processes).


async/await tasks can be run in parallel on multiple threads, usually no more threads than there are hardware threads. This allows using the full capabilities of the machine, not just one core's worth. In a server environment with languages that support async/await but don't have the ability to execute on multiple cores like Node.js and Python, this is usually done by spawning many duplicate processes and distributing incoming connections round-robin between them.
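
(A hedged sketch of that model, assuming the Tokio runtime with its default multi-threaded scheduler; the runtime choice and the fake sleep "I/O" are just for illustration.)

    use std::time::Duration;

    #[tokio::main] // defaults to the multi-threaded scheduler, one worker per core
    async fn main() {
        let mut handles = Vec::new();
        for i in 0..8 {
            // Each spawned task may be polled on any of the worker threads.
            handles.push(tokio::spawn(async move {
                tokio::time::sleep(Duration::from_millis(10)).await; // stand-in for I/O
                println!("task {i} ran on {:?}", std::thread::current().id());
            }));
        }
        for h in handles {
            h.await.unwrap();
        }
    }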


Jemalloc can use separate arenas for different threads which I imagine mostly solves the futex congestion issue. Perhaps it introduces new ones?


IIRC glibc default malloc doesn't use per-thread arenas as they would waste too much memory on programs spawning tens of thousands of threads, and glibc can't really make too many workload assumptions. Instead I think it uses a fixed pool of arenas and tries to minimize contention.

These days on Linux, with restartable sequences, you can have true per-CPU arenas with zero contention. Not sure which allocators use them though.


> As pressure from thread collisions increases, additional arenas are created via mmap to relieve the pressure. The number of arenas is capped at eight times the number of CPUs in the system (unless the user specifies otherwise, see mallopt), which means a heavily threaded application will still see some contention, but the trade-off is that there will be less fragmentation.

https://sourceware.org/glibc/wiki/MallocInternals

So glibc's malloc will use up to 8x #CPUs arenas. If you have 10_000 threads, there is likely to be contention.


https://git.kernel.org/pub/scm/libs/librseq/librseq.git/tree...

Thank you, I didn't know about this one. An allocator that seems to use it at https://google.github.io/tcmalloc/rseq.html


Not only does tcmalloc use rseq, the feature was contributed to Linux by the tcmalloc authors, for this purpose, among other purposes.


The glibc allocator is also notoriously prone to not giving back memory to the OS when running several threads over a long time period. Reducing the pool size helps a bit, but not sufficiently to make memory usage stable.


I find this has worked well for me when I can easily state what thread pool work gets executed on.


I think that's Kotlin's model. The language doesn't have a default coroutine executor set; you need to provide your own or spawn one with the standard library. It can be thread-pooled, single-threaded, or a custom one, but there isn't a default one set.

If you use a single-threaded executor, then race conditions won't happen. If you choose a pooled one, then you obviously should realize there can be race conditions. It's all about the choice you made. (A rough Rust analog is sketched below.)
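
(For comparison, a rough Rust analog of "you pick the executor yourself", assuming Tokio; this is just an illustration, not anything from the parent comments.)

    fn main() {
        // Single-threaded: tasks interleave on one thread, so pooled-runtime
        // style data races between tasks can't happen.
        let single = tokio::runtime::Builder::new_current_thread()
            .enable_all()
            .build()
            .unwrap();
        single.block_on(async { println!("current-thread runtime") });

        // Thread-pooled: tasks may run in parallel, so shared state needs
        // Send/Sync and usually some locking.
        let pooled = tokio::runtime::Builder::new_multi_thread()
            .worker_threads(4)
            .enable_all()
            .build()
            .unwrap();
        pooled.block_on(async { println!("multi-thread runtime") });
    }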


You focus on rust rather than generalizing...

If you are IO bound, consider threads. This is almost the same as async / await.

What was missing above (and it's a problem with how most compute education is these days): if you are compute bound, you need to think about processes.

If you were dealing with Python's concurrent.futures, you would need to consider ProcessPoolExecutor vs. ThreadPoolExecutor.

ThreadPoolExecutor gives you the same as the above.

With ProcessPoolExecutor, you will have multiple processes executing independently, but you have to copy the memory space, which people don't consider. In Python DS work, multiprocessing workloads need to account for memory-space considerations.

It's kinda f'd up how JS doesn't have engineers think about their workloads.


I think you are coming at this from a particular Python mindset, driven by the limitations imposed on Python threading by the GIL. This is a peculiarity specific to Python rather than a general purpose concept about threads vs processes.


[...] if you are compute bound you need to think about processes.

How would that help? Running several processes instead of several threads will not speed anything up [1] and might actually slow you down because of additional inter-process communication overhead.

[1] Unless we are talking about running processes across multiple machines to make use of additional processors.


I think you need to clarify what you mean by "thread". For example they are different things when we compare Python and Java Threads. Or OS threads and green threads. I think the GP was relating to OS threads.


I was also referring to kernel threads. If we are talking about non-kernel threads, then sure, a given implementation might have limitations and there might be something to be gained by running several processes, but that would be a workaround for those limitations. But for kernel threads there will generally be no gain by spreading them across several processes.


Right; a process is just a thread (or set of threads) and some associated resources - like file descriptors and virtual memory allocations. As I understand it, the scheduler doesn’t really care if you’re running 1000 processes with 1 thread each or 1 process with 1000 threads.

But I suspect it’s faster to swap threads within a process than swap processes, because it avoids expensive TLB flushes. And of course, that way there’s no need for IPC.

All things being equal, you should get more performance out of a single process with a lot of threads than a lot of individual processes.


> a process is just a thread (or set of threads) and some associated resources - like file descriptors and virtual memory allocations

Or rather the reverse, in Linux terminology. Only processes exist, some just happen to share the same virtual address space.

> the scheduler doesn’t really care if you’re running 1000 processes with 1 thread each or 1 process with 1000 threads

Not just the scheduler, the whole kernel really. The concept of thread vs process is mainly a userspace detail for Linux. We arbitrarily decided that the set of clone() parameters from fork() create a process, while the set of clone() parameters through pthread_create() create a thread. If you start tweaking the clone() parameters yourself, then the two become indistinguishable.

> it’s faster to swap threads within a process than swap processes, because it avoids expensive TLB flushes

Right, though this is more of a theoretical concern than a practical one. If you are sensitive to a marginal TLB flush, then you may as well "isolcpu" and set affinities to avoid any context switch at all.

> that way there’s no need for IPC

If you have your processes mmap a shared memory, you effectively share address space between processes just like threads share their address space.

For most intents and purposes, really, I do find multiprocessing just better than multithreading. Both are pretty much indistinguishable, but separate processes give you the flexibility of being able to arbitrarily spawn new workers just like any other process, while with multithreading you need to bake in some form of pool manager and hope you get it right.


> The concept of thread vs process is mainly a userspace detail for Linux. We arbitrarily decided that the set of clone() parameters from fork() create a process, while the set of clone() parameters through pthread_create() create a thread. If you start tweaking the clone() parameters yourself, then the two become indistinguishable.

That's from Plan 9. There, you can fork, with various calls, sharing or not sharing code, data, stack, environment variables, and file descriptors.[1] Now that's in Linux. It leads to a model where programs are divided into connected processes with some shared memory. Android does things that way, I think.

[1] https://news.ycombinator.com/item?id=863939


Or rather the reverse, in Linux terminology. Only processes exist, some just happen to share the same virtual address space.

Threads and processes are semantically quite different in standard computer science terminology. A thread has an execution state, i.e. its set of processor register values. A process on the other hand is a management and isolation unit for resources like memory and handles.


> If you are IO bound, consider threads. This is almost the same as async / await.

Only in Python.

> if you are compute bound you need to think about processes.

Also only in Python.


If you're using threads then consider not using Python. Or, just consider not using Python.


Backend JS just spins up another container and/or lambda and if it's too slow and requires multiple CPUs in a single deployment, oh well, too bad.


That is of course a huge overhead, compared to how other languages solve the problem.


Backend JS does whatever the hecky you _want_ it to do.

The Cluster module has been around for a long time: https://nodejs.org/api/cluster.html

Tbf a lot of the time you're running it in a container and you allocate 1 vcpu there, only downside is maybe a little extra memory overhead. And for most lambdas I think they're suited to being single threaded (imo).


^ kid who only writes python condescending to the engineers actually solving hard problems LOL


The complaints around async/await vs threads to my mind have not been that one is more or less complex than the other. It is that it bifurcates the ecosystem and one of them ends up being a second class citizen causing friction when you choose the wrong one for your project.

While you can mix and match them it's hacky and inefficient when you need to. As it stands now the Rust ecosystem has decided that if you want to do anything involving IO you are stuck with an all async/await ecosystem. Since nearly everything you might want to do in Rust probably involves IO with very few exceptions that means for the most part you should probably ignore non-async libraries regardless of whether the rest of your application wants it to be async or not.

There is a hypothetical world where Rust used abstractions even more composable than async/await; it's async/await's composability that really wants everything else to be async/await too. If that had happened, then I think most of the complaints would have disappeared.


I agree with your diagnosis. It's what I concluded in my own Rust async blog post[0] (which are surely mandatory now). It's even worse than bifurcating the ecosystem, because even within async code it's almost always closely tied to the executor, usually Tokio. I talk about this as an extension to function colouring, adopting without.boats's three-colour proposition with blue (non-IO), green (blocking IO), and red (async IO). In the extended model it's really blue, green, red (Tokio), purple (async-std), orange (smol), etc.

I find that the sans-IO pattern is the best solution to this problem. Under this pattern you isolate all blue code and use inversion of control for I/O and time. This way you end up with the core protocol logic being unaware of IO and it becomes simple to wrap it in various forms of IO.

0: https://hugotunius.se/2024/03/08/on-async-rust.html
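
(A toy illustration of the sans-IO idea, not taken from the linked post; all the type and method names here are made up. The core only transforms bytes and events, so the same code can be driven by a blocking wrapper, a Tokio wrapper, or a test harness.)

    use std::time::Instant;

    pub enum Event {
        Transmit(Vec<u8>),  // bytes the caller should write to its socket
        Timeout(Instant),   // the core wants to be polled again by this time
        Message(String),    // a decoded application-level message
    }

    #[derive(Default)]
    pub struct ProtocolCore {
        inbox: Vec<u8>,
    }

    impl ProtocolCore {
        /// Feed bytes that the caller read from the network; no IO happens here.
        pub fn handle_input(&mut self, _now: Instant, data: &[u8]) {
            self.inbox.extend_from_slice(data);
        }

        /// Ask the core what should happen next; the caller performs the IO.
        pub fn poll_event(&mut self) -> Option<Event> {
            if self.inbox.is_empty() {
                None
            } else {
                let msg = String::from_utf8_lossy(&self.inbox).into_owned();
                self.inbox.clear();
                Some(Event::Message(msg))
            }
        }
    }

    fn main() {
        let mut core = ProtocolCore::default();
        core.handle_input(Instant::now(), b"hello");
        while let Some(Event::Message(m)) = core.poll_event() {
            println!("got: {m}");
        }
    }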


I love the fact that people outside the Python ecosystem are spreading the word about sans-IO. I think it should be the next iteration in coding using futures-based concurrency. I only wish it were more popular in the Python land as well.


The Haskell folks got there first.


Is that sans-IO pattern doing anything particularly novel, or is that basically just a subset of what any functional programmer does anyway?

Don't get me wrong, I'm fully on board with isolating IO. But why not go the slight extra step and just make it completely pure? You've already done the hard part of purity. The rest is easy.

Then you get all those nice benefits of being generic over async and sync, but can also memoize and parallelize freely, and all the other benefits of purity.


It's been a while since my Haskell days, but I think the key difference is whether you are abstracting over the IO or whether the IO sits outside the pure code entirely. When you abstract over IO you have blue (pure) code that contains generic red, green, purple, or orange code. With sans-IO you invert this, so the non-blue code is driving things forward by calling into blue code.

Rust, in particular, does not support abstracting over syncness at the moment, although there's work happening there. Even if you add support for that you also then need to abstract over the executor in use. My fear is that this will be too leaky to be useful, but we'll see. For now sans-IO is the best option in Rust.


This is a cool pattern, thanks for the share.


No problem, here are some examples of it:

* Quinn (a QUIC implementation; in particular `quinn-proto` is sans-IO, whereas the outer crate, `quinn`, is Tokio-based)[0]

* str0m (a WebRTC implementation that I work on; it's an alternative to `webrtc-rs`. We don't have any IO-aware wrappers, the user is expected to provide that themselves atm)[1]

0: https://github.com/quinn-rs/quinn

1: https://github.com/algesten/str0m/


> Since nearly everything you might want to do in Rust probably involves IO with very few exceptions that means for the most part you should probably ignore non-async libraries regardless of whether the rest of your application wants it to be async or not.

Only if you have two libraries to choose from and they are otherwise identical, which is rare. Using blocking code in async applications is not as seamless as it should be, but it's not hard. Instead of writing `foo()` you write `tokio::task::spawn_blocking(foo).await`. It will run the blocking code on a separate thread and return a future that will resolve once that thread is done.
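
(A minimal sketch of that, assuming Tokio; `expensive_sync_work` is a made-up stand-in for any blocking call.)

    fn expensive_sync_work() -> u64 {
        std::thread::sleep(std::time::Duration::from_millis(100)); // pretend work
        42
    }

    #[tokio::main]
    async fn main() {
        // Runs on Tokio's dedicated blocking-thread pool so it doesn't stall the
        // async worker threads; the await resolves once that thread finishes.
        let result = tokio::task::spawn_blocking(expensive_sync_work)
            .await
            .expect("blocking task panicked");
        println!("{result}");
    }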


That assumes you are using Tokio. As another poster said not only does the ecosystem fragment along the async/non async lines but along the runtime lines. Async is an extremely leaky abstraction. You are in a way making my point for me. If you want to avoid painful refactoring you should basically always start out tokio-async and shim in non async code as needed because going the other way is going to hurt.


`spawn_blocking` is a single function and it is not complicated, it shouldn't be too hard for other runtimes to do the same.

The end application does have to choose a runtime anyway and will have to stick with it because this area isn't standardized yet. This problem mostly affects the part of the ecosystem that wants to put complicated concurrency logic into libraries.


`spawn_blocking` should be part of a core executor interface that all executors must provide.


And of course most libraries don't even need IO because the application can do it for them, so it only makes sense for them to be async if they're computationally heavy enough to cause problems for the runtime.


>As it stands now the Rust ecosystem has decided that if you want to do anything involving IO you are stuck with an all async/await ecosystem.

I mean, C# works basically the same way; even though there are non-async options for IO, using the async options basically forces you to be async all the way back to Main(). There are ways to safely call async methods from sync methods, but they make debugging infinitely harder.


Well, yes. That doesn't mean it's not annoying though. It happens in every language that provides syntactic support for the distinction between async/await and non async. It's, I think, core to the syntactic and semantic abstractions that were popularized by Javascript.


There are a lot of points not covered. For example:

- async/await runs in the context of one thread, so there is no need for locks or synchronization. Unless one runs async/await on multiple threads to actually utilize CPU cores; then locks and synchronization are necessary again. This complexity may be hidden in some external code. For example, instead of synchronizing access to a single database connection it is much easier to open one database connection per async task. However, such an approach may affect performance, especially with SQLite and Postgres. (See the sketch after this list.)

- error propagation in async/await is not obvious, especially when one tries to group up async tasks. Happy Eyeballs is a classic example.

- since network I/O was mentioned, backpressure should also be mentioned. The CPython implementation of async/await notoriously lacks network backpressure, causing some problems.
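
(A hedged sketch of the first point, assuming Tokio and a made-up `Connection` type: sharing one connection on a multi-threaded runtime needs an async lock, while one connection per task avoids the lock at the cost of more connections.)

    use std::sync::Arc;
    use tokio::sync::Mutex;

    struct Connection;

    impl Connection {
        async fn query(&mut self, sql: &str) -> String {
            format!("result of {sql}") // placeholder for real database IO
        }
    }

    async fn shared_connection() {
        let conn = Arc::new(Mutex::new(Connection));
        let mut handles = Vec::new();
        for i in 0..4 {
            let conn = Arc::clone(&conn);
            handles.push(tokio::spawn(async move {
                // Tasks serialize on the lock; only one query runs at a time.
                let mut guard = conn.lock().await;
                guard.query(&format!("SELECT {i}")).await
            }));
        }
        for h in handles {
            println!("{}", h.await.unwrap());
        }
    }

    async fn connection_per_task() {
        let mut handles = Vec::new();
        for i in 0..4 {
            handles.push(tokio::spawn(async move {
                // No lock needed, but now we hold four connections instead of one.
                let mut conn = Connection;
                conn.query(&format!("SELECT {i}")).await
            }));
        }
        for h in handles {
            println!("{}", h.await.unwrap());
        }
    }

    #[tokio::main]
    async fn main() {
        shared_connection().await;
        connection_per_task().await;
    }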


I have lots of issues with async/await, but this is my primary beef with async/await:

Remember the Gang of Four book "Design Patterns"? It was basically a cookbook on how to work around the deficiencies of (mostly) C++. Yet everybody applied those patterns inside languages that didn't have those deficiencies.

Rust can run multiple threads just fine--it's not Javascript. As such, it didn't have to use async/await. It could have tried any of a bunch of different solutions. Rust is a systems language, after all.

However, async/await was necessary in order to shove Rust down the throats of the Javascript programmers who didn't know anything else. Quoting without.boats:

https://without.boats/blog/why-async-rust/

> I drove at async/await with the diligent fervor of the assumption that Rust’s survival depended on this feature.

Whether async/await was even a good fit for Rust technically was of no consequence. Javascript programmers were used to async/await so Rust was going to have async/await so Rust could be jammed down the throats of the Javascript network services programmers--technical consequences be damned.


Async/await was invented for C#, another multithreaded language. It was not designed to work around a lack of true parallelism. It is instead designed to make it easier to interact with async IO without having to resort to manually managed thread pools. It basically codifies at the language level a very common pattern for writing concurrent code.

It is true though that async/await has a significant advantage compared to fibers that is related to single-threaded code: it makes it very easy to add good concurrency support on a single thread, especially in languages which support both. In C#, it was particularly useful for executing concurrent operations from the single GUI thread of WPF or WinForms, or from parts of the app which interact with COM. This used the single-threaded SynchronizationContext installed by the GUI framework, which schedules continuations back on the current thread, so it's safe to run GUI updates or COM interactions from a Task, while also using any other async/await code, since awaited continuations inherit that context.


Yeah, Microsoft looked at callback hell, realized that they had seen this one before, dipped into the design docs for F# and lifted out the syntactic sugar of monads. And it worked fine. But really, async/await is literally callbacks. The keyword await just wraps the rest of the function in a lambda and stuffs it in a callback. It's fully just syntactic sugar. It's a great way of simplifying how callback hell is written, but it's still callback hell in the end. Where having everything run in callbacks makes sense, it makes sense. Where it doesn't it doesn't. At some point you will start using threads, because your use case calls for threads instead of callbacks.


Most compilers don't just wrap the rest of the function into a lambda but build a finite state machine with each await point being a state transition. It's a little bit more than just "syntactic sugar" for "callbacks". In most compilers it is most directly like the "generator" approach to building iterators (*function/yield is ancient async/await).

I think the iterator pattern in general is a really useful reference to keep in mind. Of course async/await doesn't replace threads just like iterators don't replace lists/arrays. There are some algorithms you can more efficiently write as iterators rather than sequences of lists/arrays. There are some algorithms you can more efficiently write as direct list/array manipulation and avoid the overhead of starting iterator finite state machines. Iterator methods are generally deeply composable and direct list/array manipulation requires a lot more coordination to compose. All of those things work together to build the whole data pipeline you need for your app.

So too, async/await makes it really easy to write some algorithms in a complex concurrent environment. That async/await runs in threads and runs with threads. It doesn't eliminate all thinking about threads. async/await is generally deeply composable and direct thread manipulation needs more work to coordinate. In large systems you probably still need to think about both how you are composing your async/await "pipelines" and also how your threads are coordinated. The benefits of composition such as race/await-all/schedulers/and more are generally worth the extra complexity and overhead (mental and computation space/time), which is why the pattern has become so common so quickly. Just like you can win big with nicely composed stacks of iterator functions. (Or RegEx or Observables or any of the other many cases where designing complex state machines both complicates how the system works and eventually simplifies developer experience with added composability.)
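
(Roughly the shape of that, hand-written: each await point becomes a state, and "resuming" means matching on the current state. A simplified illustration, not what any particular compiler literally emits.)

    use std::task::Poll;

    enum DownloadAndParse {
        Start { url: String },
        WaitingForBody { bytes_so_far: Vec<u8> },
        Done,
    }

    impl DownloadAndParse {
        fn resume(&mut self, input: Option<Vec<u8>>) -> Poll<String> {
            match self {
                DownloadAndParse::Start { url } => {
                    println!("issuing request to {url}");
                    *self = DownloadAndParse::WaitingForBody { bytes_so_far: Vec::new() };
                    Poll::Pending // first "await": suspend until the body arrives
                }
                DownloadAndParse::WaitingForBody { bytes_so_far } => match &input {
                    None => Poll::Pending, // still parked at the same await
                    Some(chunk) => {
                        bytes_so_far.extend_from_slice(chunk);
                        let text = String::from_utf8_lossy(bytes_so_far).into_owned();
                        *self = DownloadAndParse::Done;
                        Poll::Ready(text)
                    }
                },
                DownloadAndParse::Done => panic!("resumed after completion"),
            }
        }
    }

    fn main() {
        let mut task = DownloadAndParse::Start { url: "https://example.com".into() };
        assert!(matches!(task.resume(None), Poll::Pending));
        if let Poll::Ready(text) = task.resume(Some(b"hello".to_vec())) {
            println!("{text}");
        }
    }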


Eh, that's true, and that's a convenient way of doing intermediate representation, since its very machine-friendly. But really, finite state machines are just callbacks, just as generators can be treated as just callbacks. There is no real logical difference, and it is their historical origin, even for generators which is just a neat syntax for what could have been done back in the day with a more explicit OO solution.

It does provide a more conceptual way of thinking about what those old callbacks would have meant though, which opens up thinking about scheduling them. Still, it's not something I'd rather do, if I need an asynchronous iterator I'll write one but if I need to start scheduling tasks then I'm using threads and leaving it to someone smarter than me.


I generally don't agree with the direction withoutboats went with asynchricity but you are reading in a whole lot more into that sentence than is really there. It is very clear (based on his writing, in this and other articles) that he went with the solution because he thinks it is the right one, on a technical level.

I don't agree, but making it sound like it was about marketing the language to JavaScript people is just wrong.


> was about marketing the language to JavaScript people is just wrong.

No, it seems very right to me. Rust, despite being a "systems language", was not satisfied with the market size of systems programming, and they really needed all those millions of JS programmers to make the language a big success.


This is a lie. Async/await was developed to support systems that need to use non-blocking IO for performance reasons, not to appeal to JS programmers.


Threads have a cost. Context switching between them at the kernel level has a cost. There are some workloads that gain performance by multiplexing requests on a thread. Java virtual threads, golang goroutines, and dotnet async/await (which is multi threaded like Rust+tokio) all moved this way for _performance_ reasons not for ergonomic or political ones.

It's also worth pointing out that async/await was not originally a JavaScript thing. It's in many languages now but was first introduced in C#. So by your logic Rust introduced it so it could be "jammed down the throats" of all the dotnet devs..


> all moved this way for _performance_ reasons

They did NOT.

Async performance is quite often (I would even go so far as to say "generally") worse than single threaded performance in both latency AND throughput under most loads that programmers ever see.

Most of the complications of async are much like C#:

1) Async allows a more ergonomic way to deal with a prima donna GUI that must be the main thread and that you must not block. This has nothing to do with "performance"--it is a limitation of the GUI toolkit/Javascript VM/etc..

2) Async adds unavoidable latency overhead and everybody hits this issue.

3) Async nominally allows throughput scaling. Most programmers never gain enough throughput to offset the lost latency performance.


1) it offers a more ergonomic way for concurrency in general. `await Task.WhenAll(tasks);` is (in my opinion) more ergonomic than spinning up a thread pool in any language that supports both.

2) yes, there is a small performance overhead for continuations. Everything is a tradeoff. Nobody is advocating for using async/await for HFT, or in low-level languages like C or Zig. We're talking nanoseconds here; for a typical web API request that's in the tens of ms, that's a drop in the ocean.

3) I wouldn't say it's nominal! I'd argue most non-trivial web workloads would benefit from this increase in throughput. Pre-fork webservers like gunicorn can consume considerably more resources to serve the same traffic than an async stack such as uvicorn+FastAPI (to use Python as an example).

> Most of the complications of async are much like C#

Not sure where you're going with this analogy but as someone who's written back-end web services in basically every language (other than lisp, no hate though), C#/dotnet core is a pretty great stack. If you haven't tried it in a while you should give it a shot.


Eh. Async and, to a lesser extent, green threads are the only solutions to slowloris HTTP attacks. I suppose your other option is to use a thread pool in your server - but then you need to hide your web server behind nginx to keep it safe. (And nginx is safe because it internally uses async IO.)

Async is also usually wildly faster for networked services than blocking IO + thread pools. Look at some of the winners of the techempower benchmarks. All of the top results use some form of non blocking IO. (Though a few honourable mentions use go - with presumably a green thread per request):

https://www.techempower.com/benchmarks/

I’ve also never seen Python or Ruby get anywhere near the performance of nodejs (or C#) as a web server. A lot of the difference is probably how well tuned v8 and .net are, but I’m sure the async-everywhere nature of javascript makes a huge difference.


Async's perfect use case is proxies though- get a request, go through a small decision tree, dispatch the I/O to the kernel. You don't want proxies doing complex logic or computation, the stuff that creates bottlenecks in the cooperative multithreading.


Most API's (rest, graphql or otherwise) are effectively a proxy. Like you say, if you don't have complex logic and you're effectively mapping an HTTP request to a query, then your API code is just juggling incoming and outgoing responses and this evented/cooperative approach is very effective.


Where does the unavoidable latency overhead come from?

Do you have some benchmarks available?


The comment you are responding to is not wrong about higher async overhead, but it is wrong about everything else, either out of lack of experience with the language or out of being confused about what it is that Task<T> and ValueTask<T> solve.

All asynchronous methods (as in, the ones that have the async keyword prefixed to them) are turned into state machines, where, to live across an await, the method's variables that persist across it need to be lifted to a state machine struct, which then often (but not always) needs to be boxed aka heap allocated. All this makes the cost of what would have otherwise been just a couple of method calls way more significant - a single await like this can cost 50ns vs 2ns spent on calling methods.

There is also the matter of heap allocations for state machine boxes - C# is generally good when it comes to avoiding them for (value)tasks that complete synchronously and for hot async paths that complete asynchronously through pooling them, but badly written code can incur unwanted overhead by spamming async methods with await points where it could have been just forwarding a task instead. Years of bad practices arising from low-skill enterprise dev fields do not help this either, with only the switch to OSS and the more recent culture shift, aided by better out-of-box analyzers, somewhat turning the tide.

This, however, does not stop C#'s task system from being extremely useful for achieving lowest ceremony concurrency across all programming languages (yes, it is less effort than whatever Go or Elixir zealots would have you believe) where you can interleave, compose and aggregate task-returning methods to trivially parallelize/fork/join parts of existing logic leading to massive code productivity improvement. Want to fire off request and do something else? Call .GetStringAsync but don't await it and go back to it later with await when you do need the result - the request will be likely done by then. Instant parallelism.

With that said, Rust's approach to futures and async is a bit different: whereas in C# each async method is its own task, in Rust the entire call graph is a single task with many nested futures, where the size of the sum of all stack frames is known statically. Hence you can't perform recursive calls within async there - you can only create a new (usually heap-allocated) future, which gives you what effectively looks like a linked list of task nodes, as there is no infinite recursion in calculating their sizes. This generally has lower overhead and works extremely well even in no-std no-alloc scenarios where cooperative multi-tasking is realized through a single bare-metal executor, which is a massive user experience upgrade in embedded land. .NET OTOH is working on its own project to massively reduce async overhead, but once the finished experiment sees integration in dotnet/runtime itself, you can expect more posts on this orange site about it.
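
(To make the Rust recursion point concrete, a small hedged example: the compiler must know the future's size up front, so a recursive async call has to go through a boxed future. Tokio here is only used to drive the example.)

    use std::future::Future;
    use std::pin::Pin;

    // A plain `async fn countdown(n: u32)` that awaited itself would not
    // compile: its future type would have to contain itself. Boxing breaks
    // the cycle, at the cost of one heap allocation per level.
    fn countdown(n: u32) -> Pin<Box<dyn Future<Output = ()>>> {
        Box::pin(async move {
            if n > 0 {
                countdown(n - 1).await; // each level is a separate allocation
            }
        })
    }

    #[tokio::main]
    async fn main() {
        countdown(3).await;
    }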


> .NET OTOH is working on its own project to massively reduce async overhead

Where can I read more about that?


Initial experiment issue: https://github.com/dotnet/runtime/issues/94620

Experiment results write-up: https://github.com/dotnet/runtimelab/blob/e69dda51c7d796b812...

TLDR: The green threads experiment was a failure as it found (expected and obvious) issues that Java applications are now getting to enjoy, joining their Go colleagues, while also requiring breaking changes and offering few advantages over the existing model. It did, however, give inspiration to a subsequent re-examination of the current async/await implementation and whether it can be improved by moving state machine generation and execution away from IL completely and into the runtime. It was a massive success as evidenced by preliminary overhead estimations in the results.


The tl;dr that I got when I read these a few months ago was that C# relies too much on FFI, which makes implementing green threads hard, and on top of that it would require a huge effort to rewrite a lot of stuff to fit the green thread model. Java and Go don't have these challenges since Go shipped with a huge standard library and Java's ecosystem is all written in Java since it never had good FFI until recently.


Surely you're not claiming that .NET's standard library is not extensive and not written in C#.

If you do, consider giving .NET a try and reading the linked content if you're interested - it might sway your opinion towards more positive outlook :)


> Surely you're not claiming that .NET's standard library is not extensive and not written in C#.

I'm claiming that MSFT seems to really care about P/Invoke and FFI performance and it was one of the leading reasons for them not to choose green threads. So there has to be something in .NET or C# or WinForms or whatever that is influencing the decision.

I'm also claiming that this isn't a concern for Java. 99.9% of the time you don't go over FFI, and that's what led the OpenJDK team to choose virtual threads.

> If you do, consider giving .NET a try

I’d love to, but dealing with async/await is a pain :)


You’ve never used it, so how can you know?


How do you know I've never used it? Do you have a crystal ball?


> So by your logic Rust introduced it so it could be "jammed down the throats" of all the dotnet devs..

You're missing his point. His point is that the most popular language, which has the largest number of programmers, forced the hand of the Rust devs.

His point is not that the first language had this feature; it's that the most programmers used this feature, and that was due to the most popular programming language having this feature.


That Rust needed async/await to be palatable to JS devs would only be a problem if we think async/await is not needed in Rust, because it is only useful to work around limitations of JS (single-threaded execution, in this case). If instead async/await is a good feature in its own right (even if not critical), then JS forcing Rust's hand would be at best an annoyance.

And the idea that async/await was only added to JS to work around its limitations is simply wrong. So the OP is overall wrong: async/await is not an example of someone taking something that only makes sense in one language and using it another language for familiarity.


> So the OP is overall wrong: async/await is not an example of someone taking something that only makes sense in one language and using it another language for familiarity.

I don't really understand the counter argument here.

My reading of the argument[1] is that "popularity amongst developers forced the Rust devs' hands in adding async". If this is the argument, then a counter-argument of "it never (or only) made sense in the popular language (either)" is a non sequitur.

IOW, if it wasn't added due to technical reasons (which is the original argument, IIRC), then explaining technical reasons for/against isn't a counter argument.

[1] i.e. Maybe I am reading it wrong?


You are not reading the claim wrong, but the claim is a lie. We did not add async/await to Rust because it was popular but because it was the right solution for Rust. If you actually read my post that this liar linked to, you will find a detailed explanation of the technical history behind the decision.


You are not reading it wrong, and your statements are accurate.

My broader point is that the possibility of there being a "technically better" construct was simply not in scope for Rust. In order for Rust to capture Javascript programmers, async/await was the only construct that could possibly be considered.

And, to be fair, it worked. Rust's growth has been almost completely on the back of network services programming.


This comment is a lie.


That is his claim, but he is lying.


I would damn this, if Async/Await wasn't a good enough (TM) solution for certain problems where Threads are NOT good enough.

Remember: there is a reason why Async/Await was created B E F O R E JavaScript was used for more than sprinkling a few fancy effects on some otherwise static webpages


Strong disagree.

> Rust can run multiple threads just fine--it's not Javascript. As such, it didn't have to use async/await. It could have tried any of a bunch of different solutions. Rust is a systems language, after all.

it allows you to have semantic concurrency where there are no threads available. Like, you know, on microcontrollers without an (RT)OS, where such a systems programming language is a godsend.

seriously, using async/await on embedded makes so much sense.


> Rust can run multiple threads just fine

Rust is also used in environments which don't support threads. Embedded, bare metal, etc.


async/await is just a different concurrency paradigm with different strengths and weaknesses than threads. Rust has support for threaded concurrency as well though the ecosystem for it is a lot less mature.


Every word you've written is false, slanderous and idiotic. You are quoting a post in which I explain at length why async/await was the right fit for Rust technically. You are either illiterate or malignant.

Despite your evident ignorance, there are many network services that are not written in JavaScript. In fact, there are many that are written in C or C++. This is the addressable market of async Rust. Appealing to JavaScript users was not in any way a motivating factor for the development of async/await in Rust. Not at all!


Threads are much much slower than async/await.


Async/await, just like threads, is a concurrency mechanism and also always requires locks when accessing shared memory. Where does your statement come from?


If you perform single threaded async in Rust, you can drop down to the cheap single threaded RefCell rather than the expensive multithreaded Mutex/RwLock
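
(A sketch of that, assuming a current-thread Tokio runtime with a LocalSet; with `spawn_local` the tasks don't need to be Send, so Rc<RefCell<...>> stands in for Arc<Mutex<...>>.)

    use std::cell::RefCell;
    use std::rc::Rc;

    fn main() {
        let rt = tokio::runtime::Builder::new_current_thread()
            .enable_all()
            .build()
            .unwrap();
        let local = tokio::task::LocalSet::new();

        local.block_on(&rt, async {
            let counter = Rc::new(RefCell::new(0u64)); // no Mutex, no atomics
            let mut handles = Vec::new();
            for _ in 0..10 {
                let counter = Rc::clone(&counter);
                handles.push(tokio::task::spawn_local(async move {
                    // Everything runs on one thread, so this borrow can't race,
                    // as long as it isn't held across an await.
                    *counter.borrow_mut() += 1;
                }));
            }
            for h in handles {
                h.await.unwrap();
            }
            println!("{}", counter.borrow());
        });
    }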


That's one example of a lock you might eliminate, but there are plenty of other cases where it's impossible to eliminate locks even while single threaded.

Consider, for example, something like this (not real rust, I'm rusty there)

    lock {
      a = foo();
      b = io(a).await;
      c = bar(b);
    }
Eliminating this lock is unsafe because a, b, and c are expected to be updated in tandem. If you remove the lock, then by the time you reach c, a and b may have changed under your feet in an unexpected way because of that await.
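
(One hedged way the pseudocode above could look in real Rust: tokio::sync::Mutex, unlike the std one, is designed to be held across an await, so a, b and c stay consistent for the whole critical section. The `io` function and the types are made up.)

    use std::sync::Arc;
    use tokio::sync::Mutex;

    #[derive(Default)]
    struct State {
        a: u32,
        b: u32,
        c: u32,
    }

    async fn io(a: u32) -> u32 {
        tokio::time::sleep(std::time::Duration::from_millis(10)).await; // fake IO
        a * 2
    }

    async fn update(state: Arc<Mutex<State>>) {
        // The guard lives across the await, so no other task can observe or
        // modify a half-updated State in the meantime.
        let mut guard = state.lock().await;
        guard.a = 1;
        guard.b = io(guard.a).await;
        guard.c = guard.b + 1;
    }

    #[tokio::main]
    async fn main() {
        let state = Arc::new(Mutex::new(State::default()));
        let tasks: Vec<_> = (0..4)
            .map(|_| tokio::spawn(update(Arc::clone(&state))))
            .collect();
        for t in tasks {
            t.await.unwrap();
        }
    }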


Yeah but this problem goes away entirely if you just don’t await within a critical region like that.

I’ve been using nodejs for a decade or so now. Nodejs can also suffer from exactly this problem. In all that time, I think I’ve only reached for a JS locking primitive once.


There is no problem here with the critical region. The problem would be removing the critical region because "there's just one thread".

This is incorrect code

      a = foo();
      b = io(a).await;
      c = bar(b);
Without the lock, `a` can mutate before `b` is done executing which can mess with whether or not `c` is correct. The problem is if you have 2 independent variables that need to be updated in tandem.

Where this might show up. Imagine you have 2 elements on the screen, a span which indicates the contents and a div with the contents.

If your code looks like this

    mySpan.innerText = "Loading ${foo}";
    myDiv.innerText = load(foo).await;
    mySpan.innerText = "";
You now have incorrect code if 2 concurrent loads happen. It could be the original foo, it could be a second foo. There's no way to correctly determine what the content of `myDiv` is from an end user perspective as it depends entirely on what finished last and when. You don't even know if loading is still happening.


I absolutely agree that that code looks buggy. Of course it is - if you just blindly mix view and model logic like that, you’re going to have a bad day. How many different states can the system be in? If multiple concurrent loads can be in progress at the same time, the answer is lots.

But personally I wouldn’t solve it with a lock. I’d solve it by making the state machine more explicit and giving it a little bit of distance from the view logic. If you don’t want multiple loads to happen at once, add an is_loading variable or something to track the loading state. When in the loading state, ignore subsequent load operations.


> add an is_loading variable or something to track the loading state.

Which is definitionally a mutex AKA a lock. However, it's not a lock you are blocking on but rather one that you are trying and leaving.

I know it doesn't look like a traditional lock, but in a language like javascript or python it's a valid locking mechanism. For javascript that's because of the single thread execution model a boolean variable is guaranteed to be consistently set for multiple concurrent actions.

That is to say, you are thinking about concurrency issues, you just aren't thinking about them in concurrency terms.

Here's the Java equivalent to that concept

https://docs.oracle.com/javase/8/docs/api/java/util/concurre...


Yeah I agree. The one time I wrote a lock in javascript it worked like you were hinting at. You could await() the lock's release, and if multiple bits of code were all waiting for the lock, they would acquire it in turn.

But again, I really think in UI code it makes a lot more sense to be clear about what the state is, model it explicitly and make the view a "pure" expression of that state. In the code above:

- The state is 0 or more promises loading data.

- The state is implicit. Ie, the code doesn't list the set of loading promises which are being awaited at any point in time. Its not obvious that there is a collection going on.

- The state is probably wrong. The developer probably wants either 0 or 1 loading states. (Or maybe a queue of them). Because the state hasn't been modelled explicitly, it probably hasn't been considered enough

- The view is updated incorrectly based on the state. If 2 loads happen at the same time, then 1 finishes, the UI removes the "loading..." indicator from the UI. Correct view logic should ensure that the UI is deterministic based on the internal state. 1 in-progress load should result in the UI saying "loading...".

Its a great example. With code like this I think you should always carefully and explicitly consider all of the states of your system, and how the state should change based on user action. Then all UI code can flow naturally from that.

A lock might be a good tool. But without thinking about how you want the program to behave, we have no way to tell. And once you know how you want your program to behave, I find locks to be usually unnecessary.


I think a lot of this type of problem goes away with immutable data and being more careful with side effects (for example, firing them all at once at the end rather than dispersed through the calculation)


> Where does your statement come from?

This is how async/await works in Node (which is single-threaded) so most developers think this is how it works in every technology.


Even in Node, if you perform asynchronous operations on a shared resource, you need synchronization mechanisms to prevent interleaving of async functions.

There has been more than one occasion when I "fixed" a system in NodeJS just by wrapping some complex async function up in a mutex.


This lacks quite a bit of nuance. In Node you are guaranteed that synchronous code between two awaits will run to completion before another task (that could access your state) from the event loop gets a turn; with multi-threaded concurrency you could be preempted between any two machine instructions. So while you _do_ have to serialize access to shared IO resources, you do _not_ have to serialize access to memory (just add the connection to the hashset, no locks).

What you usually see with JS for concurrency of shared IO resources in practice is that they are "owned" by the closure of a flow of async execution and rarely available to other flows. This architecture often obviates the need to lock on the shared resource at all as the natural serialization orchestrated by the string of state machines already naturally accomplishes this. This pattern was even quite common in the CPS style before async/await.

For example, one of the first things an app needs to do before talking to a DB is to get a connection, which is often retrieved by pulling from a pool; acquiring the reservation requires no lock, and by virtue of the connection being exclusively closed over in the async query code, it also needs no locking. When the query is done, the connection can be returned to the pool sans locking.

The place where I found synchronization most useful was in acquiring resources that are unavailable. Interestingly, an async flow waiting on a signal for a shared resource resembles a channel in golang in how it shifts the state and execution to the other flow when a pooled resource is available.

All this to say, yeah I'm one of the huge fans of node that finds rust's take on default concurrency painfully over complicated. I really wish there was an event-loop async/await that was able to eschew most of the sync, send, lifetime insanity. While I am very comfortable with locks-required multithreaded concurrency as well, I honestly find little use for it and would much prefer to scale by process than thread to preserve the simplicity of single-threaded IO-bound concurrency.


> So while you _do_ have to serialize access to shared IO resources, you do _not_ have to serialize access to memory (just add the connection to the hashset, no locks).

No, this can still be required. Nothing stops a developer from setting up a partially completed data structure and then suspending in the middle, allowing arbitrary re-entrancy that will then see the half-finished change exposed in the heap.

This sort of bug is especially nasty exactly because developers often think it can't happen and don't plan ahead for it. Then one day someone comes along and decides they need to do an async call in the middle of code that was previously entirely synchronous, adds it and suddenly you've lost data integrity guarantees without realizing it. Race conditions appear and devs don't understand it because they've been taught that it can't happen if you don't have threads!


> So while you _do_ have to serialize access to shared IO resources, you do _not_ have to serialize access to memory

Yes, in Node you don't get the usual data races like in C++, but data-structure races can be just as dangerous. E.g. modifying the same array/object from two interleaved async functions was a common source of bugs in the systems I've referred to.

Of course, you can always rely on your code being synchronous and thus not needing a lock, but if you're doing anything asynchronous and you want a guarantee that your data will not be mutated from another async function, you need a lock, just like in ordinary threads.

One thing I deeply dislike about Node is how it convinces programmers that async/await is special, different from threading, and doesn't need any synchronisation mechanisms because of some Node-specific implementation details. This is fundamentally wrong and teaches wrong practices when it comes to concurrency.


But single-threaded async/await _is_ special and different from multi-threaded concurrency. Placing it in the same basket and prescribing the same method of use is fundamentally wrong and fails to teach the magic of idiomatic lock free async javascript.

I'm honestly having a difficult time creating a steel man js sample that exhibits data races unless I write weird C-like constructs and ignore closures and async flows to pass and mutate multi-element variables by reference deep into the call stack. This just isn't how js is written.

When you think about async/await in terms of shepherding data flows it becomes pretty easy to do lock free async/await with guaranteed serialization sans locks.


> I'm honestly having a difficult time creating a steel man js sample that exhibits data races

I can give you a real-life example I've encountered:

    const CACHE_EXPIRY = 1000; // Cache expiry time in milliseconds

    let cache = {}; // Shared cache object

    function getFromCache(key) {
      const cachedData = cache[key];
      if (cachedData && Date.now() - cachedData.timestamp < CACHE_EXPIRY) {
        return cachedData.data;
      }
      return null; // Cache entry expired or not found
    }

    function updateCache(key, data) {
      cache[key] = {
        data,
        timestamp: Date.now(),
      };
    }

    var mockFetchCount = 0;

    // simulate web request shorter than cache time
    async function mockFetch(url) {
      await new Promise(resolve => setTimeout(resolve, 100));
      mockFetchCount += 1;
      return `result from ${url}`;
    }

    async function fetchDataAndUpdateCache(key) {
      const cachedData = getFromCache(key);
      if (cachedData) {
        return cachedData;
      }

      // Simulate fetching data from an external source
      const newData = await mockFetch(`https://example.com/data/${key}`); // Placeholder fetch

      updateCache(key, newData);
      return newData;
    }

    // Race condition:
    (async () => {
      const key = 'myData';

      // Fetch data twice in a sequence - OK
      await fetchDataAndUpdateCache(key);
      await fetchDataAndUpdateCache(key);
      console.log('mockFetchCount should be 1:', mockFetchCount);

      // Reset counter and wait cache expiry
      mockFetchCount = 0;
      await new Promise(resolve => setTimeout(resolve, CACHE_EXPIRY));

      // Fetch data twice concurrently - we executed fetch twice!
      await Promise.all([fetchDataAndUpdateCache(key), fetchDataAndUpdateCache(key)]);
      console.log('mockFetchCount should be 1:', mockFetchCount);
    })();

This is what happens when you convince programmers that concurrency is not a problem in JavaScript. Even though this cache works for sequential fetching and will pass trivial testing, as soon as you have concurrent fetching, the program will execute multiple fetches in parallel. If the server implements some rate-limiting, or is simply not capable of handling too many parallel connections, you're going to have a really bad time.

Now, out of curiosity, how would you implement this kind of cache in idiomatic, lock-free javascript?


> how would you implement this kind of cache in idiomatic, lock-free javascript?

The simplest way is to cache the Promise<data> instead of waiting until you have the data:

    -async function fetchDataAndUpdateCache(key: string) {
    +function fetchDataAndUpdateCache(key: string) {
       const cachedData = getFromCache(key);
       if (cachedData) {
         return cachedData;
       }

       // Simulate fetching data from an external source
     -const newData = await mockFetch(`https://example.com/data/${key}`); // Placeholder fetch
     +const newData = mockFetch(`https://example.com/data/${key}`); // Placeholder fetch

       updateCache(key, newData);
       return newData;
     }
From this the correct behavior flows naturally; the API of fetchDataAndUpdateCache() is exactly the same (it still returns a Promise<result>), but it’s not itself async so you can tell at a glance that its internal operation is atomic. (This does mildly change the behavior in that the expiry is now from the start of the request instead of the end; if this is critical to you, you can put some code in `updateCache()` like `data.then(() => cache[key].timestamp = Date.now()).catch(() => delete cache[key])` or whatever the exact behavior you want is.)

I‘m not even sure what it would mean to “add a lock” to this code; I guess you could add another map of promises that you’ll resolve when the data is fetched and await on those before updating the cache, but unless you’re really exposing the guts of the cache to your callers that’d achieve exactly the same effect but with a lot more code.


Ok, that's pretty neat. Using Promises themselves in the cache instead of values to share the source of data itself.

While that approach has the limitation that you cannot read the data from inside fetchDataAndUpdateCache (e.g. to perform caching based on some property of the data), that goes beyond the scope of my example.

> I‘m not even sure what it would mean to “add a lock” to this code

It means the same as in any other language, just with a different implementation:

    class Mutex {
        locked = false
        next = []

        async lock() {
            if (this.locked) {
                await new Promise(resolve => this.next.push(resolve));
            } else {
                this.locked = true;
            }
        }

        unlock() {
            if (this.next.length > 0) {
                this.next.shift()();
            } else {
                this.locked = false;
            }
        }
    }
I'd have a separate map of keys-to-locks that I'd use to lock the whole fetchDataAndUpdateCache function on each particular key.


Don't forget to fung futures that are fungible for the same key.

ETA: I appreciate the time you took to make the example, also I changed the extension to `mjs` so the async IIFE isn't needed.

  const CACHE_EXPIRY = 1000; // Cache expiry time in milliseconds
  
  let cache = {}; // Shared cache object
  let futurecache = {}; // Shared cache of future values
  
  function getFromCache(key) {
    const cachedData = cache[key];
    if (cachedData && Date.now() - cachedData.timestamp < CACHE_EXPIRY) {
      return cachedData.data;
    }
    return null; // Cache entry expired or not found
  }
  
  function updateCache(key, data) {
    cache[key] = {
      data,
      timestamp: Date.now(),
    };
  }
  
  var mockFetchCount = 0;
  
  // simulate web request shorter than cache time
  async function mockFetch(url) {
    await new Promise(resolve => setTimeout(resolve, 100));
    mockFetchCount += 1;
    return `result from ${url}`;
  }
  
  async function fetchDataAndUpdateCache(key) {
    // maybe its value is cached already
    const cachedData = getFromCache(key);
    if (cachedData) {
      return cachedData;
    }
  
    // maybe its value is already being fetched
    const future = futurecache[key];
    if(future) {
      return future;
    }
  
    // Simulate fetching data from an external source
    const futureData = mockFetch(`https://example.com/data/${key}`); // Placeholder fetch
    futurecache[key] = futureData;
  
    const newData = await futureData;
    delete futurecache[key];
  
    updateCache(key, newData);
    return newData;
  }
  
  const key = 'myData';
  
  // Fetch data twice in a sequence - OK
  await fetchDataAndUpdateCache(key);
  await fetchDataAndUpdateCache(key);
  console.log('mockFetchCount should be 1:', mockFetchCount);
  
  // Reset counter and wait cache expiry
  mockFetchCount = 0;
  await new Promise(resolve => setTimeout(resolve, CACHE_EXPIRY));
  
  // Fetch 100 times concurrently - fetch should execute only once
  await Promise.all([...Array(100)].map(() => fetchDataAndUpdateCache(key)));
  console.log('mockFetchCount should be 1:', mockFetchCount);


I see, this piece of code seems to be crucial:

    // maybe its value is already being fetched
    const future = futurecache[key];
    if(future) {
      return future;
    }
It indeed fixes the problem in a JS lock-free way.

Note that, as wolfgang42 has shown in a sibling comment, the original cache map isn't necessary if you're using a future map, since the futures already contain the result:

    async function fetchDataAndUpdateCache(key) {
        // maybe its value is cached already
        const cachedData = getFromCache(key);
        if (cachedData) {
          return cachedData;
        }

        // Simulate fetching data from an external source
        const newDataFuture = mockFetch(`https://example.com/data/${key}`); // Placeholder fetch

        updateCache(key, newDataFuture);
        return newDataFuture;
    }
---

But note that this kind of problem is much easier to fix than to actually diagnose.

My hypothesis is that the lax attitude of Node programmers towards concurrency is what causes subtle bugs like these to happen in the first place.

Python, for example, also has single-threaded async concurrency like Node, but unlike Node it also has all the standard synchronization primitives also implemented in asyncio: https://docs.python.org/3/library/asyncio-sync.html


Wolfgang's optimization is very nice; I also found it interesting that he treats a non-async function that returns a promise as a signal that it's "atomic". I don't particularly like typed JS, so it would be less visible to me.

Absolutely agree on the observability of such things. One area I think shows some promise, though the tooling lags a bit, is in async context[0] flow analysis.

One area I have actually used it so far is in tracking down code that is starving the event loop with too much sync work, but I think some visualization/diagnostics around this data would be awesome.

If we view Promises/Futures as just ends of a string of a continued computation, whose resumption is gated by some piece of information, the points where you can weave these ends together are where the async context tracking happens and lets you follow a whole "thread" of state machines that make up the flow.

Thinking of it this way, I think, also makes it more obvious how data between these flows is partitioned in a way that it can be manipulated without locking.

As for the node dev's lax attitude, I would probably be more aggressive and say it's an overall lack of formal knowledge of how computing and data flow work. As an SE in DevOps a lot of my job is to make software work for people that don't know how computers, let alone platforms, work.

[0]: https://nodejs.org/api/async_context.html


async can be scarier for locks since a block of code might depend on having exclusive access, and since there wasn't an await, it got it. Once you add an await in the middle, the code breaks. Threading at least makes you codify what actually needs exclusive access.
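
A small Rust-flavoured sketch of that failure mode (names invented for illustration, and a single-threaded executor assumed): the invariant "count and sum are updated together" silently stops holding once an await lands between the two updates, because another task can run at the suspension point.

    use std::cell::RefCell;

    thread_local! {
        // intended invariant: count and sum are always updated together
        static TOTALS: RefCell<(u64, u64)> = RefCell::new((0, 0));
    }

    async fn record(value: u64) {
        TOTALS.with(|t| t.borrow_mut().0 += 1); // count bumped...
        some_io().await; // ...suspension point: other tasks now see count != entries summed
        TOTALS.with(|t| t.borrow_mut().1 += value); // ...sum only catches up here
    }

    async fn some_io() {
        // stand-in for any real await point (a read, a sleep, ...)
        futures::future::ready(()).await;
    }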

async also signs you up for managing your own thread scheduling. If you have a lot of IO and short CPU-bound code, this can be OK. If you have (or occasionally have) CPU-bound code, you'll find yourself playing scheduler.


Yeah once your app gets to be sufficiently complex you will find yourself needing mutexes after all. Async/await makes the easy parts of concurrency easy but the hard parts are still hard.


> backpressure should also be mentioned

I ran into this when I joined a team using nodejs. Misc services would just ABEND. Coming from Java, I was surprised by this oversight. It was tough explaining my fix to the team. (They had other great skills, which I didn't have.)

> error propagation in async/await is not obvious

I'll never use async/await by choice. Solo project, ...maybe. But working with others, using libraries, trying to get everyone on the same page? No way.

--

I haven't used (language level) structured concurrency in anger yet, but I'm placing my bets on Java's Loom Project. Best as I can tell, it'll moot the debate.


> async/await runs in context of one thread,

Not in Rust.


There is a single thread executor crate you can use for that case if it’s what you desire, FWIW.


Yes of course, but the async/await semantics are not designed only to be single threaded. Typically promises can be resumed on any executor thread, and the language is designed to reflect that.


This is completely wrong. You gotta learn about Send and Sync in Rust before you speak.

Rust makes no assumptions and is explicitly designed to support both single and multi threaded executors. You can have non-Send Futures.
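
As a minimal sketch (assuming the `futures` crate): holding an Rc across an .await makes a future non-Send, so it can't be handed to a work-stealing multi-threaded executor, but a single-threaded executor runs it fine.

    use std::rc::Rc;

    // Rc is !Send, and it is held across an await point, so this future is !Send.
    async fn not_send() -> usize {
        let shared = Rc::new(vec![1, 2, 3]);
        futures::future::ready(()).await; // suspension point while the Rc is live
        shared.len()
    }

    fn main() {
        // A single-threaded executor doesn't require Send, so this compiles and runs.
        let n = futures::executor::block_on(not_send());
        println!("{n}");
    }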


I'm fully aware of this, thanks @iknowstuff.

>>>>> Typically promises are designed...

I'm merely saying Rust async is not restricted to single threaded like many other languages design their async to be, because most people coming from Node are going to assume async is always single threaded.

Most people who write their promise implementations make them Send so they work with Tokio or Async-Std.

Relax, my guy. The shitty tone isn't necessary.

EDIT: Ah, your entire history is you just arguing with people. Got it.


Issues with the article:

1. Only one example is given (web server), solved incorrectly for threads. I will elaborate below.

2. The question is framed as if people specifically want OS threads instead of async/await.

But programmers want threads conceptually, semantically. Write sequential logic and don't use strange annotations like "async". In other words, if async/await is so good, why not make all functions in the language implicitly async, and instead of "await" just write normal function calls? Then you will suddenly be programming in threads.

OS threads are expensive due to the statically allocated stack, and we don't want that. We want cheap threads that can be run in the millions on a single CPU. But without the clumsy "async/await" words. (The `wait` word remains in its classic sense: when you wait for an event, for another thread to complete, etc - a blocking operation of waiting. But we don't want it for function invocations).

Back to #1 - the web server example.

When the timeout is implemented in the async/await variant of the solution, using `driver.race(timeout).await`, what happens to the client socket after the `race` signals the timeout error? Does the socket remain open, still connected to the client - essentially leaked?

The timeout solution for threaded version may look almost the same, as it looks for async/await: `threaded_race(client_thread, timeout).wait`. This threaded_race function uses a timer to track a timeout in parallel with the thread, and when the timeout is reached it calls `client_thread.interrupt()` - the Java way. (The `Thread.interrupt()`, if thread is not blocked, simply sets a flag; and if the thread is blocked in an IO call, this call throws an InterruptedException. That's a checked exception, so compiler forces programmer to wrap the `client.read_to_end(&mut data)` into try / catch or declare the exception in the `handle_client`. So programmer will not forget to close the client socket).


> When the timeout is implemented in the async/await variant of the solution, using `driver.race(timeout).await`, what happens to the client socket after the `race` signals the timeout error?

Any internal race() values will be `Drop`ped and driver itself will remain (although Rust will complain that you are not handling the Result if you type it 'as is'); if a new socket was created local to the future it will be cleaned up.
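
A rough sketch of that cleanup, using tokio::time::timeout in place of race() (tokio and the names here are assumptions, not the article's code): if the timeout wins, the inner future is dropped, which drops the TcpStream and closes the socket.

    use std::time::Duration;
    use tokio::io::AsyncReadExt;
    use tokio::net::TcpStream;

    // Hypothetical handler: the socket lives inside the future's state.
    async fn handle_client(mut socket: TcpStream) -> std::io::Result<Vec<u8>> {
        let mut data = Vec::new();
        socket.read_to_end(&mut data).await?;
        Ok(data)
    }

    #[tokio::main]
    async fn main() -> std::io::Result<()> {
        let socket = TcpStream::connect("127.0.0.1:8080").await?;
        // If the 5 seconds elapse first, the handle_client future is dropped,
        // which drops the TcpStream and closes the fd - nothing leaks.
        match tokio::time::timeout(Duration::from_secs(5), handle_client(socket)).await {
            Ok(Ok(data)) => println!("read {} bytes", data.len()),
            Ok(Err(e)) => eprintln!("io error: {e}"),
            Err(_elapsed) => eprintln!("timed out; socket dropped"),
        }
        Ok(())
    }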

The niceness of futures (in Rust) is that all the behavior around them can be defined. While "all functions are blocking", as you state in a sibling comment, Rust allows you to specify when to defer execution to the next task in the task queue, meaning it will poll tasks arbitrarily quickly with an explicitly held state (the Future struct). This makes it both very fast (compared to threads, which need to sleep() in order to defer) and easy to reason about.

Java's Thread.interrupt is also just a sleep loop, which is fine for most applications to be fair. Rust is a systems language: you can't have that in embedded systems, and it's not desirable for kernels or low-latency applications.


> Java's Thread.interrupt is also just a sleep loop

You probably mean that Java's socket reading under the hood may start a non-blocking IO operation on the socket, and then run a loop, which can react on Thread.interrupt() (which, in turn, will basically be setting a flag).

But that's an implementation detail, and it does not need to be implemented that way.

It can be implemented the same way as async/await. When a thread calls socket reading, the runtime system will take the current thread's continuation off execution and use the CPU to execute the next task in the queue. (That's how Java's new virtual threads are implemented.)

Threads and async/await are basically the same thing.

So why not drop this special word `async`?


> So why not drop this special word `async`?

You can drop the special word in Rust; it's just sugar for 'returns a poll-able function with state'. However, threads and async/await are not the same.

You can implement concurrency any way you like; you can run it in separate processes or separate nodes if you are willing to put in the work, but that does not mean they are equivalent for most purposes.

Threads are almost always implemented preemptively while async is typically cooperative. Threads are heavy/costly in time and memory, while async is almost zero-cost. Threads are handed over to the kernel scheduler, while async is entirely controlled by the program('s executor).

Purely from a merit perspective threads are simply a different trade-off. Just like multi-processing and distributed actor model is.


> Threads are almost always implemented preemptively while async is typically cooperative. Threads are heavy/costly in time and memory, while async is almost zero-cost. Threads are handed over to the kernel scheduler, while async is entirely controlled by the program('s executor).

Keyword here being almost. See Project Loom.


@f_devd, cooperative vs preemptive is a good point.

(That threads are heavy or should be scheduled by OS is not required by the nature of the threads).

But preemptive is strictly better (safer at least) than cooperative, right? Otherwise, one accidental endless loop, and this code occupies the executor, depriving all other futures of execution.

@gpderetta, I think Project Loom will need to become preemptive, otherwise the virtual threads can not be used as a drop-in replacement for native threads - we will have deadlocks in virtual threads where they don't happen in native threads.


Preemptive is safer for liveness since it avoids 'starvation' (one task's poll taking too long); however, in practice it is almost always more expensive in memory and time due to the implicit state.

In async, only the values required to do a poll need to be held (often only references), while for threads the entire stack & registers needs to be stored at all times, since at any moment it could be interrupted and it will need to know where to continue from. And since it needs to save/overwrite all registers at each context switch (+ scheduler/kernel handling), it takes more time overall.
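
As a rough illustration of that point (assuming the `futures` crate just for the await point), the compiler-generated future only keeps what lives across .await points, and its size is known statically:

    async fn tiny() -> u8 {
        let x: u8 = 7; // only `x` must survive across the await below
        futures::future::ready(()).await;
        x
    }

    fn main() {
        // On the order of the captured locals plus a discriminant,
        // not a preallocated multi-kilobyte thread stack.
        println!("future size: {} bytes", std::mem::size_of_val(&tiny()));
    }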

In general threads are a good option if you can afford the overhead, but assuming threads as the default can significantly hinder performance (or make it nearly impossible to run at all) in the environments Rust needs to support.


@f_devd, I think you are mistaken.

Not that I want to discourage anyone from using async/await. I am glad async/await solves people problems, especially when people do not have a ready to use alternative as my perfect ideal threads.

But just to reduce the number of people who are mistaken in the Internet :)

I think the only real problem that makes threads really expensive for embedded systems is statically allocated large stack. If stack size is managed dynamically, it can be small thus allowing many threads. The other expenses should be tolerable. Embedded systems don't require high computational throughput, I think.

All implementation approaches used for async/await can be used for threads, and vice versa, because they are basically the same thing.

> In async, only the values required to do a poll need to be held (often only references), while for threads the entire stack & registers needs to be stored at all times, since at any moment it could be interrupted and it will need to know where to continue from.

Well, it seems the opposite - the approach you attribute to threads can be more efficient here. If an async function, when blocked, holds in its Future state record only the part of the local vars and parameters needed to continue execution, the function needs to copy them from the stack. And that's redundant copying and memory allocation for Future state records. Note, this happens at every element of the function call chain, so the Future state records act as stack frames. And this stack copying is most likely done in individual assignments, var by var.

And I am afraid this allocation and copying can happen every time the async function blocks. Reusing Future state records may be non-trivial, given that the next time, the top-level async function we are await'ing for may block in some other internal branch.

Compared to saving the stack which is just saving two registers: stack base and stack pointer.

> And since it needs to save/overwrite all registers at each context switch (+ scheduler/kernel handling), it takes more time overall.

Saving registers is cheap. Also, there is no magic: when the next async function is activated by the async function scheduler, it uses the registers as it wants, so register values of previously blocked async function need to be saved somehow - this happens when the most nested function copies its local vars to the Future state record.

Speaking of preemption requiring the kernel - not necessarily. It can be done in user space. A thread can yield control to the scheduler when it invokes a blocking function (as Java virtual threads currently do). In addition to that, other preemption points can be used - function calls, allocations, maybe loop boundaries. This approach lies in between cooperative threading and full preemption.

If we consider preemption by timer interrupts: first, it only happens if the thread hasn't yet yielded control by calling a blocking function. Second, if preemption by timer happens, the kernel can pass control to the user-space scheduler in the application runtime instead of applying the kernel's heavyweight scheduler (is the kernel scheduler really more heavyweight?).

Moreover, I've just searched for user space interrupts, and it looks like new processors provide such a feature. The first link in search currently is https://lwn.net/Articles/871113/. Green threads scheduling is mentioned as one of the use cases.

So, in short, I don't see why threads would be inherently less performant than async/await.


I think you might be confusing Runtime, OS and bare-metal primitives. Java virtual threads are possible because there is always the runtime which code will return to, and since it's already executing in a VM the concept of Stack/Heap Store/Loads don't really matter for performance.

> Compared to saving the stack which is just saving two registers: stack base and stack pointer.

In embedded you might not have a stack base, just a stack pointer, this means in order to switch to a different stack you need to copy 2 stacks. (I might be wrong here; I know some processors have linear stacks, but this might be more uncommon).

On bare metal this dynamic changes significantly, in order to "switch contexts" with preemption the following steps are needed (omitting the kernel switch ops):

- Receive interrupt

- Mask interrupts

- Store registers to heap

- Store stack to heap

- Calculate next preemption time (scheduler)

- Set interrupt for next preemption time

- Load stack from heap

- Load registers from heap

- Unmask interrupts

- Continue execution using program counter

While for async/await everything already in place on the stack/heap so a context switch is:

- Call Future.poll function

- If Poll::Ready, make parent task new Future and (if it exists) call it

- If Poll::Pending, go to next Future in Waker queue

Async/await (in rust) is without a runtime, and without copies or register stores/loads; it can be implemented on any cpu. On embedded, tasks can also decide how they want to be woken, so if you want to do low-power operation you can make an interrupt which calls `wake(future)` and it will only poll that task after the interrupt has hit, meaning any time the Waker queue is empty it knows it can sleep with interrupts enabled.
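
A toy sketch of that poll loop (assuming the `futures` crate for a no-op waker); a real executor parks or waits for an interrupt instead of spinning, but the "context switch" really is just a call into poll():

    use std::future::Future;
    use std::pin::pin;
    use std::task::{Context, Poll};

    // Busy-polls a single future to completion; no registers or stacks are
    // saved anywhere - the future's own state struct is the whole "context".
    fn busy_block_on<F: Future>(fut: F) -> F::Output {
        let mut fut = pin!(fut);
        let waker = futures::task::noop_waker();
        let mut cx = Context::from_waker(&waker);
        loop {
            match fut.as_mut().poll(&mut cx) {
                Poll::Ready(out) => return out,
                Poll::Pending => std::hint::spin_loop(),
            }
        }
    }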

> so register values of previously blocked async function need to be saved somehow

The difference is that we know exactly which values are needed instead of not knowing what we need from the stack/registers.

User-space interrupts would make it easier to do preemption in user-space but this is yet another feature you can't make assumptions about (especially since there has been only a single gen of processors which support it).


Yes, of course a non-cooperative switch is more expensive than a cooperative one. But the thread model does not require preemption or even time-slice scheduling.

But with async/await cooperative switch is the only option.


I'm unfamiliar with a bare-metal thread model that doesn't do preemption outside of a Runtime. I imagine you'd need to effectively inject code to do a cooperative switch, as there aren't many ways for a CPU to exit its current 'task' outside of an interrupt (preemption) or a defer call (coroutines/async). For Runtimes it usually also means you effectively have a cooperative switch, but it's hidden away in runtime code.

Do you have an example?


@f_devd, I realized that my main objection to async/await does not apply to Rust.

Thank you for staying in the discussion long enough for me to realize that completely.

I dislike async/await in Javascript because async functions can not be called synchronously from normal functions. The calling function and all its callers and all their callers need to be turned async.

In Rust, since we can simply do `executor::block_on(some_async_function())`, my objection goes away - all primitives remain fully composable. Async functions can call usual functions and vice versa.
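
A minimal sketch of that composability (assuming the `futures` crate for block_on):

    fn plain_helper(x: u32) -> u32 {
        x * 2
    }

    async fn async_work(x: u32) -> u32 {
        plain_helper(x) + 1 // async calling a normal function: just a call
    }

    fn main() {
        // a normal function driving an async one to completion
        let result = futures::executor::block_on(async_work(20));
        assert_eq!(result, 41);
    }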

So my first comment was to some extent a "knee-jerk reaction".

As we started to discuss thread preemption cost, I will provide some responses below. In short, I believe it can be on par with async/await.

=================================================

> I think you might be confusing Runtime, OS and bare-metal primitives.

I am not confusing them; I consider all those cases down to what happens at the CPU level.

> Java virtual threads are possible because there is always the runtime which code will return to, and since it's already executing in a VM the concept of Stack/Heap Store/Loads don't really matter for performance.

They remain applicable, as at the lowest level the VM / Runtime is executed by a CPU.

> Async/await (in rust) is without a runtime,

Rust Executor is a kind of runtime, IMHO.

> and without copies or register stores/loads;

The CPU register values are still saved to memory when async function returns Poll::Pending, so that the intermediate computation results are not lost and when polled again the function continues its execution correctly. (On the level of Rust source code, the register saving corresponds to assignment of local variables of the most nested async function to the fields of the generated anonymous future).

==============================================

> In embedded you might not have a stack base, just a stack pointer, this means in order to switch to a different stack you need to copy 2 stacks. (I might be wrong here; I know some processors have linear stacks, but this might be more uncommon).

If the CPU does not have a stack base (stack segment register), saving of the stack pointer is enough to switch to another stack.

In practice, I think, even CPUs with stack segment register, most often only need to save stack pointer for context switch - all stacks of the process can live in the same segment, and even for different processes the OS can arrange the segments to have the equal segment selector. I know that switching to kernel mode usually involves changing stack segment register in addition to the stack pointer (as the kernel stack segment has different protection level).

==============================================

> On bare metal this dynamic changes significantly, in order to "switch contexts" with preemption the following steps are needed (omitting the kernel switch ops): [...] While for async/await everything already in place on the stack/heap so a context switch is: [..]

The operations you listed for bare metal are very cheap; some items in the list are just a single CPU instruction. (Also, I think timer interrupts are configured once for a periodic interval and don't need to be recalculated and set on every context switch.)

If one expands the "go to next Future in Waker queue" item you listed for async/await in the same level of detail that you did for bare metal, the resulting list may be even longer than the bare metal list.

==============================================

The majority of the context switch cost at the CPU level comes when we switch to a different process, so that a new virtual memory mapping table needs to be loaded into the CPU (and correspondingly, the cached mappings in the TLB need to be reset and new ones computed during execution in the new context), and from the need to load different descriptor tables.

Nothing of that applies to in-process green thread context switches.


Java can afford that. M:N threads come with a heavy runtime. Java already has a heavy runtime, so what is a smidgen more flab?

Source: https://github.com/rust-lang/rfcs/blob/master/text/0230-remo...


So it seems that the biggest issue was having a single IO interface forcing overhead on both green and native threads and forcing runtime dispatching.

It seems to me that the best would have been to have the two libraries evolve separately and capture the common subset in a trait (possibly using dynamic impl when type erasure is tolerable), so that you can write generic code that can work with both or specialized code to take advantage of specific features.

As it stands now, sync and async are effectively separated anyway and it is currently impossible to write generic code that handles both.


> But programmers want threads conceptually, semantically. Write sequential logic and don't use strange annotations like "async". In other words, if async/await is so good, why not make all functions in the language implicitly async, and instead of "await" just use normal function calls? Then you will suddenly be programming in threads.

Some programmers do, but many want exactly the opposite as well. Most of the time I don't care if it's an OS blocking syscall or a non-blocking one, but I do care about understanding the control flow of the program I'm reading and see where there's waiting time and how to make them run concurrently.

In fact, I'd kill to have a blocking/block keyword pair whenever I'm working with blocking functions, because they can surreptitiously slow down everything without you paying attention (I can't count how many pieces of software I've seen with blocking syscalls in the UI thread, leading to frustratingly slow apps!).


This is a really common comment to see on HN threads about async/await vs fibers/virtual threads.

What you're asking for is performance to be represented statically in the type system. "Blocking" is not a useful concept for this. As avodonosov is pointing out, nothing stops a syscall being incredibly fast and for a regular function that doesn't talk to the kernel at all being incredibly slow. The former won't matter for UI responsiveness, the latter will.

This isn't a theoretical concern. Historically a slow class of functions involved reading/writing to the file system, but in some cases now you'll find that doing so is basically free and you'll struggle to keep the storage device saturated without a lot of work on multi-threading. Fast NVMe SSDs like found in enterprise storage products or MacBooks are a good example of this.

There are no languages that reify performance in the type system, partly because it would mean that optimizing a function might break the callers, which doesn't make sense, and partly because the performance of a function can vary wildly depending on the parameters it's given.

Async/await is therefore basically a giant hack and confuses people about performance. It also makes maintenance difficult. If you start with a function that isn't async and suddenly realize that you need to read something from disk during its execution, even if it only happens sometimes or when the user passes in a specific option, you are forced by the runtime to mark the function async which is then viral throughout the codebase, making it a breaking change - and for what? The performance impact of reading the file could easily be lost in the noise on modern hardware and operating systems.

The right way to handle this is the Java approach (by pron, who is posting in this thread). You give the developer threads and make it cheap to have lots of them. Now break down tasks into these cheap threads and let the runtime/OS figure out if it's profitable to release the thread stack or not. They're the best placed to do it because it's a totally dynamic decision that can vary on a case-by-case basis.


> It also makes maintenance difficult. If you start with a function that isn't async and suddenly realize that you need to read something from disk during its execution, even if it only happens sometimes or when the user passes in a specific option, you are forced by the runtime to mark the function async which is then viral throughout the codebase, making it a breaking change - and for what? The performance impact of reading the file could easily be lost in the noise on modern hardware and operating systems.

You'll typically have an idea of whether or not a function performs IO from the start. Changing that after the fact violates the users' conceptual model and expectation of it, even if all existing code happens to keep working.


If you want to go full Haskell on the problem for purity-related reasons, by all means be my guest. I strongly approve.

However, unless you're in such a language, warping my entire architecture around that objection does not provide a good cost-benefit tradeoff. I've got a lot of fish to fry and a lot of them are bigger than this in practice. Heck, there's still plenty of programmers who will consider it an unambiguous feature that they can add IO to anything they want or need to, and consider it a huge negative when they can't, and a lot of programmers who don't practice IO isolation and don't even conceive of "this function is guaranteed to not do any IO/be impure" as a property a function can have.


In any system that uses mmap or swap it's a meaningless distinction anyway (which is obviously nearly all of them outside of embedded RTOS). Accessing even something like the stack can trigger implicit/automatic IO of arbitrary complexity, so the concept of a function that doesn't do IO is meaningless to begin with. Async/await isn't justified by any kind of interesting type theory, it exists to work around limitations in runtimes and language designs.


This argument is dull; nothing in programming can do anything perfectly: is catching exceptions useless because the program can be killed by the OS? Is static typing pointless because cosmic rays can make your data ill-formed anyway?

All abstractions are leaky, and the OS and hardware's behaviors are always going to surface in ways that you cannot model in your programming language, no matter how low-level you want to go (asm itself is a poor abstraction on top of how the CPU actually works), but that doesn't make them useless.

Async/await is a way to communicate intent between developers on the same project, and between a dependency and its dependent. Exactly like static types, error as return values and non-nullable types. And like all of them while it doesn't prevent all bugs, it definitely helps.

The fact that it makes straightforward to implement the best possible performance is just the cherry on top.


> You'll typically have an idea of whether or not a function performs IO from the start.

I think GP's point is: why does that matter? Much writing on Async/Await roughly correlates IO with "slow". GP rightly points out that "slow" is imprecise, changes, means different things to different people and/or use cases.

I completely get the intuition: "there's lag in the [UI|server|...], what's slowing it down?". But the reality is that trying to formalise "slow" in the type system is nigh on impossible - because "slow" for one use case is perfectly acceptable for another.


While absolute slowness depends on lots of factors, the relative slowness of things doesn't so much. Whatever the app or the device, accessing a register is always going to be faster than random places in RAM, which is always going to be faster than fetching something on disk and even moreso if we talk about fetching stuff over the network. No matter how hardware progresses, latency hierarchy is doomed to stay.

That doesn't mean it's the only factor of slowness, and that async/await solves all issues, but it's a tool that helps, a lot, to fight against very common sources of performance bugs (like how the borrow checker is useful when it protects against the nastiest class of memory vulnerabilities, even if it cannot solve all security issues).

Because the situation where “my program is stupidly waiting for some IO even though I don't even need the result right now and I could do something in the meantime” is something that happens a lot.


> Whatever the app or the device, accessing a register is always going to be faster than random places in RAM, which is always going to be faster than fetching something on disk and even moreso if we talk about fetching stuff over the network.

The network is special: the time it takes to fetch something over the network can be arbitrarily large, or even infinite (this can also apply to disk when running over networked filesystems), while for registers/RAM/disk (as long as it's a local disk which is not failing) the time it takes is bounded. That's the reason why async/await is so popular when dealing with the network.


PCIe is a network. USB is a network. There is no such thing as a resource with a guaranteed response time.


Even if you ignore performance completely, IO is unreliable. IO is unpredictable. IO should be scrutinized.


When using SSD or eNVM, why would local IO be more unreliable/unpredictable than local RAM?


Don't assume your process is the only process. You have to share resources like storage and RAM with everything else on the system. Just a single, simple Java app can gobble up all available RAM if you don't tell it otherwise.


Exactly. Which means other processes running on the same server can cause latency on disk access but also on RAM, which was the point I was trying to make :)


There are plenty of language ecosystems where there's no particular expectations up front about whether or when a library will do IO. Consider any library that introduces some sort of config file or registry keys, or an OS where a function that was once purely in-process is extracted to a sandboxed extra process.


> There are plenty of language ecosystems where there's no particular expectations up front about whether or when a library will do IO.

There are languages that don't enforce the expectation on a type level, but that doesn't mean that people don't have expectations.

> Consider any library that introduces some sort of config file or registry keys

Yeah, please don't do this behind my back. Load during init, and ask for permission first (by making me call something like Config::load() if I want to respect it).

> or an OS where a function that was once purely in-process is extracted to a sandboxed extra process.

Slightly more reasonable, but this still introduces a lot of considerations that the application developer needs to be aware of (how should the library find its helper binary? what if the sandboxing mechanism fails or isn't available?).


For the sandbox example I was thinking of desktop operating systems where things like file IO can become brokered without apps being aware of it. So the API doesn't change, but the implementation introduces IPC where previously there wasn't any. In practice it works fine.


> There are no languages that reify performance in the type system,

Async/await is a way of partially doing... just that, but without having to indicate what is "blocking", and if an async function blocks, well, you'll be unhappy, so don't do that. For a great deal of things this is plenty good enough, but from a computer science perspective it's deeply unsatisfying because one would want the type system to prevent making such mistakes.

At least with async/await an executor could start more threads when an async thread makes a known-blocking call, thus putting a band-aid on the problem.
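
A sketch of that band-aid as it exists today (assuming tokio; the file path is just an example): known-blocking work is shunted onto a dedicated blocking pool so the async worker threads stay free.

    // Hypothetical helper: read a file with the blocking std API from async code.
    async fn load_hosts() -> std::io::Result<String> {
        tokio::task::spawn_blocking(|| std::fs::read_to_string("/etc/hosts"))
            .await
            .expect("blocking task panicked")
    }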

Perhaps the compiler could reason about the complexity of code (e.g., recursion and nested loops w/ large or unclear bounds -> accidentally quadratic -> slow -> "blocking") and decide if a pure function is "blocking" by dint of being slow. File I/O libraries could check if the underlying devices are local and fast vs. remote and slow, and then file I/O could always be async but complete before returning when the I/O is thought to be fast. This all feels likely to cause more problems than it solves.

If green threads turn out not to be good enough then it's easier to accept the async/await compromise and reason correctly about blocking vs. not.


A function being marked async tells you nothing in particular about the performance of that operation. It could be anything from seconds to milliseconds, and routinely is. E.g., elliptic curve cryptography operations that are plenty fast enough to execute on a UI thread animating at 60fps are nonetheless marked async on the web, whilst attaching a giant document fragment to the live DOM - which might trigger very intensive rerendering calculations - isn't.

Nothing stops you from starting more threads when you run low in a scenario when there are only threads also. That's how the JVM ForkJoinPool works. If your threads end up all blocked, more are started automatically.


You can't encode everything about performance in the type system, but that doesn't mean you cannot do it at all: having a type system that allows you to control memory layout and allocation is what makes C++ and Rust faster than most languages. And regarding what you say about storage access: storage bandwidth is now high, but latency when accessing an SSD is still much higher than accessing RAM, and network is even worse. And it will always be the case no matter what progress hardware makes, because of the speed of light.

Saying that async/await doesn't help with all performance issues is like saying Rust doesn't prevent all bugs: the statement is technically correct, but that doesn't make it interesting.

> Async/await is therefore basically a giant hack and confuses people about performance. It also makes maintenance difficult.

Many developers have embraced the async/await model with delight, because it instead makes maintenance easier by making the intent of the code more explicit.

It's been trendy on HN to bash async/await, but you are missing the most crucial point about software engineering: code is written for humans and is read much more than written. Async/await may be slightly more tedious to write (it's highly context dependent though; when you have concurrent tasks to execute or need cancellation, it becomes much easier with futures).

> The right way to handle this is the Java approach (by pron, who is posting in this thread)

No it's not, and Mr Pressler has repeatedly shown that he misses the social and communication aspects, so it's not entirely surprising.


> It's been trendy on HN to bash async/await, but you are missing the most crucial point about software engineering: code is written for humans and is read much more than written.

Readability can be in the eye of the beholder. Some of us find it easier to read concurrent code using goroutines and channels in Go, or processes and messages in Erlang, than async/await in Rust.


I don't believe you. And you know why? Because what the async syntax permits is just a superset of the capabilities of goroutines: any goroutine-based code can be rewritten into async/await without changing anything of the structure; in practice all it would change is that it would add .await at yield points. Yes, async Rust code in fact resembles a lot the code you're used to, with “tasks” (which are functionally goroutines) and channels (just have a look at tokio's documentation[1] to see what it looks like).
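
For a concrete sense of how close the shapes are, a small sketch (assuming tokio) of the tasks-and-channels style; structurally it is the same program you would write with goroutines, with .await added at the yield points.

    use tokio::sync::mpsc;

    #[tokio::main]
    async fn main() {
        let (tx, mut rx) = mpsc::channel(16);

        for id in 0..3 {
            let tx = tx.clone();
            // tokio::spawn plays the role of `go func() { ... }()`
            tokio::spawn(async move {
                tx.send(format!("hello from task {id}")).await.unwrap();
            });
        }
        drop(tx); // close the channel so rx.recv() eventually returns None

        while let Some(msg) = rx.recv().await {
            println!("{msg}");
        }
    }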

But for some situations, in addition to the goroutine style, async/await allows you to have a much more straightforward implementation (cancellation, timeout, error handling, etc.). Even then, there's nothing stopping you from using channels and select for these use cases when using Rust async/await. (In practice people don't, because nobody sane would use channels and select to implement a timeout instead of tokio::timeout, but you can.)

Saying that goroutines and channels are easier to read than async/await Rust is like saying your car can drive slower than mine, and it just reveals that you haven't actually ever read async Rust code and you're just imagining how bad it must be because of some prejudice of yours.

[1]: https://tokio.rs/tokio/tutorial/channels


You're writing that "code is written for humans" (very true) and that async/await is more readable, and when you receive feedback from other humans they find async/await slightly less readable than fibers/green threads, you dismiss the feedback as unfaithful. That's quite often the problem with arguments about readability: they devolve into personal opinions. Perhaps you're right, perhaps if I'd spend much more time reading async/await code, perhaps I'll eventually find it as readable as fibers code. But that's not the case today. Others in this discussion have shared a similar feedback. To be clear, that readability "issue" is not a big deal for me, just something that I perceive as a small obstacle.


> You're writing that "code is written for humans" (very true) and that async/await is more readable, and when you receive feedback from other humans they find async/await slightly less readable than fibers/green threads, you dismiss the feedback as unfaithful.

Yes, because it is. Async await can express exactly the same thing as goroutines without any alteration in structures, so by definition it cannot be “less readable” because the code is exactly the same. But, at the same time, for specific topics where goroutines code is objectively[1] not optimal, async/await offers additional tools.

Literally anything that is straightforward with goroutines is going to be straightforward with async/await, because you are just going to write the exact same code (really, go ahead and paste any Go code here and you'll see that the async Rust translation is going to be identical to a thread-based translation, with only a small amount of “async” and “await” keywords added here and there).

> That's quite often the problem with arguments about readability: they devolve into personal opinions.

The problem here is that you're having an opinion on something you don't know about, and you're talking purely out of prejudice. And of course it's a big “personal opinion” problem.

> Perhaps you're right, perhaps if I'd spend much more time reading async/await code, perhaps I'll eventually find it as readable as fibers code. But that's not the case today.

That's not my point at all! My point is that FOR PRETTY MUCH EVERYTHING IT'S GOING TO BE EXACTLY THE SAME CODE, just with materialized yield points! And only for some stuff that is hard and tedious to write with goroutines (think cancellation, timeouts), you'd be able to use a different syntax with async/await (that is, it's going to be a function call instead of having to use channels + select).

And that's why async/await can be “easier to read” while not being “harder to read”, simply because it has the expressing power to express the exact same things, and the power to express a few other things in a better way.

[1]: (yes I say “objectively”, because writing 20 lines of concurrent code with channels and select for something that can be a simple function call is arguably inferior).


> Async await can express exactly the same thing as goroutines without any alteration in structures, so by definition it cannot be “less readable” because the code is exactly the same.

I agree that in many cases the "structure" is exactly the same, but the code is not, otherwise we wouldn't have to add `async` and `await` and adjust types for async.

> with only a small amount of “async” and “await” keywords added here and there

It's not just about the async and await keywords. It's also about adjusting the types, thinking differently about parallelism (not concurrency), and a few other things.

> My point is that FOR PRETTY MUCH EVERYTHING IT'S GOING TO BE EXACTLY THE SAME CODE, just with materialized yield points!

Then that's not the same code. This is syntactically and semantically different. And that's exactly what makes async/await interesting and useful. The screaming case wasn't necessary.

> it has the expressing power

Increased expressive power doesn't always help with increased readability. To continue that discussion in a constructive way, we would need to agree on a definition of readability and how to assess it objectively, because I'm not sure we're talking about the same thing. But I lost interest in doing so considering the general tone of the comments.


> It's not just about the async and await keywords. It's also about adjusting the types, thinking differently about parallelism (not concurrency), and a few other things.

No, you're overthinking it. The way concurrency works with async/await is exactly the same as the way it works with green threads[1]. It's a bit different from what happens with OS threads, but at the same time the difference between OS threads and green threads doesn't seem particularly impactful for you, since you are happy to ignore it when talking about goroutines.

> Then that's not the same code. This is syntactically and semantically different. And that's exactly what makes async/await interesting and useful.

It's syntactically a bit different, but not semantically, at least not in a meaningful way: Rust async vs JS async is more semantically different than old Go[2] vs Rust, and old Go was more different from today's Go than it was from Rust. The syntax difference is what makes it the most different, because it allows for terser constructs in a few cases (this is where the readability benefits kick in, but it's limited in scope). But that's basically the same difference as Rust error handling vs Go's (it works functionally the same, but Rust's approach allows for the terser `?` syntax that was added a few years after 1.0).

> The screaming case wasn't necessary.

Sorry about that, it wasn't about screaming and more about working around the lack of bold emphasis in HN comment formatting but I see how it can be misinterpreted as aggressive.

> Increased expressive power doesn't always help with increased readability.

If language A has more expressing power than language B, it means that the same developer can express things in language A the same way they would in language B. That is, their code would be no less readable when written in language A than if it was written in language B. Of course that says nothing about cultural differences between subgroups of developers and how some people in one language could write code that is less readable than other people in another language, but this has nothing to do with the language itself. (Bad developers write unreadable code in any language, and Go is a good testament that limiting the expressing power of a language isn't a good way to improve readability across the board, as Go's leadership eventually recognized).

> To continue that discussion in a constructive way, we would need to agree on a definition of readability and how to assess it objectively

That's not something we can easily define formally, but we could work from code samples, like I suggested above.

[1] at least for cooperatively scheduled green threads, which is what Go did for almost a decade. If you go back to the time when Go had segmented stacks, then it's literally the exact same semantics in every way, including memory layout.

[2] by “old Go” I mean Go before they introduced a preemptive scheduler, which happened a few years ago.


But all functions are blocking.

   fn foo() { bar(1, 2); }
   fn bar(a: i32, b: i32) -> i32 { a + b }
Here bar is a blocking function.


The difference is in quantities. bar blocks for nanoseconds; the blocking the GP talks about affects the end user, which means it's in seconds.


No they aren't, and that's exactly my point.

Most functions aren't doing any syscall at all, and as such they aren't either blocking or non-blocking.

Now because of path dependency and because we've been using blocking functions like regular functions, we're accustomed to think that blocking is “normal”, but that's actually a source of bugs as I mentioned before. In reality, async functions are more “normal” than regular functions: they don't do anything fancy, they just return a value when you call them, and what they return is a future/promise. In fact you don't even need to use any async annotation for a function to be async in Rust; this is an async function:

    use std::future::Future;

    fn toto() -> impl Future<Output = String> {
        unimplemented!();
    }

The async keyword exists simply so that the compiler knows it has to desugar the awaits inside the function into a state machine. But since Rust has async blocks, it doesn't even need async on functions at all: the information you need comes from the return type, which is a future.
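
For a concrete picture (a rough sketch with made-up names, using only std), the `async fn` form and a plain fn returning `impl Future` built from an async block end up equivalent:

    use std::future::Future;

    // `async fn` form: the compiler builds the returned state machine.
    async fn greet() -> String {
        String::from("hello")
    }

    // "Desugared" form: a regular fn whose return type carries all the
    // information; the async block supplies the state machine.
    fn greet_desugared() -> impl Future<Output = String> {
        async { String::from("hello") }
    }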

Blocking functions, on the contrary, are utterly bizarre. In fact, you cannot make one yourself, you must either call another blocking function[1] or do a system call on your own using inline assembly. Blocking functions are the anomaly, but many people miss that because they've lived with them long enough to accept them as normal.

[1] because blockingness is contagious, unlike asynchronousness, which must be propagated manually. Yes, ironically, people criticizing async/await get this one backwards too.


"makes certain syscalls" is a highly unconventional definition of "blocking" that excludes functions that spin wait until they can pop a message from a queue.

If your upcoming systems language uses a capabilities system to prevent the user from inadvertently doing things that may block for a long time, like calling open(2) or accessing any memory that is not statically proven to not cause a page fault, I look forward to using it. I hope that these capabilities are designed so that the resulting code is more composable than Rust code. For example, it would be nice to be able to use the Reader trait with implementations that source their bytes in various different ways, which is exactly what you cannot do in Rust.


Blocking syscalls are a well defined and well scoped class of problems, sure there are other situations where the flow stops and a keyword can't save you from everything.

Your reasoning is exactly like that of the folks who say “Rust doesn't solve all bugs” because it “just” solves the memory safety ones.


I may be more serious than you think. Having worked on applications in which blocking for multiple seconds on a "non-blocking syscall" or page fault is not okay, I think it would really be nice to be able to statically ensure that doesn't happen.


I'm not disputing that. In the general case I suspect this is going to be undecidable, and you'd need careful design to carve out a subset of the problem that is statically addressable (akin to what Rust did for memory safety, by restricting the expressiveness of the safe subset of the language).

For blocking syscalls alone there's not that much PL research to do though, and we could get the improvement practically for free; that's why I consider them to be different problems (also because I suspect they are much more prevalent, given how often I've encountered them, but that could be a bias on my side).


Any function can block if memory it accesses is swapped out.


bar blocks waiting for the CPU to add the numbers.


Nope it doesn't: in the final binary the bar function doesn't even exist anymore, as the optimizer inlined it, and CPUs have been using pipelining and speculative execution for decades now; they don't block on a single instruction. That's the problem with abstractions designed in the 70s: they don't map well to the actual hardware we have 50 years later…


I don't know what to tell you, but that is how sequential code works. Sure you can find some instruction level parallelism in the code and your optimizer may be able to do it across function boundaries, but that is mostly a happy accident. Meanwhile HDLs are the exact opposite. Parallel by default and you have to build sequential execution yourself. What is needed for both HLS and parallel programming is a parallel by default hybrid language that makes it easy to write both sequential and parallel code.


Except, unless you're using atomics or volatiles, you have no guarantees that the code you're writing sequentially is going to be executed that way…


Sure, unless it is the first time you are executing that line of code and you have to wait for the OS to slowly fault it in across a networked filesystem.


Make `a + b` `A * B` then, multiplication of two potentially huge matrices. Same argument still holds, but now it's blocking (still just performing addition, only an enormous number of times).


It's not blocking, it's doing actual work.

Blocking is the way the old programming paradigm deals with asynchronous actions, and it works by behaving the same way as when the computer is actually computing something, which is where the confusion comes from. But the two situations are conceptually very different: in one case we are idle (but don't see it), in the other we're busy doing actual work. Maybe in case 2 we could optimize the algorithm so that we spend less time, but that's not certain, whereas in case 1 there's something obvious to do to speed things up: do something else at the same time instead of waiting mindlessly. Having a function marked async gives you a pointer that you can actually run it concurrently with something else and expect a speedup, whereas with blocking syscalls there's no indication in the code that those two functions you're calling next to each other, with no data dependency between them, would gain a lot from being run concurrently by spawning two threads.
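
To make that concrete (a small sketch assuming Tokio, with hypothetical fetch_a/fetch_b functions): because the functions are async, the code itself advertises that they can be overlapped, and doing so is a one-line change at the call site:

    use std::time::Duration;

    async fn fetch_a() -> u32 { tokio::time::sleep(Duration::from_millis(100)).await; 1 }
    async fn fetch_b() -> u32 { tokio::time::sleep(Duration::from_millis(100)).await; 2 }

    #[tokio::main]
    async fn main() {
        // Sequential: total wait is roughly 200 ms.
        let (a, b) = (fetch_a().await, fetch_b().await);
        // Concurrent: roughly 100 ms, since there is no data dependency
        // between the two futures.
        let (c, d) = tokio::join!(fetch_a(), fetch_b());
        println!("{a} {b} {c} {d}");
    }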

BTW, if you want something that's more akin to blocking, but at a lower level, it's when the CPU has to load data from RAM: it's really blocked, doing nothing useful. Unfortunately that's not something you can make explicit in high-level languages (or at least, the design space hasn't been explored), so when these kinds of behavior matter to you, that's when you dive into assembly.


A "non-blocking function" always meant "this function will return before its work is done, and will finish that work in the background through threads/other processes/etc". All other functions are blocking by default, including that simple addition "bar" function above.


Your definition is at odds with (for instance) JavaScript's implementation of a non-blocking function though, as it will perform computation until the first await point before returning (unlike Rust futures, which are lazy, that is: they do no work before they are awaited on).

As I said before, most of what you call a “blocking function” is actually a “no opinion function”, but since in the idiosyncrasy of most programming languages blocking functions are called like “no opinion” ones, you are mixing them up. But it's not a fundamental rule. You could imagine a language where blocking functions (ones containing an underlying blocking syscall) are called with a block keyword and where regular functions are just called like functions. There's no relation between regular functions and blocking functions except the path dependency that led to this particular idiosyncrasy we live in; it is entirely contingent.


> Your definition is at odds with (for instance) JavaScript's implementation of a non-blocking function though, as it will perform computation until the first await point before returning

Yes, that's syntactic sugar for returning a promise. This pattern is something we've long called a non-blocking function in Javascript. The first part that's not in the promise is for setting it up.


If you define a non-blocking function to be what you decide is non-blocking, that's a bit of cheating don't you think? ;)

How about this function:

    async fn toto(input: u8) -> bool {
        if input % 2 == 0 {
            true
        } else {
            false
        }
    }
Is it a non-blocking one or not according to your criteria?


We were just talking about Javascript and now you're back to Rust? That same function with that same keyword acts differently in the two languages.

My best guess is you're defining it (mostly?) by the syntax while I'm defining it by how the function acts. By what I'm talking about, that's a non-blocking function in Rust, but written the exact same way in Javascript it's a blocking function.


> We were just talking about Javascript and now you're back to Rust? That same function with that same keyword acts differently in the two languages.

Yes, that's the point, and in both cases they are called non-blocking functions, despite behaving differently.

> My best guess is you're defining it (mostly?) by the syntax while I'm defining it by how the function acts

I'm defining them the way they are commonly referred to, whereas you're using an arbitrary criterion that is not even consistent, as you'll see below.

> By what I'm talking about, that's a non-blocking function in Rust, but written the exact same way in JavaScript it's a blocking function.

Gotcha!

If we get back to your definition above

> A "non-blocking function" always meant "this function will return before its work is done, and will finish that work in the background through threads/other processes/etc".

Then it should be a “blocking function” because what this function returns is basically a private enum with two variants and a poll method to unwrap them. There's nothing running in the background ever.

In fact, in Rust an async function never runs anything in the background: all it does is return a Future, which is a passive state machine. There's no magic in there, and it's not very different from a closure or an Option actually (in fact, in this example, it's practically the same as an Option). Then you can send this state machine to an executor that will do the actual work, polling it to completion. But this process doesn't necessarily happen in the background (you can even block_on the future to make the execution synchronous).

So in reality there are two kinds of functions: the ones that return immediately (async and regular functions) and the ones that block the execution and will return later on (the “blocking” functions). And among the functions that do not block, there are also two flavors: the ones that return results that are immediately ready, and the ones that return results that won't be ready for some time.
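
As a tiny illustration of that laziness (a sketch assuming the futures crate just for block_on):

    async fn toto(input: u8) -> bool {
        input % 2 == 0
    }

    fn main() {
        // Nothing has run yet: we only hold a passive state machine.
        let fut = toto(4);
        // Drive it to completion synchronously; no background work anywhere.
        let result = futures::executor::block_on(fut);
        assert!(result);
    }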


..you're referring to features of the future as if they're instead features of the function?


I don't think I understand what you're asking.


Blocking on the future isn't a feature of the function that you asked about, and in fact as a comparison to javascript if you expand the example to include an `await` on the returned promise (and include something that's actually async) it also blocks to wait for a result in the same thread. Those are features of the future/promise, not features of the function.


I'm still unsure of what you mean. Of course blocking on the future is a property of the function! The future/promise can exist everywhere, including in non-async functions. The key difference with async functions is that you can block the execution flow on await, and only they can do that. Futures/promises cannot by themselves.

But calling await on a future doesn't necessarily mean it will in fact block the flow of execution; it can also return the value instantly (which means it doesn't fit your previous definition of a non-blocking function), but it's still a special function that needs to be handled like any other non-blocking function (you can't just call it to get the result, you need to “unwrap” the future by calling await on it).

What I pointed out is that your definition of the split between non-blocking and blocking functions is inconsistent, and the reason is that you fundamentally get the non-blocking/blocking difference backwards. And that's not your fault actually: the syntax used in pretty much every language leads to this confusion if you don't take a step back and think about it.


> In other words, if async/await is so good, why not make all functions in the language implicitly async, and instead of "await" just write normal function calls? Then you will suddenly be programming in threads.

It has been tried various times in the last decades. You want to search for "RPC". All attempts at trying to unify sync and async have failed, because there is a big semantic difference between running code within a thread, between threads, or even between computers. Trying to abstract over that will eventually be insufficient. So better learn how to do it properly from the beginning.


I think you've got some of this in your own reply, but ... I feel like Erlang has gone all in on "if async is good, why not make everything async". "Everything" in Erlang is built on top of async message passing, or the appearance thereof. Erlang hasn't taken over the world, but I think it's still successful; chat services descended from ejabberd have taken over the world; RabbitMQ seems pretty popular, too. OTOH, the system as a whole only works because Erlang can be effectively preemptive in green threads because of the nature of the language. Another thing to note is that you can build the feeling of synchronous calling by sending a request and immediately waiting for a response, but it's very hard to go the other way. If you build your RPC system on the basis of synchronous calls, it's going to be painful --- sometimes you want to start many calls and then wait for the responses together, and that gets real messy if you have to spawn threads/tasks every time.


I'm not very familiar with Erlang, but from my understanding, Erlang actually does have this very distinction - you either run local code or you interact with other actors. And here the big distinction gets quite clear: once you shoot a message out, you don't know what will happen afterwards. Both you or the other actor might crash and/or send other messages etc.

So Erlang does not try to hide it, instead, it asks the developer to embrace it and it's one of its strength.

That being said, I think that actors are a great way to model a system from the birds-perspective, but it's not so great to handle concurrency within a single actor. I wish Erlang would improve here.


Actors are a building block of concurrency. IMHO, it doesn't make sense to have concurrency within an actor, other than maybe instruction-level concurrency. But that's very out of scope for Erlang. BEAM code does compile (JIT) to native code on amd64 and arm64, but the JIT is optimized for speed, since it happens at code load time; it's not a profiling/optimizing JIT like Java's HotSpot. There's no register scheduler like you'd need to achieve concurrency, all the beam ops end up using the same registers (more or less), although your processor may be able to do magic with register renaming and out-of-order operations in general.

If you want instruction level concurrency, you should probably be looking into writing your compute heavy code sections as Native Implemented Functions (NIFs). Let Erlang wrangle your data across the wire, and then manipulate it as you need in C or Rust or assembly.


> IMHO, it doesn't make sense to have concurrency within an actor, other than maybe instruction level concurrency

I think it makes sense to have that, including managing the communication with other actors. Things like "I'll send the message, and if I don't hear back within x minutes, I'll send this other message".

Actors are very powerful and a great tool to have at your disposal, but often they are too powerful for the job and then it can be better to fall back to a more "low level" or "local" type of concurrency management.

At least that's how I feel. In my opinion you need both, and while you can get the job done with just one of them (or even none), it's far from being optimal.

Also, what you mention about NIFs is good for a very specific usecase (high performance / parallelism) but concurrency has a broader scope.


> Things like "I'll send the message, and if I don't hear back within x minutes, I'll send this other message".

I assume you don't want to wait with an x-minute timeout (and meanwhile not do anything). You can manage this in three ways really:

a) you could spawn an actor to send the message and wait for a response and then take the fallback action.

b) you could keep a list (or other structure, whatever) of outstanding messages and timeouts, and prune the list if you get a response, or otherwise periodically check if there's a timeout to process.

c) set a timer and do the thing when you get the timer expiration message, or cancel the timer if you get a response. (which is conceptually done by sending a message to the timer server actor, which will send you a timer handle immediately and a timer expired message later; there is a timer server you can use through the timer module, but erlang:send_after/[2,3] or erlang:start_timer/[3,4] are more efficient, because the runtime provides a lot of native timer functionality as needed for timeouts and what not anyway)

Setting up something to 'automatically' do something later raises the question of how the Actor's state is managed concurrently, and the thing that makes Actors simple is being able to answer that the Actor always does exactly one thing at a time, and that the Actor cannot be interrupted, although it can be killed in an orderly fashion at any time, at least in theory. Sometimes the requirement for an orderly death means an operation in progress must finish before the process can be killed.


Exactly. Now, a) is unnecessarily powerful. I don't want to manage my own list as in b), but other than that b) sounds fine, and c) is also fine, though does it need an actor in the background? No.

In other words, having a well built concept for these cases is important. At least that's my take. You might say "I'll just use actors and be fine", but for me it's not sufficient.


Oh, and just to add onto it: I think async/await is not really the best solution to tackle these semantic differences. I prefer the green-thread-IO approach, which feels a bit heavier but leads to a true understanding of how to combine and control logic in a concurrent/parallel setting. Async/await is great to add to languages that already have something like promises and want to improve syntax in an easy way, so it has its place, but I think it was not the best choice for Rust.


> In other words, if async/await is so good, why not make all functions in the language implicitly async, and instead of "await" just write normal function calls?

IIRC withoutboats said in one of the posts that the true answer is compatibility with C.


There's also writing your code with poll() and select(), which is its own thing.


well that's the great thing with async rust: you write with poll and select without writing poll and select. let the computer and the compiler get this detail out of my way (seriously I don't want to do the fd interest list myself).

and I can still write conceptually similar select code using the, well, select! macro provided by most async runtimes to do the same on a select list of futures. better separation, easier to read, and overall it boils down to the same thing.
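
For instance (a minimal sketch assuming Tokio; the branches and timings are made up), selecting over two futures looks like this, with the runtime doing the readiness bookkeeping underneath:

    use std::time::Duration;
    use tokio::time::sleep;

    #[tokio::main]
    async fn main() {
        // Whichever future finishes first "wins"; the other is simply dropped.
        tokio::select! {
            _ = sleep(Duration::from_millis(10)) => println!("fast branch"),
            _ = sleep(Duration::from_secs(10)) => println!("slow branch"),
        }
    }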


>In other words, if async/await is so good, why not make all functions in the language implicitly async, and instead of "await" just write normal function calls? Then you will suddenly be programming in threads.

Why not go one step further and invent "Parallel Rust"? And by parallel I mean it. Just a nice little keyword "parallel {}" where every statement inside the parallel block is executed in parallel, the same way it is done in HDLs. Rust's borrow checker should be able to ensure parallel code is safe. Of course one problem with this strategy is that we don't exactly have processors that are designed to spawn and process micro-threads. You would need to go back all the way to Sun's SPARC architecture for that and then extend it with the concept of a tree based stack so that multiple threads can share the same stack.


> Just a nice little keyword "parallel {}" where every statement inside the parallel block is executed in parallel, the same way it is done in HDLs. Rust's borrow checker should be able to ensure parallel code is safe.

The rayon crate lets you do something quite similar.
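
For example (a small sketch assuming the rayon crate), a data-parallel iterator and rayon::join get close to that "everything in this block runs in parallel" feel, with the borrow checker keeping it safe:

    use rayon::prelude::*;

    fn main() {
        let nums: Vec<u64> = (1..=1_000_000).collect();

        // The iterator is split across rayon's thread pool automatically.
        let sum_of_squares: u64 = nums.par_iter().map(|&n| n * n).sum();

        // `join` runs two closures, potentially in parallel, like a tiny
        // "parallel {}" block with exactly two statements.
        let (max, min) = rayon::join(|| nums.iter().max(), || nums.iter().min());

        println!("{sum_of_squares} {max:?} {min:?}");
    }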


I believe the answer is "that implies a runtime", and Rust as a whole is not willing to pull that up into a language requirement.

This is in contrast to Haskell, Go, dynamic scripting languages, and, frankly, nearly every other language on the market. Almost everything has a runtime nowadays, and while each individually may be fine they don't always play well together. It is important that as C rides into the sunset (optimistic and aspirational, sure, but I hope and believe also true) and C++ becomes an ever more complex choice to make for various reasons that we have a high-power very systems-oriented programming language that will make that choice, because someone needs to.


> It is important that [...] we have a high-power very systems-oriented programming language that will make that choice, because someone needs to.

This is exactly why I'm looking into Zig more.


That would be a good step forward, I support it :)

BTW, do we need the `parallel` keyword, or better to simply let all code be parallel by default?


Haskell has entered the chat…

However, almost all of the most popular programming languages are imperative. I assume most programmers prefer to think of their programs as a series of steps which execute in sequence.

Mind you, arguably excel is the most popular programming language in use today, and it has exactly this execution model.


You do not need to spawn threads/tasks eagerly. You can do it lazily on work-stealing. See cilk++.


Doesn't rayon have a syntax like that?


> OS threads are expensive due to statically allocated stack, and we don't want that. We want cheap threads, that can be run in millions on a single CPU. But without the clumsy "async/await" words.

Green threads ("cheap threads") are still expensive if you end up spreading a lot of per-client state on the stack. That's because with async/await and CPS you end up compressing the per-client state into a per-client data structure, and you end up having very few function call activation frames on the stack, all of which unwind before blocking in the executor.


> if async/await is so good, why not make all functions in the language implicitly async, and instead of "await" just write normal function calls?

You mean like Haskell?

The answer is that you need an incredibly good compiler to make this behave adequately, and even then, every once in a while you'll get the wrong behavior and need to rewrite your code in a weird way.


It's interesting to see an almost marketing-like campaign to save face for async/await. It is very clear from my experience that it was not only a technical mistake, it also cost the community dearly. Instead of focusing on language features that are actually useful, the Rust effort has been sidetracked by this mess. I'm still very hopeful for the language though, and it is the best thing we've got at the moment. I'm just worried that this whole fight will drag on forever. P.S the AsyncWrite/AsyncRead example looks reasonable, but in fact you can do the same thing with threads/fds as long as you restrict yourself to *nix.


I've used async in firmware before. It was a lifesaver. The generalizations you make are unfounded and are clearly biased toward a certain workload.


I personally agree that it is great that Rust as a language is able to function in an embedded environment. Someone needs to grasp that nettle. I started by writing "concede" in my first sentence there but it's not a concession. It's great, I celebrate it, and Rust is a fairly good choice for at least programmers working in that space. (Whether it's good for electrical engineers is another question, but that's a debate for another day.)

However, the entire Rust library ecosystem shouldn't bend itself around what is ultimately still a niche use case. Embedded uses are still a small fraction of Rust programs and that is unlikely to change anytime soon. I am confident the vast bulk of Rust programs and programmers, even including some people vigorously defending async/await, would find they are actually happier and more productive with threads, and that they would be completely incapable of finding any real, perceptible performance difference. Such exceptions as there may be, which I not only "don't deny" exist but insist exist, are welcome to pay the price of async/await and I celebrate that they have that choice.

But as it stands now, async/await may be the biggest current premature optimization in common use. Though "I am writing a web site that will experience upwards of several dozen hits per minute, should I use the web framework that benches at 1,000,000 reqs/sec or 2,000,000 reqs/sec?" is stiff competition.


> But as it stands now, async/await may be the biggest current premature optimization in common use.

To be fair, isn't the entire point of OP's essay that async/await is useful specifically for reasons that aren't performance? Rather, it is that async/await is arguably more expressive, composable, and correct than threads for certain workloads.

And I have to say I agree with OP here, given what I've experienced in codebases at work: doing what we currently do with async instead with threads would result in not-insubstantial pain.


I disagree with the original essay comprehensively. async/await is less composable (threads automatically compose by their nature; it is so easy it is almost invisible), a tie on expressiveness (both thread advocates and async advocates play the same game here, where they'll use a nice high-level abstraction on "their" side and compare it to raw use of the primitives on the other; both are perfectly capable of high-level libraries making the various use cases easy), and I would say async being more "correct" is generally not a claim that makes much sense to me either way. The correctness/incorrectness comes from things other than the runtime model.

Basically, async/await for historical reasons grew a lot of propaganda around how they "solved" problems with threads. But that is an accidental later post-hoc rationalization of their utility, which I consider for the most part just wrong. The real reason async-await took off is that it was the only solution for certain runtimes that couldn't handle threading. That's fine on its own terms, but is probably the definitive example of the dangers of taking a cost/benefit balance from one language and applying it to other languages without updating the costs and the benefits (gotta do both!). If I already have threads, the solution to their problems is to use the actor model, never take more than one mutex at a time, share memory by communicating instead of communicating by sharing memory, and on the higher effort end, Rust's lifetime annotations or immutable data. I would never have dreamed of trying to solve the problems with async/await.


Honestly that's a fair reply, thanks for the additional nuance to your argument. Upvoted.


I'd love the detail on this, what did it save you from and how did you ensure your firmware does not, say, hang?


Having to implement my own scheduler for otherwise synchronous network, OLED, serial and USB drivers on the same device, as well as getting automatic power state management when the executor ran out of arrived promises.

And a watchdog timer, like always. There's no amount of careful code that absolves you from using a watchdog timer.

For anyone curious, Embassy is the runtime/framework I used. Really well built.


That sounds kind of amazing. Working low level without an OS sounds like exactly the kind of place that Rust's concurrency primitives and tight checking would really be handy. Doing it in straight up C is complicated, and becomes increasingly so with every asynchronous device you have to deal with. Add another developer or two into the mix, and it can turn into a buggy mess rather quickly.

Unless you pull in an embedded OS, one usually ends up with a poor man's scheduler being run out of the main loop. Being able to do that with the Rust compiler looking over your shoulder sounds like it could be a rather massive benefit.


The way to do it in C isn't all that different, is it? You just have explicit state machines for each thing. Yes you have to call thing_process() in the main loop at regular intervals (and probably have each return an am_busy state to determine if you should sleep or not). It's more code but it's easy enough to reason about and probably easier to inspect in a debugger.
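
Roughly like this (an illustrative sketch, written in Rust for brevity but structurally the same in C; all names are made up):

    #[derive(Clone, Copy)]
    enum UartState { Idle, Sending { remaining: usize } }

    struct UartDriver { state: UartState }

    impl UartDriver {
        // One step of the state machine; returns true while work is in flight.
        fn process(&mut self) -> bool {
            match self.state {
                UartState::Idle => false,
                UartState::Sending { remaining } => {
                    // Pretend we pushed one byte out this iteration.
                    self.state = if remaining <= 1 {
                        UartState::Idle
                    } else {
                        UartState::Sending { remaining: remaining - 1 }
                    };
                    true
                }
            }
        }
    }

    fn main() {
        let mut uart = UartDriver { state: UartState::Sending { remaining: 3 } };
        loop {
            let busy = uart.process(); // in real firmware: poll every driver here
            if !busy { break; }        // ...and sleep/wfi instead of breaking
        }
    }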


Yep, the underlying mechanics have to do the same thing - just swept under a different rug. I imagine the (potential) advantage as being similar to when we had to do the same thing with JavaScript before promises came along. You would make async calls that would use callbacks for re-entry, and then you would need to pull context out from someplace and run your state machine.

Being able to write chains of asynchronous logic linearly is rather nice, especially if it's complicated. The tradeoff is that your main loop and re-entry code is now sitting behind some async scheduler, and - as you mention - will be more opaque and potentially harder to debug.


You're making a lot of assumptions. Debugging this thing was easy. Try it before you knock it.


thanks. looked that up. for the curious: https://embassy.dev/


> Instead of focusing on language features that are actually useful, the Rust effort has been sidetracked by this mess.

I don't know if you are correct or not (I am not very familiar with Rust) but empirically 9/10 Rust discussions I see nowadays on HN/reddit do revolve around async. It kinda sucks for me because I don't care about async at all, and I am interested in reading stuff about Rust.


> empirically 9/10 Rust discussions I see nowadays on HN/reddit do revolve around async. It kinda sucks for me because I don't care about async at all, and I am interested in reading stuff about Rust.

100% yes. I really feel bad precisely for people in your situation. But, there’s a good reason why you see this async spam on HN:

> (I am not very familiar with Rust)

Once you get past the initial basics (which are amazing, with pattern matching, sum types, traits, RAII, even borrowing can be quite fun), you’ll want to do something “real world”, which typically involves some form of networked IO. That’s when the nightmare begins.

So there’s traditional threaded IO (mentioned in the post), which lets you defer the problem a bit (maybe you can stay on the main thread if eg you’re building a CLI). But every crate that does IO needs to pick a side (async or sync). And so the lib you want to use may require it, which means you need to use async too. There are two ways of doing the same thing - which are API incompatible - meaning if you start with sync and need async later - get ready for a refactor.

Now, you (and the crates you’re using) also have to pick a faction within async, ie which executor to use. I’ve been OOTL a while but I think it’s mostly settled on Tokio these days(?), which probably is for the best. There are even sub-choices about Send-ness (for single-vs-multithreaded executors) and such that also impact API-compatibility.

In either case, a helluva lot of people with simple use-cases absolutely need to worry about async. This is problematic for three main reasons: (1) these choices are front-loaded and hard to reverse, (2) async is more complex to use, debug and understand, and (3) it in practice constrains the use of Rust best-practice features like static borrowing.

I don’t think anyone questions the strive for insane performance that asynchronous IO (io_uring, epoll etc) can unlock together with low-allocation runtimes. However, async was supposed to be an ergonomic way to deliver that, ideally without splitting the ecosystem into camps. Otherwise, perhaps just do manual event looping with custom state machines, which doesn’t need any special language features.


Thank you for posting this. Async vs sync is a tough design decision, and the ecosystem of crates in async gets further partitioned by the runtime. Tokio seems to be the leader, but making the async runtime a second-class consideration has just made the refactor risk much higher.

I like async, and think it makes a lot of sense in many use cases, but it also feels like a poisoned pill, where it’s all or nothing.


Maybe it's my lack of experience, but I find it much easier to wrap my head around threads than async/await. Yes, with threads there is more "infrastructure" required, but it's straightforward and easy to reason about (for me). With async/await I really don't fully understand what's going on behind the scenes.

Granted, in my job the needs for concurrency/parallelism tend to be very simple and limited.


> With async/await I really don't fully understand what's going on behind the scenes.

But what makes you think you understand what's going on behind the scenes with threads?

You don't need to understand anything more when using async/await than when using threads, and it works almost the same way: calling await blocks the “thread” of execution the exact same way a blocking call to an IO function does. And if you don't await on the promise/future, then it behaves as if you spawned a thread (in most languages, including JavaScript; in Rust it does nothing unless you explicitly “spawn” the future). Sometimes you need to join that thread (by calling await on the future) and sometimes you don't and you leave it to its business, exactly like threads!
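
A rough side-by-side of that analogy (a sketch assuming Tokio): spawning a task and awaiting its handle mirrors spawning a thread and joining it:

    use std::time::Duration;

    #[tokio::main]
    async fn main() {
        // Analogous to std::thread::spawn: the task runs concurrently...
        let handle = tokio::spawn(async {
            tokio::time::sleep(Duration::from_millis(50)).await;
            42
        });

        // ...and awaiting the JoinHandle is the async analogue of join().
        let value = handle.await.unwrap();
        assert_eq!(value, 42);
    }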

It puzzles me when developers are afraid of the complexity of async; it's entirely in your head: if every junior JavaScript developer can get used to it in a few days, so can you.


Imo this is because threads are a good abstraction. Not perfect, and quite inflexible, but powerful and simple. I would argue green threads are equally simple, too.

Async is implemented differently in different languages. Eg in JS it’s a decent lightweight sugar over callbacks, ie you can pretty much “manually” lower your code from async => promises => callbacks. It also helps that JS is single threaded.

Doing the same in Rust would require you to implement your own scheduler/runtime, and self-referencing “stackless” state machines. It’s orders of magnitude more complex.


Hard agree. In my view, that's Rust #1 problem in terms of developer experience.


Honestly: just ignore them. Just start using Rust! It's a lovely, useful language. HN/reddit are not representative most of the people out there in the real world writing Rust code that solves problems. I am not saying their concerns are invalid, but there is a tendency on these forums to form a self-reinforcing collective opinion that is overfit to the type of people who like to spend time on these forums. Reality is almost always richer and more complicated.


It's true for general programming focused Reddits, but eh, those will always try to poke fun at languages from the extreme perspective. E.g. they are a problem for people new to a language but not for others that have gone through learning the language extensively.

Remember, there are only two types of languages - languages no one uses and languages people complain about. When was the last time you heard Brainfuck doesn't give you good tools to access files?

On /r/rust Reddit, most talk is about slow compile/build times.


If you think that threads are faster than poll(), I would like to know in what use case that happens, because I have never once encountered this in my life.


It's not a technical mistake, it's a brilliant solution for when you need ultra-low-latency async code. The mistake is pushing it for the vast majority of use-cases where this isn't needed.


The reason it's pushed everywhere is because it only works well if the ecosystem uses it. If the larger rust ecosystem used sync code in most places, async await in rust would be unusable by large swaths of the community.


I think it wouldn't be so painful even as it's pervasive if the ergonomics were far better. Unfortunately, things are still far off on that front. Static dispatch for async functions recently landed in stable, though without a good way to bound the return type. Things like a better pinning API, async drop, interoperability and standardization, dynamic dispatch for async functions, and structured concurrency are open problems, some moving along right now. It'll be a process spanning years.


It's not pushed, it's pulled by the hype. The Rust community got onboard that train and started to async all the things without regard to actual need or consequences.


Do you have any evidence to back up the claim that the async efforts have taken away from other useful language features?

Also, lots of major rust projects depend on async for their design characteristics, not just for the significant performance improvements over the thread-based alternatives. These benefits are easy to see in just about any major IO-bound workload. I think the widespread adoption of async in major crates (by smart people solving real world problems) is a strong indicator that async is a language feature that is "actually useful".

The fight is mostly on hackernews and reddit, and mostly in the form of people who don't need async being upset it exists, because all the crates they use for IO want async now. I understand that it isn't fun when that happens, and there are clearly some real problems with async that they are still solving. It isn't perfect. But it feels like the split over async that is apparent in forum discussions just isn't nearly as wide or dramatic in actual projects.


> It's interesting to see an almost marketing-like campaign to save face for async/await.

It's not, it's just that Rust people want to explain the choices that led them here, and the HN/Reddit crowd is all about async controversy, so more Rust people blog about it, so more HN/Reddit focus on it. Like any good controversy, it's a self-reinforcing cycle.

> it also cost the community dearly

Citation needed. Having async/await opened a door to new contributors as well, it was a REALLY requested feature, and it made stuff like Embassy possible.

And it made stuff like effects and/or keyword generics a more requested feature.


I have to respectfully disagree. People are allowed to like stuff, it doesn't make it a marketing campaign or a conspiracy. This negativity on async, as if it's just a settled debate that async is a failure and therefore Rust is a failure, feels self-reinforcing.

I've written a fair amount of Rust both with threads and with async and you know what? Async is useful, very often for exactly the reasons the OP mentions, not necessarily for performance. I don't like async for theoretical reasons; I like it because it works. We use async extensively at my job for a lot of lower-level messaging and machine-control code and it works really well. Having well-defined interfaces for Future and Stream that you can pass around is nice. Having a robust mechanism for timers is nice. Cancellation is nice. Tokio is actually really solid (I can't speak to the other executors). I see this "async sucks" meme all over the various programming forums and it feels like a bunch of people going "this thing, that isn't perfect, is therefore trash". Are we not better than this?

This is not to say that async doesn't have issues. Of course it does. I'm not going to enumerate them here, but they exist and we should work to fix them. There are plenty of legitimate criticisms that have been made and can continue to be made about Rust's concurrency story, but async hasn't "cost the community dearly". People who wouldn't use Rust otherwise are using async every single day to solve real problems for real users with concurrent code, which is quite a bit more than can be said for all kinds of other theoretical different implementations Rust could have gone with, but didn't.


The big one not mentioned is cancellation. It's very easy to cancel any future. OTOH cancellation with threads is a messy whack-a-mole problem, and a forced thread abort can't be reliable due to a risk of leaving locks locked.

In Rust's async model, it's possible to add timeouts to all futures externally. You don't need every leaf I/O function to support a timeout option, and you don't need to pass that timeout through the entire call stack.

Combined with use of Drop guards (which is Rust's best practice for managing in-progress state), it makes cancellation of even large and complex operations easy and reliable.
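
For example (a minimal sketch assuming Tokio; slow_operation stands in for any future), the timeout wraps the future from the outside, with no timeout parameter threaded through the call stack:

    use std::time::Duration;
    use tokio::time::{sleep, timeout};

    #[tokio::main]
    async fn main() {
        let slow_operation = async {
            sleep(Duration::from_secs(60)).await;
            "done"
        };

        match timeout(Duration::from_secs(1), slow_operation).await {
            Ok(v) => println!("finished: {v}"),
            // On timeout the inner future is dropped, i.e. cancelled.
            Err(_) => println!("timed out"),
        }
    }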


It's not easy to cancel any future. It's easy to *pretend* to cancel any future. E.g. if you cancel (drop) anything that uses spawn_blocking, it will just continue to run in the background without you being aware of it. If you cancel any async fs operation that is implemented in terms of a threadpool, it will also continue to run.

This all can lead to very hard to understand bugs - e.g. "why does my service fail because a file is still in use, while I'm sure nothing uses the file anymore"


Yes, if you have a blocking thread running then you have to use the classic threaded methods for cancelling it, like periodically checking a boolean. This can compose nicely with Futures if they flip the boolean on Drop.
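
Something along these lines (an illustrative sketch, not tied to any particular executor; the guard type and names are made up): a guard flips an atomic flag on Drop, and the blocking worker checks that flag between chunks of work:

    use std::sync::Arc;
    use std::sync::atomic::{AtomicBool, Ordering};
    use std::thread;
    use std::time::Duration;

    // Flips the flag when dropped, e.g. when the owning future is cancelled.
    struct CancelOnDrop(Arc<AtomicBool>);

    impl Drop for CancelOnDrop {
        fn drop(&mut self) {
            self.0.store(true, Ordering::Relaxed);
        }
    }

    fn main() {
        let cancelled = Arc::new(AtomicBool::new(false));
        let flag = cancelled.clone();

        let worker = thread::spawn(move || {
            // Classic threaded cancellation: check the flag between work items.
            while !flag.load(Ordering::Relaxed) {
                thread::sleep(Duration::from_millis(10)); // one chunk of "work"
            }
        });

        // In async code this guard would live inside the future, so dropping
        // (cancelling) the future flips the flag automatically.
        let guard = CancelOnDrop(cancelled);
        drop(guard);

        worker.join().unwrap();
    }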

I’ve also used custom executors that can tolerate long-blocking code in async, and then an occasional yield.await can cancel compute-bound code.


If you implemented async futures, you could have also instead implemented cancelable threads. The problem is fairly isomorphic. System calls are hard, but if you make an identical system call in a thread or an async future, then you have exactly the same cancellation problem.


I don't get your distinction. Async/await is just syntax sugar on top of standard syscalls and design patterns, so of course it's possible to reimplement it without the syntax sugar.

But when you have a standard futures API and a run-time, you don't have to reinvent it yourself, plus you get a standard interface for composing tasks, instead of each project and library handling completion, cancellation, and timeouts in its own way.


I don't follow how threads are hard to cancel.

Set some shared state (like a "keep running" flag) that all threads have access to.

In their work loop, they check this flag. If it has been set to false, they return, and the thread is joined. Done.


So you make an HTTP request to some server and it takes 60 seconds to respond.

Externally you set that flag 1 second into the HTTP request. Your program has to wait for 59 seconds before it finally has a chance at cancelling, even though you added a bunch of boilerplate to supposedly make cancellation possible.


If the server takes 60 seconds to respond, and you need responses on the order of 1 second, I'd say that is the problem - not threads.


Cancellation is not worth worrying over in my experience. If an op is no longer useful, then it is good enough if that information eventually becomes visible to whatever function is invoked on behalf of the op, but don't bother checking for it unless you are about to do something very expensive, like start an RPC.


It's been incredibly important for me in both high-traffic network back-ends, as well as in GUI apps.

When writing complex servers it's very problematic to have requests piling up waiting on something unresponsive (some other API, microservice, borked DNS, database having a bad time, etc.). Sometimes clients can be stuck waiting forever, eventually causing the server to run out of file descriptors or RAM. Everything needs timeouts and circuit breakers.

Poorly implemented cancellation that leaves some work running can create pathological situations that eat all CPU and RAM. If some data takes too long to retrieve, and you time out the request without stopping processing, the client will retry, asking for that huge slow thing again, piling up another and another and another huge task that doesn't get cancelled, making the problem worse with each retry.

Often threading is mixed with callbacks for returning results. The un-cancelled callbacks firing after the other part of the application aborted an operation can cause race conditions, by messing up some state or being misattributed to another operation.


Right, this is compatible with what I said and meant. Timeouts that fire while the op is asleep, waiting on something: good, practical to implement. Cancellations that try to stop an op that's running on the CPU: hard, not useful.


I think a better question is "why choose async/await over fibers?". Yes, I know that Rust had green threads in the pre-1.0 days and it was intentionally removed, but there are different approaches for implementing fiber-based concurrency, including those which do not require a fat runtime built-in into the language.

If I understand the article correctly, it mostly lauds the ability to drop futures at any moment. Yes, you can not do a similar thing with threads for obvious reasons (well, technically, you can, but it's extremely unsafe). But this ability comes at a HUGE cost. Not only can you not use stack-based arrays with completion-based executors like io-uring or execute sub-tasks on different executor threads, but it also introduces certain subtle footguns and reliability issues (e.g. see [0]), which become very unpleasant surprises after writing sync Rust.

My opinion is that cancellation of tasks fundamentally should be cooperative and uncooperative cancellation is more of a misfeature, which is convenient at the surface level, but has deep issues underneath.

Also, praising the composability of async/await sounds... strange. Its viral nature makes it anything but composable (with the current version of Rust, without a proper effect system). For example, try to use an async closure with the map methods from std. What about using the standard io::Read/Write traits?
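
A small example of that friction (a sketch assuming Tokio just for the executor; `double` is a made-up function): std's combinators know nothing about futures, so you end up juggling the wrapper by hand:

    async fn double(x: u32) -> u32 {
        x * 2
    }

    #[tokio::main]
    async fn main() {
        let v = Some(3u32);

        // Option::map happily accepts the async fn, but what comes back is an
        // Option of a future that still has to be awaited separately.
        let result = match v.map(double) {
            Some(fut) => Some(fut.await),
            None => None,
        };

        assert_eq!(result, Some(6));
    }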

[0]: https://smallcultfollowing.com/babysteps/blog/2022/06/13/asy...


For rust, fibers (as a user-space, cooperative concurrency abstraction) would mandate a lot of design choices, such as whether stacks should be implemented using spaghetti stacks or require some sort of process-level memory mapping library, or even if they were just limited to a fixed size stack.

All three of these approaches would cause issues when interacting with code in another language with a different ABI. It can get really complicated, for example, when C code gets called from one fiber and wants to then resume another.

One of the benefits of async/await is the 'await' keyword itself. The explicit wait-points give you the ability to actually reason about the interactions of a concurrent program.

Yielding fibers are a bit like the 'goto' of the concurrency world - whenever you call a method, you don't know if as a side effect it may cause your processing to pause, and if when it continues the state of the world has changed. The need to be defensive when interfacing with the outside world means fibers tend to be better for tasks which run in isolation and communicate by completion.

Green threads, fibers and coroutines all share the same set of problems here, but really user space cooperative concurrency is just shuffling papers on a desk in terms of solving the hard parts of concurrency. Rust async/await leaves things more explicit, but as a result doesn't hide certain side effects other mechanisms do.


> you don't know if as a side effect it may cause your processing to pause, and if when it continues the state of the world has changed

This may be true in JS (or Haskell), but not in Rust, where you already have multithreading (and unrestricted side-effects), and so other code may always be interleaved. So this argument is irrelevant in languages that offer both async/await and threads.

Furthermore, the argument is weak to begin with because the difference is merely in the default choice. With threads, the default is that interleaving may happen anywhere unless excluded, while with async/await it's the other way around. The threading approach is more composable, maintainable, and safer, because the important property is that of non-interference, which threads state explicitly. Any subroutine specifies where it does not tolerate contention, regardless of the other subroutines it calls. In the async/await model, adding a yield point to a previously non-yielding subroutine requires examining the assumptions of its callers. Threads' default, requiring an explicit statement of the desired property of non-interference, is the better one.


I never fully understood the FFI issue. When calling an FFI function not known to be coroutine-safe, you would switch from the coroutine stack to the original thread stack and back. This need not be more expensive than a couple of instructions on the way in and out.

Interestingly, reenabling frame pointers was in the news recently, which would add a similar amount of overhead to every function call. That was considered a more than acceptable tradeoff.


The main problem with fibers/goroutines and FFI is that one of the benefits of fibers is that each fiber starts with a very small stack (usually just a few kB), unlike native threads, which usually start with a much larger stack (usually expressed in MB). The problem is that the code must be prepared to grow the stack if necessary, which is not compatible with the C FFI. That's one of the reasons why Go's FFI to C, for example, is slower than Rust's.


Sure, if you are using split stacks, goroutine code is inherently slower. But for FFI you would switch to the main thread stack that is contiguous, so you won't pay any split stack cost there.


Go abandoned split stacks years ago due to the "hot split problem", and is now using contiguous stacks that are grown when necessary via stack copying. Go switches to the system stack when calling C code. There is some overhead (a few tens of ns) due to that switch, compared to languages like Rust or Zig which don't need to switch the stack.


The issue is that the FFI might use C thread-local storage and end up assuming 1-1 threading between the language and C, but if you're using green threads then that won't be the case.


In my opinion, by default fibers should use "full" stacks, i.e. a reasonable amount of unpopulated memory pages (e.g. 2 MiB) with guard page. Effectively, the same stack which we use for threads. It should eliminate all issues about interfacing with external code. But it obviously has performance implications, especially for very small tasks.

Further, on top of this we can then develop spawning tasks which would use the parent's stack. It would require certain language development to allow computing a maximum stack usage bound for functions. Obviously, such computation would mean that programmers have to accept additional restrictions on their code (such as disallowing recursion, alloca, and calling external functions without attributed stack usage), but compilers already routinely compute stack usage of functions, so for pure Rust code it should be doable.

>It can get really complicated, for example, when C code gets called from one fiber and wants to then resume another.

It's a weird example. How would a C library know about fiber runtime used in Rust?

>Yielding fibers are a bit like the 'goto' of the concurrency world - whenever you call a method, you don't know if as a side effect it may cause your processing to pause, and if when it continues the state of the world has changed.

I find this argument funny. Why don't you have the same issue with preemptive multitasking? We live with exactly this "issue" in the threading world just fine. Even worse, we can not even rely on "critical sections", thread's execution can be preempted at ANY moment.

As for `await` keyword, in almost all cases I find it nothing more than a visual noise. It does not provide any practically useful information for programmer. How often did you wonder when writing threading-based code about whether function does any IO or not?


> It’s a weird example. How would a C library know about the fiber runtime used in Rust.

Well, if your green thread were launched on an arbitrary free OS thread (work stealing), then for example your TLS variables would be very wrong when you resume execution. Does it break all FFI? No. But it can cause issues for some FFI in a way that async/await cannot.

> I find this argument funny. Why don't you have the same issue with preemptive multitasking? We live with exactly this "issue" in the threading world just fine. Even worse, we can not even rely on "critical sections", thread's execution can be preempted at ANY moment.

It’s not about critical sections as much. Since the author referenced goto, I think the point is that it gets harder to reason about control flow within your own code. Whether or not that’s true is debatable, since there’s not really any implementation of green threads for Rust. It does seem to work well enough for Go, but Go has a dedicated keyword for creating a green thread, which eases readability.

> As for `await` keyword, in almost all cases I find it nothing more than a visual noise. It does not provide any practically useful information for programmer. How often did you wonder when writing threading-based code about whether function does any IO or not?

Agree to disagree. It provides very clear demarcation of which lines are possible suspension points which is important when trying to figure out where “non interruptible” operations need to be written for things to work as intended.


Obviously you would not use operating system TLS variables when your code does not correspond to operating system threads.

They're just globals, anyway - why are we on Hacker News discussing the best kind of globals? Avoid them and things will go better.


Not sure why you’re starting a totally unrelated debate. If you’re pulling in a library via FFI, you have no control over what that library has done. You’d have to audit the source code to figure out if they’ve done anything that would be incompatible with fibers. And TLS is but one example. You’d have to audit for all kinds of OS thread usage (e.g. if it uses the current thread ID as an index into a hashmap or something). It may not be common, but Go’s experience is that there’s some issue and the external ecosystem isn’t going to bend itself over backwards to support fibers. And that’s assuming that these are solved problems within your own language ecosystem which may not be the case either when you’re supporting multiple paradigms.


You've obviously never worked with fibers if you think these are obvious. These problems are well documented and observed empirically in the field.


> In my opinion, by default fibers should use "full" stacks, i.e. a reasonable amount of unpopulated memory pages (e.g. 2 MiB) with guard page.

That's a disaster. If you're writing a server that needs to serve 10K clients concurrently then that's 20GiB of RAM just for the stacks, plus you'll probably want guard pages, and so MMU games every time you set up or tear down a fiber.

The problem with threads is the stacks, the context switches, the cache pressure, the total memory footprint. A fiber that has all those problems but just doesn't have an OS schedulable entity to back it barely improves the situation relative to threads.

Dealing with slow I/O is a spectrum. On one end you have threads, and on the other end you have continuation passing style. In the middle you have fibers/green threads (closer to threads) and async/await (closer to CPS). If you want to get closer to the middle than threads then you want spaghetti stack green threads.


> by default fibers should use "full" stacks, i.e. a reasonable amount of unpopulated memory pages (e.g. 2 MiB) with guard page

Then wouldn't we lose the main benefit of fibers (small stacks leading to low memory usage in the presence of a very large number of concurrent tasks) compared to native threads (the other main benefit being user-space scheduling)? Or perhaps you're thinking of using fibers configured with a very small stack for highly concurrent tasks (like serving network requests) and delegating tasks requiring C FFI to a pool of fibers with a "full" stack?


> For rust, fibers (as a user-space, cooperative concurrency abstraction) would mandate a lot of design choices, such as whether stacks should be implemented using spaghetti stacks or require some sort of process-level memory mapping library, or even if they were just limited to a fixed size stack.

> All three of these approaches would cause issues when interacting with code in another language with a different ABI. It can get really complicated, for example, when C code gets called from one fiber and wants to then resume another.

Java (in OpenJDK 21) is doing it. To be fair, Java had no other choice because there is so much sequential code written in Java, but also the Java language and bytecode compilation make it easy to implement spaghetti stacks transparently. Given those two things it's obviously a good idea to go with green threads. The same might not apply to other languages.

My personal preference is for async/await, but it's true that its ecosystem-bifurcating virality is a bit of a problem.


Java also has the benefit that most of the Java ecosystem is in Java. This makes it easy to avoid the ffi problem since you never leave the VM.


How do fibers solve your cancellation problem? Aren't they more or less equivalent?

(I find fiber-based code hard to follow because you're effectively forced to reason operationally. Keeping track of in-progress threads in your head is much harder than keeping track of to-be-completed values, at least for me)


With fibers you send a cancellation signal to a task, and on its next IO operation (or, more generally, yield) it will get a cancellation error code, with the ability to get the true result of the IO operation, if there is any. Note that it does not mean that the task will sleep until the IO operation gets completed; the cancellation signal causes any ongoing IO to "complete" immediately where possible (e.g. IIRC disk IO can not be cancelled).

It then becomes the responsibility of the task to handle this signal. It may finish immediately (e.g. by bubbling up the "cancellation" error), finish some critical section first and do some cleanup IO, or even outright ignore the signal.

With futures you just drop the task's future (i.e. its persistent stack), maybe with some synchronous cleanup, and that's it; you don't give the task a chance to say a word about its cancellation. A hypothetical async Drop could help here (though you would have to rely on async drop guards extensively instead of processing "cancellation errors"), but adding it to Rust is far from easy and AFAIK there are certain fundamental issues with it.
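For illustration, a minimal sketch of that drop-based cancellation (assuming the tokio crate with the rt, time and macros features; timeout() simply drops the inner future once the deadline passes):

    use std::time::Duration;

    #[tokio::main(flavor = "current_thread")]
    async fn main() {
        let slow = async {
            // If the timeout fires first, this future is simply dropped while
            // suspended here; only synchronous Drop impls run, the task gets no say.
            tokio::time::sleep(Duration::from_secs(10)).await;
            "finished"
        };
        match tokio::time::timeout(Duration::from_millis(50), slow).await {
            Ok(v) => println!("completed: {v}"),
            Err(_) => println!("cancelled by dropping the future"),
        }
    }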

With io-uring sending cancellation signals is quite straightforward (though you need to account for different possibilities, such as task being currently executed on a separate executor thread, or its CQE being already in completion queue), but with epoll, unfortunately, it's... less pleasant.


> Hypothetical async Drop could help here (though you would have to rely on async drop guards extensively instead of processing "cancellation errors"), but adding it to Rust is far from easy and AFAIK there are certain fundamental issues with it.

Wouldn't fiber cancellation be equivalent and have equivalent implementation difficulties? You say you just send a signal to the task, but in practice picking up and running the task to trigger its cancellation error handling is going to look the same as running a future's async drop, isn't it?


Firstly, Rust does not have async Drop and it's unlikely to be added in the foreseeable future. Secondly, cancellation signals are a more general technique than async Drop, i.e. you can implement the latter on top of the former, but not the other way around. For example, with async Drop you can not ignore a cancellation event (unless you copy the code of your whole task into the Drop impl). Some may say that's a good thing, but it's an obvious example of cancellation signals being more powerful than a hypothetical async Drop.

As for implementation difficulties, I don't think so. For async Drop you need to mess with some fundamental parts of the Rust language (since Futures are "just types"), while fiber-based concurrency, in a certain sense, is transparent for compiler and implementation complexity is moved to executors.

If you are asking about how it would look in user code, then, yes, they would be somewhat similar. With cancellation signals you would call something like `let res = task_handle.cancel_join();`, while with async Drop you would use `drop(task_future)`. Note that the former also allows getting the result from a cancelled task, another example of its greater flexibility.


> Keeping track of in-progress threads in your head is much harder than keeping track of to-be-completed values, at least for me

I think that's true for everybody. Our minds barely handle state for sequential code - the explosion of complexity of multiple state-modifying threads is almost impossible to follow.

There are ways to convert "keeping track of in-progress threads" into "keeping track of to-be-completed values" - in particular, Go uses channels as a communication mechanism which explicitly does the latter, while abstracting away the former.


> There are ways to convert "keeping track of in-progress threads" into "keeping track of to-be-completed values" - in particular, Go uses channels as a communication mechanism which explicitly does the latter, while abstracting away the former.

I find the Go style pretty impossible to follow - you have to keep track of which in-progress threads are waiting on which lines, because what will happen when you send to a given channel depends on what was waiting to receive from that channel, no? The only way of doing this stuff that I've ever found comprehensible is iteratees, where you reify the continuation step as a regular value that runs when you call it explicitly.


Because stackful fibers suck for low-level code. See Gor Nishanov's review for the C++ committee http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2018/p136... (linked from https://devblogs.microsoft.com/oldnewthing/20191011-00/?p=10... ). It even sums things up nicely: DO NOT USE FIBERS!


To be clear, not everybody agrees with Gor and although they still don't have much traction, stackful coroutines are still being proposed.

Most (but admittedly not all) of Gor's issues with stackful coroutines are due to current implementations being purely a library feature with no compiler support. Most of the issues (context switch overhead, thread-local issues) can be solved with compiler support.

The complaint about lack of support in some existing libraries is unfair: async/await doesn't have it either, and in fact adding support there would be significantly harder, as it would require a full rewrite. The takeaway here is not to try to make M:N fully transparent, not to avoid it completely.

The issue with stack usage is real though; split stacks are not a panacea and OS support would be required.

edit: for the other side of the coin, also from Microsoft, about issues with the async/await model in Midori (a high-performance single-address-space OS): https://joeduffyblog.com/2015/11/19/asynchronous-everything/

They ended up implementing the equivalent of stackful coroutines specifically for performance.


> The take away here is not to try to make M:N fully transparent

But the whole point of the 'Virtual processors' or 'Light-weight process' patterns as popularized by Golang and such is to try and make M:N transparent to user code.


I mean fully transparent to pre-existing code. Go had the benefit of having stackful coroutines from day one, but when retrofitting to an existing language, expecting the whole library ecosystem to work out of the box without changes is a very high bar.


That paper has some significant flaws:

In section 2.3.1, "Dangers of N:M model", the "danger" lies in using thread-local storage that was built for OS threads, without modification, for stackful fibers. The bottom line here should have been "don't do that", not that stackful fibers are dangerous. Obviously, any library that interacts with the implementation mechanism for concurrency must be built for the mechanism actually used, not a different one.

Section 2.3.2, "Hazards of 1:N model", again points out the dangers of making such blind assumptions, but its main point is that blocking APIs must block at the fiber level, not at the host thread level -- that is, it even proposes the solution for the problem mentioned, then totally ignores that solution. Java's approach (even though, IIRC, N:M) does exactly that: "rewrite it in Java". This is one place where I hope "rewrite it in Rust" becomes more than a meme in the future and actually becomes a foundation for stackful fibers.

Then there are a bunch of case studies that show that you cannot just sprinkle some stackful fibers over an existing codebase and hope for it to work. Surprise: You can't do that with async/await either. "What color is my function" etc.

I'm still hoping for a complete, clearly presented argument for why Java can do it and Rust cannot. (Just for example: "it needs a GC" could be the heart of such an argument).


> Just for example: "it needs a GC" could be the heart of such an argument

Rust can actually support high-performance concurrent GC, see https://github.com/chc4/samsara for an experimental implementation. But unlike other languages it gives you the option of not using it.


Somewhere there's a blog from the old Sun about how M:N threading was a disaster in the Solaris <10 C library and why it was ripped out in Solaris 10. I'm not sure how to find it anymore.


> I think a better question is "why choose async/await over fibers?

> there are different approaches for implementing fiber-based concurrency, including those which do not require a fat runtime built-in into the language

This keynote below is Ruby so the thread/GVL situation is different than for Rust, but is that the kind of thing you mean?

https://m.youtube.com/watch?v=qKQcUDEo-ZI

I think it makes a good case that async/await is infectious and awkward, and fibers (at least as implemented in Ruby) is quite simply a better paradigm.


> I think it makes a good case that async/await is infectious and awkward, and fibers (at least as implemented in Ruby) is quite simply a better paradigm.

As someone who has done fibers development in Ruby, I disagree.

CRuby has the disadvantage of a global interpreter lock. This means that parallelism can only be achieved via multiple processes. This is not the case in Rust, where we have true parallelism within a single process.

Second, this talk is not arguing for the use of fibers as much as it is arguing for using fibers to rig up a bespoke green-threads-like system for a specific web application server, and advocating for Ruby runtime features to lighten the code burden of doing so.

Ruby has a global interpreter lock, so even though it uses native threads only one of them can be executing ruby code at a time. Fibers have native stacks, so they have all the resource requirements of a thread sans context switching - but the limitations from the GIL actually mean you aren't _saving_ context switching by structuring your code to use fibers in typical (non "hello world" web server) usage.


> CRuby has the disadvantage of a global interpreter lock. This means that parallelism can only be achieved via multiple processes.

Even for Ruby code (and not native code running in a Ruby app) this is false since the introduction of Ractors, which are units within a single process that communicate without shared mutable (user) state, as the GVL is only “global” to a Ractor, not a process.

(Also worth noting, given the broader topic here, that Ruby has an available async implementation built on top of fibers which doesn't rely on special syntax and allows code to be blocking or non-blocking depending on context.)


I'm sorry but you seem to have missed the point I'm referring to, even though you have directly quoted it.

None of your paragraphs are relevant to async await vs fiber: async await requires you to put keywords in all sorts of unexpected places, fibers do not.

> CRuby has the disadvantage of a global interpreter lock. This means that parallelism can only be achieved via multiple processes.

I am very well cognisant of this fact, but this bears absolutely no relationship with async await vs fibers: one is sweeping callbacks on an event loop under leaky syntactic sugar, the other is cooperative coroutines; they're all on the same Ruby thread, the GVL simply does not intervene.

> this talk is not arguing for use of fibers as much as it is arguing as using fibers to rig up a bespoke green threads-like system for a specific web application server, and advocating for ruby runtime features to make the code burden on them of doing this lighter.

That sounds like a very cynical view of it. I believe there are arguments made in this talk that are entirely orthogonal to how Falcon benefits from fibers.

> Fibers have native stacks, so they have all the resource requirements of a thread

For starters fibers don't have the resource requirements of a thread because they're not backed by native OS threads, while CRuby creates an OS thread for each Ruby VM thread (until MaNy lands, but even with M:N they'd still be subject to the GVL).

Are you arguing for a stackless design like Stackless Python? Goroutines are stack-based too, and the benefit of coroutines (and an M:N design) is very apparent. Anyway, async/await is stack-based too, so I don't see how this is relevant: if you have 1M requests served at the same time, truly concurrently, you're going to have either 1M event callbacks or 1M fibers; that's 1 million stacks either way.

I seem to gather through what I read as a bitter tone that your experience with fibers was not that good. I appreciate another data point but I can't seem to reconcile it vs async await.

But really, the only reason I brought up threads is because the GVL makes threads less useful in CRuby than in Rust (or Go since I mentioned it).


> I am very well cognisant of this fact, but this bears absolutely no relationship with async await vs fibers

The ruby environment does not have async/await, and has a desire for experimentation with user mode concurrency systems due to the impact of GIL on the utility of native threads. Running under an interpreter with primarily heap-allocated objects and a patchable runtime also means that you have less impact from switching to such a model as an application developer.

There were discussions treating this as if it pertained to a wider debate on async/await vs fibers, specifically in Rust, which does properly utilize threads and does have async/await support. Since Rust also compiles to native code, it would need changes such as putting fiber suspension logic in all I/O calls between itself and the underlying OS to support developer-provided concurrency.


I only scrolled the video, but it sounds similar, yes. Though, implementation details would probably vary significantly, since Rust is a lower-level language.


> My opinion is that cancellation of tasks fundamentally should be cooperative and uncooperative cancellation is more of a misfeature, which is convenient at the surface level, but has deep issues underneath.

Cooperative cancellation can be pretty annoying with mathematical problems. You can have optimization algorithms calling rootfinding problems calling ODE integrators, and any of those can spin for a very long time, so you need to be threading cancellation tokens through everywhere, and the numerical frameworks generally don't support it. You can and should use iteration counts in all the algorithms, but once you're dealing with nested algorithms that only guarantees that your problem will stop sometime this year, not that it stops within 5 seconds.

With these problems I can promise that I'm just doing: lots of math, allocations with their associated page faults, no I/O, and writing strings to a standard library Queue object for logging that are handled back in the main thread that I'm never going to cancel (and whatever other features you think I might need -- which I haven't needed for years now -- I'd be happy to ship that information back to the main thread on a Queue). It feels like a problem that should be solvable in the 21st century without making me thread cancellation tokens everywhere and defensively code against spinning without checking the token (where I can make mistakes and cause bugs, which I guess you'll just blame me for).
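To make the burden concrete, here is a minimal sketch (plain std threads, with an AtomicBool standing in for a cancellation token; all names are illustrative) of what "threading the token through everywhere" looks like:

    use std::sync::Arc;
    use std::sync::atomic::{AtomicBool, Ordering};
    use std::{thread, time::Duration};

    // Every nested numerical routine has to accept and poll the flag,
    // because nothing will stop it from the outside.
    fn inner_iteration(cancel: &AtomicBool) -> Option<f64> {
        for i in 0..10_000_000u64 {
            if cancel.load(Ordering::Relaxed) {
                return None; // bubble the cancellation up by hand
            }
            let _ = (i as f64).sqrt(); // stand-in for real numeric work
        }
        Some(42.0)
    }

    fn outer_solver(cancel: &AtomicBool) -> Option<f64> {
        let mut acc = 0.0;
        for _ in 0..1_000 {
            acc += inner_iteration(cancel)?; // and so on, at every level
        }
        Some(acc)
    }

    fn main() {
        let cancel = Arc::new(AtomicBool::new(false));
        let worker = {
            let cancel = Arc::clone(&cancel);
            thread::spawn(move || outer_solver(&cancel))
        };
        thread::sleep(Duration::from_secs(5)); // our deadline
        cancel.store(true, Ordering::Relaxed);
        println!("{:?}", worker.join().unwrap());
    }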


but why do you want cooperative cancellation? this seems ideal for a thread pool, and if the allocated time is up, ask the OS to stop them. ideally the system would allocate the threads to dedicated CPU cores that have all kinds of interrupt handling disabled (so throughput is maximized)

if you want to do this without kernel support, then somehow the program needs to do the periodic checks. the JVM has this (safepoints), but there's no royal road for this.


> but why do you want cooperative cancellation?

I don't.

Previous commenter doesn't believe in uncooperative cancellation though and .NET seems to be removing all support for it.

I would love to have worker threads with all kinds of interrupts and I/O disabled on them and just be able to kill them if they wander off into a numerical field somewhere and get stuck in the mud.


The sole advantage of async/await over fibers is the possibility to achieve ultra-low-latency via the compiler converting the async/await into a state machine. This is important for Rust as a systems language, but if you don't need ultra-low-latency then something with a CSP model built on fibers, like Goroutines or the new Java coroutines, is much easier to reason about.


Fibers and async/await are backed by the same OS APIs, so they can achieve more or less the same latency. The main advantage of async/await (or, to be more precise, stackless coroutines) is that they require less memory for task stacks, since tasks can reuse the executor's stack for the non-persistent part of their stack (i.e. stack variables which do not cross yield points). It has very little to do with latency. At most you can argue that the executor's stack stays in CPU cache, which reduces the number of cache misses a bit.

Stackless coroutines also make it easier to use the parent's stack for its children's stacks. But IMO that's only because compilers currently do not have tools to communicate the maximum stack usage bound of a function to programming languages.
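A minimal sketch of the "only what crosses a yield point is persisted" behaviour (sizes are illustrative and compiler-dependent):

    use std::mem::size_of_val;

    async fn scratch_before_await() {
        // Not held across the .await, so it can live on the executor's stack
        // and does not need to be stored inside the future.
        let scratch = [0u8; 1024];
        let _ = scratch.len();
        std::future::ready(()).await;
    }

    async fn scratch_across_await() {
        // Held across the .await, so it becomes part of the future's state.
        let scratch = [0u8; 1024];
        std::future::ready(()).await;
        let _ = scratch.len();
    }

    fn main() {
        // Expect the second future to be roughly 1 KiB larger than the first.
        println!("before await: {} bytes", size_of_val(&scratch_before_await()));
        println!("across await: {} bytes", size_of_val(&scratch_across_await()));
    }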


>Fibers and async/await are backed by the same OS APIs, they can achieve more or less the same latency.

The key requirement for ultra-low-latency software is minimising/eliminating dynamic memory allocation, and stackless coroutines allow avoiding memory allocation. For managed coroutines on the other hand (e.g. goroutines, Java coroutines) as far as I'm aware it's impossible to have an implementation that doesn't do any dynamic memory allocation, or at least there aren't any such implementations in practice.


Yes, it's what I wrote about in the last paragraph. If you can compute maximum stack size of a function, then you can avoid dynamic allocation with fibers as well (you also could provide stack size manually, but it would break horribly if the provided number is wrong). You are right that such implementations do not exist right now, but I think it's technically feasible, as demonstrated by tools such as https://github.com/japaric/cargo-call-stack The main stumbling block here is FFI, historically shared libraries do not have any annotations about stack usage, so functions with bounded stack usage would not be able to use even libc.


> communicate maximum stack usage bound of functions

This would be useful in all sorts of deeply embedded code (as well as more exotic things, such as coding for GPU compute). Unfortunately it turns out to be infeasible when dealing with truly reentrant functions (e.g. any kind of recursion) or any use of FFI, dynamic dispatch etc. So it can only really be accomplished for near-'leaf' code, where stack usage is expected to be negligible anyway.


> Fibers and async/await are backed by the same OS APIs

async/await doesn't require any OS APIs, or even an OS at all.

You can write async rust that runs on a microcontroller and poll a future directly from an interrupt handler.

And there's a huge advantage to doing so, too: you can write out sequences of operations in a straightforward procedural form, and let the compiler do the work of turning that into a state machine with a minimal state representation, rather than doing that manually.
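A minimal hosted sketch of that "poll it yourself" pattern, using a hand-rolled no-op waker (on a real microcontroller the polling would happen from the interrupt handler, without std):

    use std::future::Future;
    use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

    // A waker that does nothing: enough to poll a future by hand whenever
    // some external event (e.g. an interrupt) says "try again".
    fn noop_waker() -> Waker {
        fn clone(_: *const ()) -> RawWaker { RawWaker::new(std::ptr::null(), &VTABLE) }
        fn noop(_: *const ()) {}
        static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
        unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
    }

    fn main() {
        // The compiler turns this async block into a state machine; each
        // .await is a potential suspension point.
        let mut fut = Box::pin(async {
            std::future::ready(1).await + std::future::ready(2).await
        });
        let waker = noop_waker();
        let mut cx = Context::from_waker(&waker);
        loop {
            match fut.as_mut().poll(&mut cx) {
                Poll::Ready(v) => { println!("result: {v}"); break; }
                Poll::Pending => { /* on embedded: wait for the next interrupt */ }
            }
        }
    }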


Sigh... It gets tiring to hear about embedded from async/await advocates as if it's a unique advantage of the model. Fibers and similar mechanisms are used routinely in embedded world as demonstrated by various RTOSes.

Fibers are built on yielding execution to someone else, which is implemented trivially on embedded targets. Arguably, in a certain sense, fibers are even better suited for embedded, since they allow preemption of a task by interrupts at any moment, with the interrupt being processed by another task, while with async/await you have to put an event into a queue and continue execution of the previously executing future.


Exactly. Proof by implementation: asio is a very well known and well regarded C++ event loop library and can be transparently used with old-school hand-written continuations, more modern future/promise, language based async/await coroutines and stackful coroutines (of the boost variety).

The event loop and io libraries are in practice the same for any solution you decide, everything else is just sugar on top and in principle you can mix and match as needed.


> The key requirement for ultra-low-latency software is minimising/eliminating dynamic memory allocation

First, that is only true in languages -- like C++ or Rust -- where dynamic memory allocation (and deallocation) is relatively costly. In a language like Java, the cost of heap allocation is comparable to stack allocation (it's a pointer bump).

Second, in the most common case of writing high throughput servers, the performance comes from Little's law and depends on having a large number of threads/coroutines. That means that all the data required for the concurrent tasks cannot fit in the CPU cache, and so switching in a task incurs a cache-miss, and so cannot be too low-latency.

The only use-cases where avoiding memory allocation could be useful and achieving very low latency is possible are when the number of threads/coroutines is very small, e.g. generators.

The questions, then, are which use-case you pick to guide the design, servers or generators, and what the costs of memory management are in your language.


>Second, in the most common case of writing high throughput servers,

High-throughput servers are not ultra-low-latency software; they prioritise throughput over latency. Ultra-low-latency software is stuff like audio processing, microcontrollers and HFT. There's a trade-off between throughput and latency.


Not here. You don't trade off latency, because you cannot reduce it below a cache miss per context switch anyway if your working set is not tiny. The point is that if you have lots of tasks then your latency has a lower bound (due to hardware limitations) regardless of the design.

In other words, if your server serves some amount of data that is larger than the CPU cache size and can be accessed at random, there is some latency that you have to pay, and so many micro-optimisations are simply ineffective even if you want to get the lowest latency possible. Incurring a cache miss and allocating memory (if your allocation is really fast) and even copying some data around isn't significantly slower than just incurring a cache miss and not doing those other things. They matter only when you don't incur a cache miss, and that happens when you have a very small number of tasks whose data fits in the cache (i.e. a generator use-case and not so much a server use-case).

Put in yet another way, some considerations only matter when the workload doesn't involve many cache misses, but a server workload virtually always incurs a cache-miss when serving a new request, even in servers that care mostly about latency. In general, in servers you're then working in the microsecond range, anyway, and so optimisations that operate at the nanosecond range are not useful.


Could you share an example of a fiber implementation not relying a fat runtime built in the language?


https://github.com/Xudong-Huang/may

The project has some serious restrictions and unsound footguns (e.g. around TLS), but otherwise it's usable enough. There are also a number of C/C++ libraries, but I can not comment on those.


Neat. Thanks for sharing!

Interestingly, may-minihttp is faring very well in the TechEmpower benchmark [1], for whatever those benchmarks are worth. The code is also surprisingly straightforward [2].

[1] https://www.techempower.com/benchmarks/

[2] https://github.com/TechEmpower/FrameworkBenchmarks/blob/mast...


For example https://github.com/creationix/libco

The required thing is mostly just to dump your registers on the stack and jump.


I do think implementations like that are not particularly useful though.

You want a runtime to handle and multiplex blocking calls - otherwise, if you perform any blocking calls (mostly I/O) in one fiber, you block everything - so what use are those fibers?


The answer is the same as in async Rust, right? "Don't do that."

If you wanted to use this for managing a bunch of I/O contexts per OS thread then you would need to bring an event loop and a bunch of functions that hook up your event loop to whatever asynchronous I/O facilities your OS provides. Sort of like in async Rust.


Another discussion where people don't get async/await, can't fathom why you would want a concurrency mechanism on a single thread and assume no one needs it.

UI programming, communication with the GPU, and cross runtime communication are good examples but I'm sure there are more.

Threads, green or otherwise, don't work for those cases but async/await does.


You can easily use threads with GUIs and I've written a bunch of GUI apps in the past that exploited threads quite effectively.


What popular GUI framework is multi threaded?


Depends what you mean by multi-threaded. There are plenty of frameworks that have some support for doing work on background threads.

Take JavaFX for example:

1. Rendering is done on a background thread in parallel with the app thread.

2. The class lib makes it easy to run background tasks that update the GUI to reflect progress and results.

3. You can construct widget trees on background threads and then pass them to the main UI thread only to attach them/make them visible.

With libs like ReactFX you can take event streams from the UI, map over them to a worker thread or pool of threads, and pass the results back to the UI thread.

A few years ago I did a video tutorial showing a few of these things, the code can be seen here:

https://github.com/mikehearn/KotlinFPWebinar/blob/master/src...

Basically this is a simple auto-complete widget. It takes a stream of values from a text field the user is typing in, reduces the frequency of changes (so if you type fast it doesn't constantly do work it's about to throw away) and then passes the heavy work of computing completions onto a background thread, then takes the results back onto the UI thread.

    val edits = EventStreams.valuesOf(edit.textProperty()).forgetful()
    val fewerEdits = edits.successionEnds(Duration.ofMillis(100)).filter { it.isNotBlank() }

    val inBackground = fewerEdits.threadBridgeFromFx(thread)
    val completions = inBackground.map { ngrams.complete(it) }
    val inForeground = completions.threadBridgeToFx(thread)

    val lastResult = inForeground.map {
        FXCollections.observableList(it)
    }.toBinding(FXCollections.emptyObservableList())
    completionsList.itemsProperty().bind(lastResult)
Now if you mean what UI libraries support writes to properties from arbitrary background threads without an explicit thread switch, I think the answer is only Win32 and Motif. But that's not a particularly important feature. JavaFX can catch the case if you access live UI from the wrong thread so such bugs are rarely an issue in practice. In particular it's easy to synchronize with the UI thread with a few simple helpers, e.g.:

    thread(name = "background") {
      while (true) {
        var speed = synchronizeWithUI { speedSlider.value }
        doSomeWork(speed)
      }
    }


I was referring to a UI system that did not differentiate a UI thread and could use things like pre-emptive scheduled tasks to access layout elements, etc.

Your example illustrates my point that you need to manually juggle threads if they are used directly. It ends up being the callback hell that async/await is the nice sugar for.


I don't see the callback hell. There aren't callbacks in the code samples I posted (the Rx-like chain is doing things that go well beyond callbacks).

Every UI toolkit has a concept of a UI thread because there has to be a single target for event notifications to be delivered to (keyboard/mouse/window), and they inherently have to be processed in sequence so they can't just be sprayed across multiple threads. However once you go beyond sequential event processing multi-threading becomes possible and as I said, JavaFX is a good example of that. Events like a window resize cause a reflow but then the actual work of generating the rendering commands is handled by a background thread (and architecturally it could be multiple render threads, that was just never done). Loading new UI can also be done on other threads. I think Blink also does some parallel rendering.

BTW, making property writes automatically thread-safe would be a trivial change to JavaFX but it's not really what you want. You need a concept of UI "transactions" to avoid the user seeing a UI that's half way through being modified by code. That's again not something specific to UI toolkits or threads vs async/await. It's inherent to the task. The "synchronizeWithUI" block in my code sample is doing that: you can think of it as committing some writes to the UI database.


How adequate is Swing (combined with JGoodies Binding or RxJava) compared to JavaFX in this regard?


I don't know JGoodies Binding so I can't help you with that, sorry. The concept is general. The code above uses Kotlin syntax for brevity but of course you can use Kotlin with Swing too.


It's definitely important to be able to manage a bunch of tasks explicitly in a single thread. If there are any other possible implementations of language features that result in the same binary and cause less trouble for the language's users, let's chat about those :)


One of the main benefits of async/await in Rust is that it can work in situations where you don't even have threads or dynamic memory. You can absolutely use it to write very concise code that's waiting on an interrupt on your microcontroller to have read some data coming in over I2C from some buffer. It's a higher level abstraction that allows your code to use concurrency (mostly) without having tons of interactions with the underlying runtime.

Every major piece of software that I have worked on has implemented this in one form or another (even in non-modern C++ where you don't have any coroutine concepts: Apple's Grand Central Dispatch, Intel's Threading Building Blocks, etc.). If you don't, then your business logic will either block on IO very inefficiently, have a gazillion threads that make development/debugging a living hell, be littered with implementation details of the underlying runtime, or some combination of all three.

If you don't use existing abstractions in the language (or through some library), you will end up building them yourselves, which is hard and probably overall inferior to widely used ones (if there are any). I have done so in the past, see https://github.com/goto-opensource/asyncly for C++.


I think the author is confusing two things here:

1. User-space threads / Green threads

2. Structured concurrency

The former is an advantage of async/await, but is not unique to it (see Go or Java's Loom for examples that involve no function coloring problem). And the latter can be implemented with both OS threads and green threads (see the structured concurrency JEP for Java [1]).

[1] https://openjdk.org/jeps/462


As someone who has struggled with thread-safe storage since threads started, the amount of code we carry around with us which looks OK but turns out not to be viable under threads is remarkably high. I bump into this in C and Python 3 quite a lot: you have to work harder to do anything which is not synchronous, no matter how you arrive at that place.

For long-lived work, it may well be that you lose nothing overall by forking a lot of heavyweight processes which each operate fast as a single execution state, if you have the cores, the CPU and the memory. Context-switching delay is very possibly not your main problem.

For example, I typically have 24 one-hour capture/log files, each 300M+ lines long, and I need to gunzip them, parse/grep out some stuff, calculate some aggregates and then recombine. It's a natural pipeline of 3 or 4 processes. The effort to code this into a single language and uplift it to run threads inside, where it's basically 24 forked pipe sequences, begs the question: what exactly is going to get faster, when I am probably resource-limited by the gunzip process at the front?

You think you can code faster than a grep DFA to select inputs? How sure are you that your in-memory structure is faster than awk? Really sure? I tested some. Radix tries are nice and have that log(n) length thing, but awk was just as fast (IP address "hashing" lookup costs).

(hint: more than one interpreted language simply forks gzip for you, to unzip an input stream)

If you can go to C, then it's possible forking heavyweight processes in C, with good IPC buffer selections or mmap is "good enough"


As someone who was big into threading back in the 2000s, with the experience gathered over time I think that with modern hardware resources we are better off with OS IPC, as it offers much better safety guarantees, especially in C-like languages.


Async/await has one property that is slightly more difficult to achieve with threads, but it isn't really discussed much and hasn't been used much to my knowledge: because context is captured fairly explicitly as records, it is easy to persist that context to durable storage and/or migrate it to other machines without any additional runtime machinery.

Achieving this with threads is possible but requires additional runtime support. Basically it requires scanning stack frames, which the garbage collector already does, so you're extending the GC to record the stack in a separate place.


As well, this functions as state compression by the developer. This is what makes async/await and CPS so much better than thread-per-client: smaller memory footprint!


async/await is syntax. Most of what people associate with it is actually the benefit of virtual threading. Asynchrony is how virtual threading is achieved in user space. Rust has direct native threading by default and can't do virtual threading without a runtime; async/await makes the use of a runtime explicit. It could be possible to have both virtual and OS threading without special syntax, but that would require a marker trait, algebraic effects or monads. Without those, you need to choose either user-level or OS-level threads as the default. If user-level threads are the default, you need a runtime by default (even if that runtime is only for concurrency) - Go made that the default. Again, you don't need async/await to have virtual threading.


Async/await is syntax, but it's not equivalent to virtual threads. It is special syntax for a certain pattern of writing concurrent (and possibly parallel) code: one in which you launch a concurrent operation specifically to get back one result. When you start an OS thread or a virtual thread, that thread can do anything. When you launch a task, it can only do one thing: return one result of the type you asked for.

Async/await is perfect for operations that map well onto this structure. For example, most IO reads and writes fit well into this model - you trigger the IO operation to get back a specific result, then do some other work while it's being prepared, and when you need the result, you block until it's available. Other common concurrent operations don't map well on this at all - for example, if you want to monitor a resource to do something every time its state changes, then this is not well modeled at all as a single operation with a unique result, and a virtual or real thread would be a much better option.

Also, having both virtual and OS threads doesn't need any special syntax. You just need two different functions for creating a thread - StartOSThread(func) and StartVirtualThread(func), and a similar split for other functions that directly interact with threads (join, cancel, etc), and a function for telling whether you are currently running in a virtual thread. Everything else stays the same. This is what Java is doing with Project Loom, I'm not speaking just in principle.

The huge difficulty with virtual threads is implementing all blocking operations (IO, synchronization primitives, waits, etc) such that they use async OS primitives and yield to the virtual thread scheduler, instead of actually blocking the OS thread running the virtual thread.


Aside from the fact that an async/await operation can have arbitrary side effects, returning values is not a prerogative of async/await. For example, in C++:

  #include <future>

  void foo() {
      // start the async operation (a new thread, or a task queued on a pool)
      auto future = std::async([] { /* do some stuff */ return 42; });

      // ... do some work concurrently ...

      // join the async operation and retrieve its result
      auto some_value = future.get();
      (void)some_value;
  }
Here std::async starts a thread (or potentially queues a task in a thread pool); please don't confuse async/await (i.e. stackless coroutines) with more general future/promise and structured concurrency patterns.

edit: note I'm not making any claims that std::async, std::future, std::promise are well designed.


Sure, async/await can have arbitrary side effects, just like functions, and threads can be used to return values - but the design pattern nudges you to a particular end. Basically the async/await version of some code looks like this:

  f = await getAsync();
While the thread-based version looks like this:

  var f;
  t = startThread(getAsync(&f));
  t.join();
 
This is a fundamentally different API. Of course you can implement one with the other, but that doesn't make them the same thing (just like objects and closures can be used to implement each other, but they are different nonetheless).

Of course, there are other concurrency APIs as well, such as the C++ futures you show. Those have other advantages, disadvantages, and workflows for which they fit best. The main difference is that get() on the future blocks the current OS thread if the value isn't available yet, while await doesn't. Thus, futures aren't very well suited to running many concurrent IO operations on the same thread.


You never "have" to await at the call site. There's very little stopping you from:

  fFuture = getAsync();
  // do other things
  f = await fFuture;
Most implementations of async/await Future/Task/Promise often have some sort of `all` combinator to multi-join as soon as all results arrive, even:

  fFuture = getFAsync();
  gFuture = getGAsync();
  hFuture = getHAsync();
  // other stuff
  (f, g, h) = await all(fFuture, gFuture, hFuture);
The syntax makes it really easy to do the simple step-by-step await at every call site, but it also doesn't stop you from writing more complex things. (Sometimes very much more complex things when you get into combinators like `all` and `race`.)


Oh, yes, I didn't think to mention that since I was focused on showing what pattern the syntax makes easiest.


To be 100% clear: the std::async example I gave uses threads. Don't confuse a specific OO API with the general concept. Even vintage pthreads allow returning values on thread join.

And of course you would run multiple concurrent operations on separate threads.

edit: and of course futures are completely orthogonal to threads vs async/await. You can use them with either.


You don't technically need two ways to start threads. That's how Java does it, and there's some technical reason for it that I always forget. There are edge cases where virtual and physical threads aren't completely interchangeable.


You need to know if you're opening an OS thread or a virtual thread if you intend to interact with the OS natively. For example, if you want to call a C library that expects to control and block its thread, you need to ensure you are running on a dedicated OS thread, otherwise you might block all available threads.


I feel like the last difficulty should become easier with the emergence of io_uring support for a wider variety of system calls. Would you agree?


But at the same time, the syntax itself is what matters to most of its users. OS threads can be fast enough for most workloads, but having the syntax show up where the IO is happening, easily running concurrently whatever can be run concurrently, and cancelling stuff after a timeout is the superpower of async/await.

Arguing about implementation performance is missing the forest for the trees. It's like comparing exceptions to error enums through that prism alone: sure, there are implementations where this or that is more efficient than the other, but the ability to express intent in the code is what makes these killer features.


> Why choose async/await over threads?

I assume the intended question is: why use async/await over thread-per-client designs?

Easy: because async/await allows you to more easily compress the memory footprint of client/request/work state because there is no large stack to allocate to each client/whatever.

One can still use threads (or processes) with async/await so as to use as many CPUs as possible while still benefiting from the state compression mentioned above.

State compression -or, rather, not exploding the state- is critical for performance considering how slow memory is nowadays.

Async/await, manual continuation passing style (CPS) -- these are great techniques for keeping per-client memory footprint way down. Green threads with small stacks that don't require guard pages to grow are less efficient, but still more efficient than threads.

Threads are inefficient because of the need to allocate them a large stack, and to play MMU games to set up those stacks and their guard pages.

Thread-per-client simply does not scale like async/await and CPS.


Thank you for this post, very interesting.

Parallelism and async and multithreading is my hobby.

I think nodejs and browsers made typeless but coloured async easily digested and understood by everyone. Promises for example.

The libuv event loop used by nodejs is powerful for single threaded IO scalability, but I am looking for something that is multicore and scalable.

I like thread-per-core architectures.

The code written to a given architecture carries a high overhead in terms of mental understandability for new colleagues and is a barrier to transforming it. Refactoring Postgres's multiprocess implementation or nginx's worker model would be a major effort and undertaking. Firefox did a lot of work with Electrolysis/Quantum to make it multiprocess.

Just recently I was trying to implement a lock-free work queue from which any thread can pick up work and "steal" work from other threads.

I have a multithreaded runtime I am building in C which is a "phaser": it does synchronization in bulk. So rather than mutexes, you have mailboxes, which are queues.


"Obviously, the embryonic web tried to solve this problem. The original solution was to introduce threading"

Are we calling 1970s mainframes "the embryonic web" now?


That sentence is super confusing. The web has never supported multi-threading.


Well, apparently they did consist of a lot of tangled threads...



Spiders would be proud of the people who did this. They definitely deserve to be called “web”.


My possibly unpopular opinion is that async/await is a mistake as a first class programming construct. That functionality should 100% be in libraries for two reasons: 1) that way it won't infect the actual language in any way and 2) it will be more difficult to use so people will only reach for it if they really need it.

Sync code and threads is the way to go for 99% of the cases where concurrency is needed. Rust handles most of the footguns of that combination anyway.


Couldn't disagree more. In my experience, single-threaded event-loop driven async/await should be used for every possible concurrency need, with the complexity of multi-threaded concurrency being reserved for the rare cases it's needed. As auto-scaled services and FaaS began to become popular, I've found most any need for multithreaded programming almost wholly unnecessary.


Not everything is a web backend... CPU bound workloads may be rare in your domain, but I would be careful with generalizations.


When every problem looks like a nail, the only thing you need is a hammer!


Not only that, but autoscaling with FaaS is an expensive, locked-in and complex-to-administer way to provide functionality.


It's true though that in most cases single-threaded event-loop concurrency is sufficient. I wasn't attempting to make a generalization, I was saying that in cases where the CPU-bound work _is_ more voluminous, I prefer process-parallelism over threaded parallelism for such workloads (of which FaaS is just one possible impl).


> It's true though that in most cases single-threaded event-loop concurrency is sufficient.

This still sounds very web-centric.

In general, concurrency and parallelism are really orthogonal concepts. When threads are used for concurrency, e.g. to handle multiple I/O bound tasks, they do indeed compete with coroutines (async). But when we want parallelism, coroutines are not even an option. That's why it irks me when people compare threads with coroutines without specifying the exact use case.

> I prefer process-parallelism over threaded parallelism for such workloads

Process-parallelism is just a special case of multithreading, and in many domains it's not even a realistic option.


But coroutines _are_ an option for parallelism, and an especially effective one in the I/O-bound world. The parallelism comes from the OS handling I/O scheduling for the coroutines instead of the application code.

The important difference between multithreading and multiprocess is that I can ignore synchronization that's not IPC in multiprocess models which makes the code much much simpler to implement and reason about.

I wouldn't even say this is web-centric; process-parallelism is a pretty common method of task dispatch in HPC compute topologies that has filtered down to smaller-scale multi-server compute clusters a la Kubernetes. In these cases, taking a process-centric message-passing approach can greatly simplify not only the code but also the architectural aspects of scheduling and scaling, which are quite a bit more difficult with multi-threaded processes or ones that mix I/O and CPU-bound work in the same process (often a cause of thread starvation issues in Node apps).


> But coroutines _are_ an option for parallelism, and an especially effective one in the I/O-bound world.

Parallelism means that code executes simultaneously (on different CPUs). Coroutines (on a single threaded executor) don't do that.

> The parallelism comes from the OS handling I/O scheduling for the coroutines instead of the application code.

Where exactly is the parallelism that is enabled by coroutines? Coroutines are nothing more than resumable functions.

When it comes to I/O performance, threads vs. async is really a false dichotomy. What we should be comparing is

1. threads + blocking I/O

2. non-blocking I/O (select, poll, epoll)

3. asynchronous I/O (io_uring on Linux, IOCP on Windows)

Coroutines may only provide syntactic sugar for 2. and 3.


> The important difference between multithreading and multiprocess is that I can ignore synchronization that's not IPC

Sure, if the data allows it and you can afford the memory and time overhead. There is no silver bullet! There are cases where you must work on in-process shared memory and process-parallelism wouldn't even be possible (think of multithreading in video games or DAWs). Note that if the data is properly partitioned, you might not need any locking, even with "normal" threads.


The problem is that in Rust the main way of doing async/await (Tokio) is multithreaded by default. So you get all the problems of multithreading, and additionally all the problems of async/await. The discourse would be very different if the default choice of executor had been thread-local.


Is this default really that big of a problem? Most people running Tokio are going to be running it on multicore hardware, and they'll get some additional throughput for free. If they're sufficiently resource constrained to the degree that they realize that a single thread is a better fit for their domain, they'll Google it and realize you can use Tokio's single-threaded runtime.

To be clear: it is a one-line change to make Tokio use the current (single) thread.

https://docs.rs/tokio-macros/latest/tokio_macros/attr.main.h...
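For reference, a minimal sketch of that one-line change (assuming the tokio crate with the macros and rt features enabled):

    // Select Tokio's single-threaded scheduler instead of the default
    // multi-threaded one; everything runs on the thread that calls main().
    #[tokio::main(flavor = "current_thread")]
    async fn main() {
        let handle = tokio::spawn(async { 40 + 2 });
        println!("{}", handle.await.unwrap());
    }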


The problem is most clearly visible in the difference between

https://docs.rs/tokio/1.36.0/tokio/runtime/struct.Runtime.ht...

https://docs.rs/async-executor/latest/async_executor/struct....

Those Send + 'static bounds have a really big impact. Multithreading is definitely not just "additional throughput for free", nor is the change from single-threaded to multi-threaded just one line of code.
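For example (a minimal sketch, assuming the tokio crate): an Rc held across an .await makes the future !Send, so the multi-threaded tokio::spawn with its Send + 'static bound would reject it, and you have to fall back to a single-threaded LocalSet with spawn_local:

    use std::rc::Rc;

    #[tokio::main(flavor = "current_thread")]
    async fn main() {
        let local = tokio::task::LocalSet::new();
        local
            .run_until(async {
                // This future is !Send because the Rc lives across an .await,
                // so it can only be spawned onto a single-threaded LocalSet.
                tokio::task::spawn_local(async {
                    let shared = Rc::new(5);
                    tokio::task::yield_now().await;
                    println!("still have {shared}");
                })
                .await
                .unwrap();
            })
            .await;
    }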


Yes, this is a big problem in my mind and something I mentioned in another comment. Not that multi-threaded async/await is wrong, but the conflation of two methods of concurrency causes a lot of problems you don't see in single-threaded event-loops archs.

In fact, it makes the claim in the other thread, that async/await in Rust was mainly forced on the community to capture JS devs, seem a bit unlikely, as the only thing the two have in common is the spelling of the keywords.


Sounds like a problem with Rust. Not a problem with async/await.


I'm actually more of a fan of the actor model than the single-threaded event-loop. Individual blobs of guaranteed synchronicity with message passing.


Perhaps an acceptable solution would be that all blocking things be async by convention, at least potentially. It's the conflict between sync and async that I find distasteful. The coloring problem.

Zig does almost something like this.


> A common refrain is that threads can do everything that async/await can, but simpler.

> OS threads don’t require any changes to the programming model, which makes it very easy to express concurrency.

I think these claims need a bit of justification, or else why write the article?


Rust async/await is as bad as C++ coroutines. They lack one of the most important language counterparts: async drop/destructor.


That's the main gripe imo: Missing features that are not optional for certain problems where the solution would otherwise make perfect sense.

I get the pragmatism of "better to ship something that is 80% finished now than wait some years for it to be 100% finished", but Rust's async/await was released in 2019 and the more time passes, the more it looks like some sharp edges are here to stay.


I am suspicious that Async, slowly taking over the library ecosystems, will be what drives me away from Rust. Over the past year, it's taken over embedded. I was previously worried about excessive use of generics, but this was, gradually over the past ~2 years replaced with Async.

The std-lib HTTP ecosystem has already gone almost full-async. I was able to spin up a simple web server without it using the rouille library recently. I chose this one because it was one of two options that wasn't Async.

I think I may be the only person who A: Enjoys rust for embedded targets (Eg Cortex-M), but doesn't enjoy Async. The easy path is to go with the flow and embrace Async in those domains (Embedded, web backends). I say: No. Threads, interrupts, DMA, etc are easier to write and maintain, and critically, don't split code into two flavors. You don't just add an Async library to your program: It attempts to take your program over. That is the dealbreaker.


> The easy path is to go with the flow and embrace Async in those domains (Embedded, web backends). I say: No.

Why are you making a principle of not using async though? It's just one particular feature of the language, there's nothing wrong in not being particularly enthusiast about it, but entirely refusing to use it sounds really weird to me.

Every JavaScript junior uses async right now; it's not as if it were a particularly difficult programming paradigm to learn. Sure, async in Rust is a bit more complicated than in JS because it's lower level, but it's not worse than anything else in Rust!

> Threads, interrupts, DMA, etc are easier to write and maintain,

Not really. I've no doubt that you master them better, but there's no reason to think you won't be able to be as proficient with async. And in fact, async is just strictly more powerful than sync code: you can do exactly what you're doing right now (just write await), plus you get a few features for free (concurrency, cancelation). There's a reason why it's popular actually!

> and critically, don't split code into two flavors

In practice, there isn't either. Not if you're writing an application; it's only when you're writing a library and you want to explicitly support the use case of people like you who have moral issues with async.


It can be frustrating to:

People who target wasm

People who target embedded

And superfluous to:

People who have compute bound problems

People with one incoming and one outgoing connection at all times.


> It can be frustrating to: People who target wasm

Having worked with wasm, I couldn't disagree more. Maybe it's been fixed now and we have proper threading support in wasm, but for a long time every thread-related function on wasm32-unknown-unknown panicked because the function was unimplemented, whereas async code just worked.

> People who target embedded

I don't have any embedded experience in Rust so I don't have an opinion, but at the same time I fail to see how it can be frustrating in that space: if you're targeting no-std you cannot use a dependency that isn't no-std itself, so you're unlikely to be bothered by non-embedded use of async. Now, if async takes over embedded, that's different, but at the same time it means it's probably not "frustrating" to the majority of people but in fact helpful.

> And superfluous to:

This is a very poor metric. I've never once used any of the math primitives in the std lib in 9 years of Rust, almost never the Unicode parts of the std, barely used atomics and never inline assembly; that doesn't mean these features do not belong in Rust, and I have no issue with them being there even though they are superfluous to me. The key difference is that I don't have any ideological detestation of them…


A bit late to reply, but if you see this, know that you made some good points. Detesting it is probably a bit much, but I do find the pattern cumbersome. Do you use tokio in wasm? I haven't, but I've seen issues for wasm support closed because of tokio; that could have been an excuse. Again, in embedded what I run into is that embassy is the standard executor, and is useful, but so many crates bring in tokio when they don't really need to that they are poisoned for embedded use. Lastly, it isn't just being superfluous: it is a dozen or more crates bringing download size and attack surface that don't solve any problem I ever actually have, or mostly any problem the crate author had. So really what you helped me realize is that it isn't async/await I dislike at all, it is just tokio being the de facto implementation of it.


I don't think tokio itself is going to be the issue for wasm or embedded, but if the crate you're importing uses tokio it usually means it's expecting to have access to an OS with file and network access, none of which is going to work in wasm or embedded in the first place.

But I agree with you that the Rust ecosystem (the part of the ecosystem that uses std, at least) being tied to tokio is not a good situation. In theory we could have crates that are agnostic about the executor, since they aren't supposed to execute anything, just expose futures, with the top-level crate driving the execution, but AFAIK the way tokio is designed currently makes it impossible to separate the concerns. The Rust team would like to change that, but it's not trivial and it would require collaboration from the tokio team, which is not a given.


Just tried changing my Python code to use async/await instead of the 2 threads I use, and it just complicated the code more. I had to create a thread anyway to deal with the tkinter event loop issue. And then I still have to check inside the functions whether the task is still active. But the worst part is converting the whole stack of functions to be async/await.


I think it's partly down to what you grew up with. If you're of the JS (or Python 3) generation then you're _probably_ more comfortable with async.

However, there are things where async seems to be more of a semantic fit for what you're doing.

For example, FastAPI is all async, and it makes sense. I started using it because it scaled better than anything else. They have done a nice job of making the interface as painless as possible. It almost doesn't feel like surprise goto.

I do a lot of stream processing, so for me threads are a better fit. It scales well enough, and should I need to, either going to multiprocessing (not great) or duplicating to a new stand-alone process is fairly simple. (Keeping everything message-based also helps.)

Async vs. threads is _almost_ like shop-bought coke vs. SodaStream. They are mostly the same, but have slightly different semantics.


I worked with a fairly large code base which leveraged threads to avoid having to use callbacks in a C++ code base. This allowed the engineers to use the more familiar linear programming style to make blocking network calls which would stall one thread, while others were unblocked to proceed as responses were received. The threads would interoperate with each other using a similar blocking rpc system.

But what ended up happening was that the system would tend to block unnecessarily across all threads, particularly in cross-thread communication. This was due to many external network calls being performed sequentially when they could have been performed in parallel, and likewise cross-thread communication constantly blocking on sequential calls. The end result was a system whose performance was fundamentally hobbled by threads that were constantly waiting on other threads or series of external calls, and it was very difficult to understand the nature of the Gordian knot of blocking behavior that was causing these issues.

The main problem with using threads is that the moment you introduce another CPU into the mix, you need synchronization primitives around your state. This means engineers will use fewer threads than necessary (at least in our case) to reduce the complexity of the synchronization work, which means you lose the advantage of asynchronous parallelism. Or at least that is what happened in this particular case.

The cost of engineering synchronization for async/await is zero, because this parallelism happens on a single thread. Since the CPU "work" to be done for async I/O is relatively small, this argues for single-threaded 'callback' style solutions where you maximize the amount of parallelism and decrease the amount of potential blocking, while minimizing the complexity of thread synchronization as much as possible. In cases where you want to leverage as many CPUs as possible, it's often the case that you can better benefit from CPU parallelism by simply forking your process onto multiple cores.
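For illustration, here's a minimal Rust sketch of the sequential-vs-concurrent difference within a single task; the functions `fetch_user` and `fetch_orders` are made up, and `futures::join!` is just one way to express it:

    use futures::join; // futures = "0.3"

    // Hypothetical async calls; imagine each one performs network I/O.
    async fn fetch_user(id: u32) -> String { format!("user-{id}") }
    async fn fetch_orders(id: u32) -> Vec<u32> { vec![id, id + 1] }

    async fn sequential(id: u32) -> (String, Vec<u32>) {
        // Each call suspends this task until it finishes: total latency is the sum.
        let user = fetch_user(id).await;
        let orders = fetch_orders(id).await;
        (user, orders)
    }

    async fn concurrent(id: u32) -> (String, Vec<u32>) {
        // Both futures are polled concurrently: total latency is roughly the max.
        join!(fetch_user(id), fetch_orders(id))
    }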


If you are doing anything complex you really want to use a polling loop. That's how video games, avionics systems, industrial equipment, etc. are programmed.


That's what async/await abstracts internally.
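Roughly, the hand-written equivalent is a set of small state machines polled every tick, the way a game or control loop would do it; a sketch with made-up states and byte counts:

    // One "download then save" job, written as an explicit state machine.
    #[derive(Clone, Copy)]
    enum Job {
        Downloading { received: usize },
        Saving { written: usize },
        Done,
    }

    impl Job {
        // Called once per tick; does a little work and returns immediately.
        fn poll(&mut self) {
            *self = match *self {
                Job::Downloading { received } if received + 1024 >= 4096 => Job::Saving { written: 0 },
                Job::Downloading { received } => Job::Downloading { received: received + 1024 },
                Job::Saving { written } if written + 1024 >= 4096 => Job::Done,
                Job::Saving { written } => Job::Saving { written: written + 1024 },
                Job::Done => Job::Done,
            };
        }
    }

    fn main() {
        let mut jobs = vec![Job::Downloading { received: 0 }, Job::Downloading { received: 2048 }];
        // The main loop polls every job each tick; async/await generates this shape for you.
        while !jobs.iter().all(|j| matches!(j, Job::Done)) {
            for job in &mut jobs {
                job.poll();
            }
        }
    }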


> "Smart programmers try to avoid complexity. So, they see the extra complexity in async/await and question why it is needed. This question is especially pertinent when considering that a reasonable alternative exists in OS threads."

Hm, yes, but is that complexity really avoided if it's in the language/runtime?

Sure, it's not in your code, it will probably have far more extensive testing, and maybe people thought about the problem for months while you're just trying to implement business feature #31514 rather than reinventing the wheel. But as someone who has been bitten by specific implementations like the one in Quarkus (not a language, granted), I must say:

Complexity is there to stay (and occasionally bite you), even if it's hidden!


I tend to agree, but an important note is that I've never seen an async/await system that didn't ALSO interact with OS threads. Async/await is not an alternative to OS threads but rather an additional layer.


I have heard about them in embedded systems where there are no OS threads, because there is no OS, but there are async tasks and a scheduler.


That just means there's an OS but it's one a bit like Win 3.1, with fully cooperative multitasking.


You mean like the async/await systems in the two most popular languages in the world - JavaScript and Python?

Async/await is definitely an alternative to user space threads, especially for IO bound tasks.


Some of the Python schedulers today already use pools of OS threads. There are definitely complications involving the GIL that make true multi-threading in Python ugly, but Python has done a lot to mitigate it in the last few years, and the rumor is that GIL removal will happen sooner than a lot of people expect.


> Hm, yes, but is that complexity really avoided if it's in the language/runtime?

Complexity can be abstracted away and this is still a worthwhile goal. Otherwise, we'd all still be studying Intel CPU documentation.


It can and it should, but I disagree with the sentiment that when it's abstracted "away" it's not there anymore.

You just don't see it.

Abstraction doesn't change any fact about how the system actually works.


To me, implementing an async API using algebraic effects is the end game. It gets rid of the need for an async keyword, and no monads required!


Async/await makes more sense in some languages than others.

For most PLs, algebraic effects look ideal. Just seems like we're figuring out how to talk about them to users and what the UX should be.


> Gets rid of the need of an async keyword

You still need an async effect handler though, which is effectively the same syntax-wise.


Because you want to avoid context switching cost. In fact, async/await/promise/future/Task is also closely related to fibers (as opposed to threads), at least on Windows, but fibers are provided at the OS level while async/promise/future/Task is provided by the language itself. You can switch to a different fiber without doing a kernel context switch, just like how you use a state machine to switch to a different job with async/promise/future/Task.


Context switching cost is mostly artificial. You save kernel/user transitions, but you can also have user-mode threads without transitions. What else are you saving by using coroutines instead of threads? You're not saving register swapping or cache warmup. You are saving stack allocations but those don't cost time.


What is the language agnostic answer to the same question?

I imagine something to do with memory usage or avoiding thread or thread pool starvation issues. Maybe performance too?


I don't have a lot of experience with async/await at high numbers of tasks, but I've run Erlang with millions of processes. It's a lot easier to run millions of Erlang processes on one machine than to run a million OS threads. I suspect async tasks would be similar; an OS thread needs its own stack, and that's going to use at least a page of memory, often much more. On the other hand, an async task or green thread might be able to use less.

If you're running real OS threads, I think task switching is going to be real context switches, which might mean spectre mitigations clear your cpu caches, but task switching can avoid that.

You may end up with more system calls with OS threads, because a runtime might be able to aggregate things a bit (blocking reads become kqueue/epoll/select). But maybe that's actually a wash, because you still need a read call once the FD is ready, and a real blocking read only makes a single call.


IMO the biggest reason to avoid threads is simply that it's ~impossible to write safe code using threads (e.g. without race conditions). Arguably with Rust's ownership system that's less true there than in other languages.


Asynchronous code has race conditions and synchronization issues too.

I pray for all the code written by people who think they didn’t need to learn about synchronization because they wrote asynchronous code.

And unfortunately I’ve come across and had to fix asynchronous code with race conditions.

You cannot escape learning about synchronization. Writing race-condition-free code is not hard.

What is actually hard is writing fast lock-free routines, but that’s more a parallelism problem that affects both threaded and asynchronous code. And most people will never need to reach that level of code optimization for their work.


Async-await is about concurrency, not parallelism. It can work in both a single-threaded and multi-threaded context, the latter exposing all the typical failure modes of multi-threaded code.

Also, Rust’s ownership model only prevents data races, that’s only the tip of the iceberg of race conditions, and I don’t think that any general model makes it possible to statically determine that any given multithreaded code is safe. Nonetheless, that’s the only way to speed up most kind of code, so possibly the benefits outweigh the cost in many cases.


You can write safe code using threads if you enforce that the only way threads can communicate is by sending messages to each other (via copying, not pointers). This is what Erlang does.
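A minimal sketch of that style in Rust with a std channel: the value is moved into the channel rather than shared through a pointer (though Rust won't stop you from sending an `Arc` if you choose to, so the "copy only" discipline is a convention here, not something the compiler enforces):

    use std::sync::mpsc;
    use std::thread;

    fn main() {
        let (tx, rx) = mpsc::channel::<String>();

        // The worker owns its state and only hears about the world via messages.
        let worker = thread::spawn(move || {
            for msg in rx {
                println!("worker got: {msg}");
            }
        });

        for i in 0..3 {
            // The String is moved into the channel; nothing is shared.
            tx.send(format!("job {i}")).unwrap();
        }
        drop(tx); // closing the last sender ends the worker's loop

        worker.join().unwrap();
    }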


No, you can get races and deadlocks in a pure actor system as well. It's actually easier in my experience to end up with problems in actors. I tried writing an app in that style once and had to back off that design and introduce some traditional shared memory multi-threading on some codepaths.

There are no shortcuts, no silver bullets when it comes to concurrency. Programmers have to learn about multi-threading and the ways it can go wrong because it's a fundamental thing, not a mere feature you can design your way out of.


> There are no shortcuts, no silver bullets when it comes to concurrency.

But you can identify the opposite of the silver bullet, which is shared mutable state, and then hurry away from it as quickly as possible.

Rust defaulted to avoiding sharing and Haskell defaulted to avoiding mutability.

For application code, I've yet to see a better concurrency story than 'use pure functions where possible, or use STM when you absolutely need shared mutation'.


> No, you can get races and deadlocks in a pure actor system as well.

You can't get data races, which is what Rust prevents. Rust's async doesn't prevent deadlocks or other kinds of races.
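For instance, this compiles without complaint but can deadlock at runtime: two threads take the same pair of locks in opposite order (a deliberately broken sketch):

    use std::sync::{Arc, Mutex};
    use std::thread;

    fn main() {
        let a = Arc::new(Mutex::new(0));
        let b = Arc::new(Mutex::new(0));

        let (a1, b1) = (Arc::clone(&a), Arc::clone(&b));
        let t1 = thread::spawn(move || {
            let _ga = a1.lock().unwrap(); // lock A first...
            let _gb = b1.lock().unwrap(); // ...then B
        });

        let (a2, b2) = (Arc::clone(&a), Arc::clone(&b));
        let t2 = thread::spawn(move || {
            let _gb = b2.lock().unwrap(); // lock B first...
            let _ga = a2.lock().unwrap(); // ...then A: classic lock-order inversion
        });

        t1.join().unwrap();
        t2.join().unwrap();
    }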


> You can write safe code using threads if you enforce that the only way threads can communicate is by sending messages to each other (via copying, not pointers).

At which point they're not really threads, they're more, well, processes.

> This is what Erlang does.

Exactly.


> IMO the biggest reason to avoid threads is simply that it's ~impossible to write safe code using threads (e.g. without race conditions).

Javascript has race conditions too, even with no threads involved.


> I recall that the thing that made me stick with Rust is the Iterator trait. It blew my mind that you could make something an Iterator, apply a handful of different combinators, then pass the resulting Iterator into any function that took an Iterator.

What languages did the author use before? Is this any different from the interface pattern seen in many other languages?


In the past I would have preferred managing threads manually, because I did it that way for years.

But as async/await has matured, especially with stackless coroutine implementations, writing async/await code is more pleasant than handling threads.

Handmade futures can have some pitfalls, as this blog post describes: https://blog.waynest.com/2022/12/async-cancellation/

However, I don't like Rust's trend of making the async runtime global and implicit, especially with #[xxx::main]. The runtime should be specified explicitly, just as a Netty channel must explicitly bind to an EventLoopGroup. Java's CompletableFuture without a concrete executor already causes plenty of issues.


I have to say, I am not convinced by this article that async composes better. The nice thing about green threads/fibers is you can make concurrency an internal detail: a function might spawn threads internally, or block, but the caller is free to use it as any other normal function, including passing it to a map() or filter() combinator.

By contrast, async forces the caller to acknowledge that it's not a regular function, and async functions don't compose at all with normal code. You have to write async-only versions of map(), filter(), and any other combinators.

Maybe async composes better with other async, but with threads, you can just compose with any other existing code.
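For example, with the `futures` crate the async combinators have slightly different shapes than their `Iterator` counterparts; a sketch assuming `futures = "0.3"`:

    use futures::{executor::block_on, future, stream, StreamExt};

    fn main() {
        // Sync: ordinary Iterator combinators.
        let evens: Vec<i32> = (0..10).filter(|x| x % 2 == 0).map(|x| x * 10).collect();

        // Async: the Stream versions. filter's closure must return a future,
        // and collect() is itself a future that has to be driven to completion.
        let evens_async = block_on(
            stream::iter(0..10)
                .filter(|x| future::ready(x % 2 == 0))
                .map(|x| x * 10)
                .collect::<Vec<i32>>(),
        );

        assert_eq!(evens, evens_async);
    }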


Threads are not useful for I/O-bound code. Either way the requests get serialized into a single queue within the network driver or a disk driver. Threads actually make things worse by 1) issuing multiple out-of-order requests to the disk, and 2) wasting time and memory switching thread context back and forth. For CPU-bound tasks it makes sense to create multiple threads, but only up to the number of actual cores available, otherwise it makes things worse again through cache thrashing. Thread pool(s) + message queues is what worked best for me.


Any semi-modern network card or SSD provides multiqueue support in hardware.

And even with a single queue, concurrency can help keep the queue full.


IMO, async/await is useful in that it's a simpler approach: developers don't need to worry about threading and all its inherent problems, which for the large majority of web and semi-web-related tasks is completely fine. It's the same reason we use languages with GC: there's no point managing your own memory for most tasks.

Someone writing a driver or something is obviously going to use proper threading and deal with all of the intricacies and gotchas.


As an observer who has used async/await in other languages (Javascript, Swift) I'm really confused why this is such a contentious topic in Rust. swift-concurrency has had its issues as well, but almost everyone I know finds it way more ergonomic and useful than how things were done before. As someone currently learning Rust, is there something particular about the language compared to how Swift did it? Outside of HN I would not know this is such a controversial topic.


I always understood async/await as an expression of concurrency and threads as an expression of parallelism. They aren't the same thing.

Concurrency breaks computation into chunks that can be interleaved even on a single CPU advancing computation concurrently overall but not necessarily physically at the same time for each chunk.

Parallelism on the other hand breaks computation into chunks that literally can run in parallel on different CPUs.

You can even combine both in some fashion.


ELI5: If you have to modify the language anyways (to add async/await) why not just have green threads that wrap NBIO with seemingly blocking calls?


What I would hope for is implicit async/await, so I don't have to write it out / wrap things all the time.


In Kotlin, only the outermost 'suspend' call needs to be wrapped; and when you're writing web servers and such, that's handled in the server middleware, so all your code can be `suspend fun blahblah()`.


That's eerily similar to what (green) threads are.


I’ve always thought async await is just a less clunky implementation of green threads. Not as a feature parity replacement, but trying to fill the same niche.


Yeah, fully agree with that. To me this is akin to writing line numbers in old BASIC dialects. The language makes the user do something that ought to be automated.


> implicit async/await

I keep suspecting C# will be the place where we see this, but probably not for another couple years.


We might not be that far away. There is this issue[1] on GitHub, where Microsoft and the community discuss some significant changes.

There are still a lot of questions unanswered, but initial tests look promising.

Ref: https://github.com/dotnet/runtime/issues/94620


We have it in Zig, Lua, Python, C, etc.


> A common refrain is that threads can do everything that async/await can, but simpler

This is the first time I've heard that, and given how many languages have added async/await after threading support, I don't think it is really a broad consensus.


The article talks a lot about async/await but fails to clearly state the main advantage of async code over threads. Async code in general (not only in Rust) allows a server to process thousands of client connections concurrently, with minimal latency, in a single thread, even if each client request needs several seconds to process (assuming the processing is IO-bound). One thread (or more generally, a small number of threads) is much cheaper resource-wise than thousands of threads (in a thread-per-client scenario).
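As a rough sketch of that shape, here is a tiny echo-style server on tokio's current-thread runtime, where every connection becomes a cheap task on one OS thread (nothing here is from the article; the port and buffer size are arbitrary):

    use tokio::io::{AsyncReadExt, AsyncWriteExt};
    use tokio::net::TcpListener;

    #[tokio::main(flavor = "current_thread")] // one OS thread drives all connections
    async fn main() -> std::io::Result<()> {
        let listener = TcpListener::bind("127.0.0.1:8080").await?;
        loop {
            let (mut socket, _) = listener.accept().await?;
            // Each client gets a task, not an OS thread; thousands of these are cheap.
            tokio::spawn(async move {
                let mut buf = [0u8; 1024];
                loop {
                    match socket.read(&mut buf).await {
                        Ok(0) | Err(_) => break, // connection closed or errored
                        Ok(n) => {
                            if socket.write_all(&buf[..n]).await.is_err() {
                                break;
                            }
                        }
                    }
                }
            });
        }
    }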


Async and Thread are of different abstraction levels[1]. The easier to use, the higher the abstraction level.

Rule of thumb: use the highest level of abstraction for your case.

Of course, the Async implementation can be atrocious or in conflict with something already used by your code, forcing you to use a lower level of abstraction.

[1]https://blog.codinghorror.com/the-wrong-level-of-abstraction...


>> A common refrain is that threads can do everything that async/await can, but simpler.

Who says that? Threads and async/await are different things, and it doesn't make sense to say one can do what the other does. And threads definitely are not simpler than async/await.


Threads are easier to spawn, and that's where all the fuss comes from, I'd argue.

Especially in Rust, async isn't as easy as in other languages with a runtime, and it does indeed have some caveats (e.g. cancellation), but all the real fuss comes from not understanding that they are different strategies for similar but not identical problems.

It makes little to no sense to use async/await for number-crunching/CPU-intensive tasks, for example.

One can use threads for some IO waiting, but it's definitely not the best solution for that particular problem.

To me this whole discussion has two facets:

1) How can async/await be more ergonomic in Rust?
2) How can we teach people that async/await is a different solution, with different tradeoffs, than threads? There is a reason async/await was created AFTER WE ALREADY HAD INVENTED THREADS!


My problem with Rust's async/await is that it's _not_ a different strategy, as the continuation tasks _are_ run multithreaded, so it's technically both strategies. IMO one of the biggest selling points of single-threaded async/await was how much complexity falls away compared to managing preemptive synchronization in the multi-threaded case.

I can see why there's so much controversy over async/await in rust. If I had to take both the syntax and cognitive hit of using async/await _and_ multi-threading, I would also angrily call for its removal.


Async wasn't invented after threads. It was primarily popularized in a system that was designed such that it couldn't use threads (the web browser). Everything else is post-justification for why it's better. It isn't better.


Async/Await as a syntax thing wasn't, but Async/Await as "don't just blindly use Threads for scaling to a myriad of incoming HTTP requests" was.

I remember the Apache webserver story, and that one has little to nothing to do with web browsers or JavaScript ;)


Async/await vs threads is yet another entry in the ~800 volume ongoing series "Where the Indirection Go?" In this case, you put concurrency inside the process (async) or outside the process (threads). Your CPU is, in both cases, constantly rotating between workloads, and that indirection can be inside or outside the process. Once you start caring about either human usability or fractional performance, the distinction matters. Otherwise, it doesn't.


Because asynchronous programming reduces to its equivalent in digital circuit design being contingent on analog circuit engineering decisions.


Sorry, not a native speaker here and I can't parse that sentence. Would you be so kind as to dumb it down a little?


Is the difference between threads and async/await more than syntax? Or is it language-specific?


Implementation difference. Threads are usually handled by the kernel and each one has its own stack where you can put anything. And the kernel can switch threads at any time. Async/await has the compiler work out precisely what has to be saved across the marked locations which are the only places context switches can happen. Also it doesn't tell the kernel when it switches context.
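As a conceptual sketch (the helper functions are made up, and the commented-out enum is a simplification rather than the compiler's real output), the compiler only has to persist the locals that are alive across each `.await`:

    // Hypothetical helpers so the sketch is self-contained.
    async fn download(url: &str) -> String { url.to_string() }
    async fn parse(body: &str) -> usize { body.len() }

    async fn fetch_and_parse(url: String) -> usize {
        let body = download(&url).await; // `url` must survive this await point
        parse(&body).await               // `body` must survive this one
    }

    // Conceptually, the compiler generates something like:
    //
    //   enum FetchAndParseState {
    //       Start { url: String },
    //       AwaitingDownload { /* url plus download's future */ },
    //       AwaitingParse { /* body plus parse's future */ },
    //       Done,
    //   }
    //
    // poll() advances the state machine; only these saved fields survive between
    // suspensions, and the kernel is never told that a "context switch" happened.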


Threads can be scheduled in userspace and can also be non-preemptive or have deterministic scheduling.


Async/await is implemented with a runtime that uses a thread pool or a single thread, allocates work to a thread when needed, and waits for IO to yield a result.

With threads you just fully control what blocking code is running on a single thread.

If you are just running computations (or reading files, as filesystem APIs are not async) it's simpler to just use threads.


> or reading files, as filesystem api are not async

This is interesting - all nodejs file system APIs are async by default.


That's because JavaScript as a whole is async by default. Which really makes me wonder why JavaScript didn't just go the route of Go and eliminate the distinction, instead of falsely creating it with syntax.


Yes, they are different in most cases.

In general, await's job is to hand work off to a third party (e.g. a database or HTTP service) and wait for the callback, whereas a thread's job is to launch multiple CPU operations in parallel.


How is async/await implemented under the hood?


Generally via continuations. An async function is transformed into continuation-passing style; await suspends with the current continuation; and then you have a runtime that, at its simplest, is just a queue of tasks plus special-cased primitives for things like async I/O where you suspend. It pulls tasks off the queue and runs them, and when a task suspends it stores the continuation and runs the next one.


Async/await in Rust is famously not based on continuations, at least not in any traditional sense, where a block of code is passed to a reactor system to be invoked whenever an operation completes.

Instead it is based on "wakers", which are objects associated with a task that can be used to notify the task's executor that the task is ready to make progress. It is then the job of the executor to resume the task. So there is an extra layer of indirection (conceptually).

There are pros and cons, but in essence the system trades a check on resume (often redundant) for the need to make a heap allocation and/or type erasure at each await point.

(It's possible to avoid the latter in continuation-based implementations, like C++ coroutines, but it's pretty hard.)
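A minimal hand-rolled future that shows the waker handshake (a sketch in the spirit of the async book's timer example; the helper thread stands in for "the OS says the operation finished"):

    use std::future::Future;
    use std::pin::Pin;
    use std::sync::{Arc, Mutex};
    use std::task::{Context, Poll, Waker};
    use std::thread;
    use std::time::Duration;

    struct Delay {
        shared: Arc<Mutex<(bool, Option<Waker>)>>, // (completed, stored waker)
    }

    impl Delay {
        fn new(dur: Duration) -> Self {
            let shared = Arc::new(Mutex::new((false, None::<Waker>)));
            let s2 = Arc::clone(&shared);
            thread::spawn(move || {
                thread::sleep(dur);
                let mut guard = s2.lock().unwrap();
                guard.0 = true;
                if let Some(waker) = guard.1.take() {
                    waker.wake(); // notify the executor that this task can progress
                }
            });
            Delay { shared }
        }
    }

    impl Future for Delay {
        type Output = ();
        fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
            let mut guard = self.shared.lock().unwrap();
            if guard.0 {
                Poll::Ready(())
            } else {
                // Not ready: store the waker so the helper thread can wake us later.
                guard.1 = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }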


Does that runtime run the tasks across multiple cores?


In Rust the answer is "it depends". Since the runtime is not provided by the language, you can have implementations that are a single thread, thread-per-task, a thread pool, or whatever other setup you can think of. Tokio at least offers a single-threaded version and a thread-pool version.
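For example, tokio exposes both through its runtime builder (a sketch; both constructors are part of tokio's public API, but the worker count is arbitrary):

    use tokio::runtime::Builder;

    fn main() {
        // Single-threaded: every task runs on the thread that calls block_on.
        let single = Builder::new_current_thread().enable_all().build().unwrap();
        single.block_on(async { println!("running on one thread") });

        // Multi-threaded: tasks are scheduled across a pool of worker threads.
        let pool = Builder::new_multi_thread()
            .worker_threads(4)
            .enable_all()
            .build()
            .unwrap();
        pool.block_on(async { println!("running on a 4-thread pool") });
    }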


poll()


No offense but you don't serve 1 million clients with 1 thread per client.

So what this ends up describing is the mechanism Rust uses to hide the state machine you'd write by hand in an older language.


Please. Stop posting about async/await here. It just gives space for everyone and their mother to post their superstitions and misconceptions about what is otherwise an elegant and extremely powerful model.

The Rust implementation is great, and so is the C# one. They have their own tradeoffs, but I would never choose anything else, and 8 out of 10 developers who disagree never read past "me sees await means no thread blocky" instead of focusing on the structured concurrency patterns it enables. Hell, C# does not even need any fancy terms for this because it is that low-ceremony. Worse languages, however, do require more effort and so have to justify it by inventing words.


> Please. Stop posting about async/await here.

I agree with everything else you said, but that's all the more reason to post it imo. The misconceptions aren't going away just because you don't hear about them, and every time I hear the different arguments, my own understanding grows a little bit. It's annoying, but quite healthy.


Iterators blew the writer's mind... Most languages have iterators, come on... Java, C#, Python, C++, to name only a few.



