Why choose async/await over threads? (notgull.net)
430 points by thunderbong 7 months ago | 458 comments



Async/await with one thread is simple and well-understood. That's the Javascript model. Threads let you get all those CPUs working on the problem, and Rust helps you manage the locking. Plus, you can have threads at different priorities, which may be necessary if you're compute-bound.

Multi-threaded async/await gets ugly. If you have serious compute-bound sections, the model tends to break down, because you're effectively blocking a thread that you share with others.

Compute-bound multi-threaded does not work as well in Rust as it should. Problems include:

- Futex congestion collapse. This tends to be a problem with some storage allocators. Many threads are hitting the same locks. In particular, growing a buffer can get very expensive in allocators where the recopying takes place with the entire storage allocator locked. I've mentioned before that Wine's library allocator, in a .DLL that's emulating a Microsoft library, is badly prone to this problem. Performance drops by two orders of magnitude with all the CPU time going into spinlocks. Microsoft's own implementation does not have this problem.

- Starvation due to unfair mutexes. Both the standard Mutex and crossbeam-channel channels are unfair. If you have multiple threads locking a resource, doing something, unlocking the resource, and repeating that cycle, one thread will win repeatedly and the others will get locked out.[1] (A minimal sketch of this pattern is at the end of this comment.) If you need fair mutexes, there's "parking_lot". But you don't get the poisoning safety on thread panic that the standard mutexes give you.

If you're not I/O bound, this gets much more complicated.

[1] https://users.rust-lang.org/t/mutex-starvation/89080
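
(Not from the linked thread, just a minimal sketch of the starvation pattern from the second bullet, using std::sync::Mutex; how lopsided the numbers get depends on the platform's lock implementation.)

    // Several threads repeatedly lock, do a little work, unlock, and loop.
    // With an unfair mutex, the thread that just released the lock often
    // re-acquires it before any woken waiter gets a chance to run.
    use std::sync::{Arc, Mutex};
    use std::thread;
    use std::time::{Duration, Instant};

    fn main() {
        let counts = Arc::new(Mutex::new(vec![0u64; 4]));
        let mut handles = Vec::new();
        for id in 0..4 {
            let counts = Arc::clone(&counts);
            handles.push(thread::spawn(move || {
                let deadline = Instant::now() + Duration::from_secs(1);
                while Instant::now() < deadline {
                    let mut c = counts.lock().unwrap();
                    c[id] += 1; // "do something" while holding the lock
                    drop(c);    // unlock, then immediately loop and re-lock
                }
            }));
        }
        for h in handles {
            h.join().unwrap();
        }
        // With fair locking these would be roughly equal; with an unfair
        // mutex one thread frequently dominates.
        println!("{:?}", counts.lock().unwrap());
    }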


Yes, 100%.

I've mostly only dealt with IO-bound computations, but the contention issues arise there as well. What's the point of having a million coroutines when the IO throughput is bounded again? How will coroutines save me when I immediately exhaust my size-10 DB connection pool? They won't; they just make debugging and working around the issues harder and more difficult to reason about.


The debugging issue is bigger than it seems.

Use of the async/await model in particular ends up with randomly hung micro-tasks in some random place in the code that are very hard to trace back to their cause, because they can be dispersed potentially anywhere.

Concurrency is also rather undefined, as are priorities most of the time.

This can be partly fixed by labelling, which adds more complexity, but at least it's explicit. Then the programmer needs to know what to label... which they won't do, and Rust has no training wheels to help with concurrency.

Threads, well, you have well-defined ingress and egress. Priorities are handled by the OS, and some degree of fairness is usually ensured.


> What's the point of having a million coroutines when the IO throughput is bounded again?

Because if you did the same thing with a million threads, you’d have all those same problems and more, because (a) threads take up more RAM, (b) threads are more expensive to spawn and tear down, and (c) threads have more registers to save/restore when context switching than coroutines.

The real answer is that you shouldn't have a million coroutines, and you shouldn't have a million threads. Doing such a thing is only useful if you have other useful work you can do while you wait on I/O. This is true for web servers (which want to serve other requests, so... maybe one coroutine per active request) and UIs (which should have a dedicated thread for the UI event loop), but for other applications there's shockingly little need to actually do async programming in the first place. Parallelism is a well-understood problem and can be used in key places (e.g. doing a lot of math that you can easily split across multiple cores), but concurrency is IMO a net negative for most applications that don't look like a web server.


Just morning bathroom musings based on your posts (yep /g), and this got me thinking that maybe the robust solution (once and for all, for all languages) may require a rethink at the hardware level. The CPU-bound issue comes down to systemic interrupt/resume, I think; if this can be done fairly for n in-progress threads-of-execution with efficient queued context swaps (say maybe a cpu with n wip contexts), then the problem becomes a resource allocation issue. Your thoughts?


> a cpu with N wip contexts

That's what "hyper-threading" is. There's enough duplicated hardware that beyond 2 hyper-threads, it seems to be more effective to add another CPU. If anybody ever built a 4-hyperthread CPU, it didn't become a major product.

It's been tried a few times in the past, back when CPUs were slow relative to memory. There was a National Semiconductor microprocessor where the state of the CPU was stored in main memory, and, by changing one register, control switched to another thread. Going way back, the CDC 6600, which was said to have 10 peripheral processors for I/O, really had only one, with ten copies of the state hardware.

Today, memory is more of a bottleneck than the CPU, so this is not a win.


The UltraSPARC T1 had 4-way SMT, and its successors bumped that to 8-way. Modern GPU compute is also highly based on hardware multi-threading as a way of compensating for memory latency, while also having wide execution units that can extract fine-grained parallelism within individual threads.


Also, IBM POWER has SMT at levels above 2; at least POWER7 had 4-way SMT ("hyperthreading").


Missed that. That's part of IBM mainframe technology, where you can have "logical partitions", a cluster on a chip, and assign various resources to each. IBM POWER10 apparently allows up to 8-way hyperthreading if configured that way.


Thanks, very informative.


What you said sounded in my head more like you’re describing a cooperatively scheduled OS rather than a novel hardware architecture.


(This has been a very low priority background thread in my head this morning so cut me some slack on hand waving.)

Historically, the H/W folks addressed (pun intended) memory-related architectural changes, such as when multicore came around and we got cache levels. Imagine if we had to deal at the software level with memory coherence across different cores [down to the fundamental level of invalidating Lx bytes]. There would be NUMA-like libraries and various hacks to make it happen.

Arguably you could say "all that is in principle OS responsibility, even memory coherence across cores" and we're done. Or you would agree that "thank God the H/W people took care of this" and ask whether they can do the same for processing.

The CPU model afaik hasn't changed that much in terms of granularity of execution steps, whereas the H/W people could realize that, d'oh, an execution granularity in conjunction with a hot context-switching mechanism could really help the poor unwashed coders in efficiently executing multiple competing sequences of code (which is all they know about at the H/W level).

If your CPU's architecture specs n +/- e clock ticks per context iteration, then you compile for that and you design languages around that. CPU bound now becomes heavy CPU usage but is not a disaster for any other process sharing the machine with you. It becomes a matter of provisioning instead of programming ad-hoc provisioning.


If our implementations are bad because of preemption, then I’m not sure why the natural conclusion isn’t “maybe there should be less preemption” instead of “[even] more of the operating system should be moved into the hardware”.


If you have fewer threads ready to run than CPU cores, you never have any good reason to interrupt one of them.


I don't know why the challenges with cooperative (non-preemptive) multitasking keep needing to get rediscovered. Even golang, which I consider a very responsibly designed language, went with cooperative scheduling at first until they were forced to switch to preemptive. Not saying cooperative multitasking doesn't have its place, just that it's gotta have a warning sticker, or even better, statically disallow certain types of code from executing.

Also great time to plug a related post, What color is your function:

https://journal.stuffwithstuff.com/2015/02/01/what-color-is-...


I keep having to repeat it on this unfortunate website: it's an implementation detail. A multi-threaded executor of async/await can cope with starvation perfectly well, as demonstrated by .NET's implementation, which can shrug off some really badly written code that interleaves blocking calls with asynchronous ones.

https://news.ycombinator.com/item?id=39530435

https://news.ycombinator.com/item?id=39786142

https://news.ycombinator.com/item?id=39721626


Said badly written code will still execute with poor performance, and the mix can be actively hard to spot.


Badly written threaded code will have the same problem, unfortunately


You would be surprised. It ultimately regresses to "thread per request but with extra steps". I remember truly atrocious codebases that were spamming task.Result everywhere, and yet they were performing tolerably even back on .NET Framework 4.6.1. The performance has improved (often literally) ten-fold since then, with the threadpool being rewritten, its hill-climbing algorithm receiving further tuning, and it getting proactive blocked-worker detection that can inject threads immediately without going through hill-climbing.


Also there is the extra software development and maintenance cost due to coloured functions that async/await causes

https://journal.stuffwithstuff.com/2015/02/01/what-color-is-...

Unless you are doing high-scalability software, async might not be worth the trade-offs.


If you expect to color your functions async by default, it's really easy to turn a sync function into a near-zero-cost async one: a Future that has already been resolved at construction time by calling the sync function.

This way, JS / TS becomes pretty comfortable. Except for stacktraces, of course.


Except when your sync function calls a function that makes the async runtime deadlock. No, there's no compiler error or lint for that. Good luck!


Useful stacktraces is one of the reasons why I use Go instead of JS/TS on the server-side.


JS has proper async stacktraces as well, or do you mean something else?


You're right. But it was not the case years ago when my team evaluated Go and Node.js. I should have clarified.


> Useful stacktraces is one of the reasons why I use Go

I'm not sure if this is sarcasm or not.


Fair enough. The context is that years ago, my team was evaluating Go and Node.js as options for a server requiring concurrency, and back then Node.js didn't provide any stacktrace for async callbacks. I know it has been improved since then, but I don't use Node.js.

What do you miss in Go's stack traces?


> Async/await with one thread is simple and well-understood. That's the Javascript model.

Bit of nuance (I'm not an authority on this and I don't know the current up-to-date answer across 'all' of the Javascript runtimes these days):

Isn't it technically "the developer is exposed to the concept of just one thread without having to worry about order of execution (other than callbacks/that sort of pre-emption)", but under the hood it can actually be using as many threads as it wants (and often does)? It's just "abstracted" away from the user.


I don't think so. Two threads updating the same variable would be observable in code. There's no way around it.


I'm referring to something like this: https://stackoverflow.com/questions/7018093/is-nodejs-really...

It's like a pedantic technical behind the scenes point I think, just trying to learn "what's true"


Looks like the spec refers to the thing-that-has-the-thread as an "agent". https://tc39.es/ecma262/#sec-agents

I don't know the details about how any implementation of a javascript execution environment allows for the creation of new agents.


I mean, yeah, if you go deep enough your OS may decide to schedule your browser thread to a different core as well. I don’t think it has any relevance here - semantically, it is executed on a single thread, which is very different from multi-threading.


It is executed by your runtime which may or may not behind the scenes be using a single thread for your execution and/or the underlying I/O/eventing going on underneath, no?


Most JS runtimes are multi-threaded behind the scenes. If you start a node process:

  node -e "setTimeout(()=>{}, 10_000)" &
Then wait a second and run:

  ps -o thcount $!
Or on macOS:

  ps -M $!
You'll see that there are multiple threads running, but like others have said, it's completely opaque to you as a programmer. It's basically just an implementation detail.


If you need true parallelism of course you can opt-in to Web Workers (called Worker Threads in Node), or the Node-specific child_process.fork, or cluster module.


I'm not sure what you mean by "multi-threaded async/wait"... Isn't the article considering async/await as an alternative to threads (i.e. coroutines vs threads)?

I'm a C++ programmer, and still using C++17 at work, so no coroutines, but don't futures provide a similar API? Useful for writing async code in serialized fashion that may be easier (vs threads) to think about and debug.

Of course there are still all the potential pitfalls that you enumerate, so it's no magic bullet for sure, but still a useful style of programming on occasion.


They mean async/await running over multiple OS threads compared to over one OS thread.

You can also have threads running on one OS thread (Python) or running on multiple OS threads (everything else).

Every language’s concurrency model is determined by both a concurrency interface (callbacks, promises, async await, threads, etc), and an implementation (single-threaded, multiple OS threads, multiple OS processes).


async/await tasks can be run in parallel on multiple threads, usually no more threads than there are hardware threads. This allows using the full capabilities of the machine, not just one core's worth. In a server environment with languages that support async/await but don't have the ability to execute on multiple cores like Node.js and Python, this is usually done by spawning many duplicate processes and distributing incoming connections round-robin between them.
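
(A hedged sketch of that model, assuming the Tokio runtime with its default multi-threaded scheduler; the runtime choice and the fake sleep "I/O" are just for illustration.)

    use std::time::Duration;

    #[tokio::main] // defaults to the multi-threaded scheduler, one worker per core
    async fn main() {
        let mut handles = Vec::new();
        for i in 0..8 {
            // Each spawned task may be polled on any of the worker threads.
            handles.push(tokio::spawn(async move {
                tokio::time::sleep(Duration::from_millis(10)).await; // stand-in for I/O
                println!("task {i} ran on {:?}", std::thread::current().id());
            }));
        }
        for h in handles {
            h.await.unwrap();
        }
    }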


Jemalloc can use separate arenas for different threads which I imagine mostly solves the futex congestion issue. Perhaps it introduces new ones?


IIRC glibc default malloc doesn't use per-thread arenas as they would waste too much memory on programs spawning tens of thousands of threads, and glibc can't really make too many workload assumptions. Instead I think it uses a fixed pool of arenas and tries to minimize contention.

These days on Linux, with restartable sequences, you can have true per-CPU arenas with zero contention. Not sure which allocators use them though.


> As pressure from thread collisions increases, additional arenas are created via mmap to relieve the pressure. The number of arenas is capped at eight times the number of CPUs in the system (unless the user specifies otherwise, see mallopt), which means a heavily threaded application will still see some contention, but the trade-off is that there will be less fragmentation.

https://sourceware.org/glibc/wiki/MallocInternals

So glibc's malloc will use up to 8x #CPUs arenas. If you have 10_000 threads, there is likely to be contention.


https://git.kernel.org/pub/scm/libs/librseq/librseq.git/tree...

Thank you, I didn't know about this one. An allocator that seems to use it at https://google.github.io/tcmalloc/rseq.html


Not only does tcmalloc use rseq, the feature was contributed to Linux by the tcmalloc authors, for this purpose, among other purposes.


The glibc allocator is also notoriously prone to not giving back memory to the OS when running several threads over a long time period. Reducing the pool size helps a bit, but not sufficiently to make memory usage stable.


I find this has worked well for me when I can easily state what thread pool work gets executed on.


I think that's Kotlin's model. The language doesn't have a default coroutine executor set; you need to provide your own or spawn one with the standard library. It can be thread-pooled, single-threaded, or a custom one, but there isn't a default one set.

If you use a single-threaded executor, then race conditions won't happen. If you choose a pooled one, then you obviously should realize there can be race conditions. It's all about the choice you made. (A rough Rust analog is sketched below.)
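
(For comparison, a rough Rust analog of "you pick the executor yourself", assuming Tokio; this is just an illustration, not anything from the parent comments.)

    fn main() {
        // Single-threaded: tasks interleave on one thread, so pooled-runtime
        // style data races between tasks can't happen.
        let single = tokio::runtime::Builder::new_current_thread()
            .enable_all()
            .build()
            .unwrap();
        single.block_on(async { println!("current-thread runtime") });

        // Thread-pooled: tasks may run in parallel, so shared state needs
        // Send/Sync and usually some locking.
        let pooled = tokio::runtime::Builder::new_multi_thread()
            .worker_threads(4)
            .enable_all()
            .build()
            .unwrap();
        pooled.block_on(async { println!("multi-thread runtime") });
    }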


You focus on rust rather than generalizing...

If you are IO bound, consider threads. This is almost the same as async / await.

What was missing above (and it's a problem with how most compute education is these days): if you are compute bound, you need to think about processes.

If you were dealing with Python's concurrent.futures, you would need to consider ProcessPoolExecutor vs. ThreadPoolExecutor.

ThreadPoolExecutor gives you the same as the above.

With ProcessPoolExecutor, you will have multiple processes executing independently, but you have to copy the memory space, which people don't consider. In Python DS work, multiprocessing workloads need to account for memory-space considerations.

It's kinda f'd up how JS doesn't have engineers think about their workloads.


I think you are coming at this from a particular Python mindset, driven by the limitations imposed on Python threading by the GIL. This is a peculiarity specific to Python rather than a general purpose concept about threads vs processes.


[...] if you are compute bound you need to think about processes.

How would that help? Running several processes instead of several threads will not speed anything up [1] and might actually slow you down because of additional inter-process communication overhead.

[1] Unless we are talking about running processes across multiple machines to make use of additional processors.


I think you need to clarify what you mean by "thread". For example they are different things when we compare Python and Java Threads. Or OS threads and green threads. I think the GP was relating to OS threads.


I was also referring to kernel threads. If we are talking about non-kernel threads, then sure, a given implementation might have limitations and there might be something to be gained by running several processes, but that would be a workaround for those limitations. But for kernel threads there will generally be no gain by spreading them across several processes.


Right; a process is just a thread (or set of threads) and some associated resources - like file descriptors and virtual memory allocations. As I understand it, the scheduler doesn’t really care if you’re running 1000 processes with 1 thread each or 1 process with 1000 threads.

But I suspect it’s faster to swap threads within a process than swap processes, because it avoids expensive TLB flushes. And of course, that way there’s no need for IPC.

All things being equal, you should get more performance out of a single process with a lot of threads than a lot of individual processes.


> a process is just a thread (or set of threads) and some associated resources - like file descriptors and virtual memory allocations

Or rather the reverse, in Linux terminology. Only processes exist, some just happen to share the same virtual address space.

> the scheduler doesn’t really care if you’re running 1000 processes with 1 thread each or 1 process with 1000 threads

Not just the scheduler, the whole kernel really. The concept of thread vs process is mainly a userspace detail for Linux. We arbitrarily decided that the set of clone() parameters from fork() create a process, while the set of clone() parameters through pthread_create() create a thread. If you start tweaking the clone() parameters yourself, then the two become indistinguishable.

> it’s faster to swap threads within a process than swap processes, because it avoids expensive TLB flushes

Right, though this is more of a theoretical concern than a practical one. If you are sensitive to a marginal TLB flush, then you may as well "isolcpu" and set affinities to avoid any context switch at all.

> that way there’s no need for IPC

If you have your processes mmap a shared memory, you effectively share address space between processes just like threads share their address space.

For most intents and purposes, really, I do find multiprocessing just better than multithreading. Both are pretty much indistinguishable, but separate processes give you the flexibility of being able to arbitrarily spawn new workers just like any other process, while with multithreading you need to bake in some form of pool manager and hope you get it right.


> The concept of thread vs process is mainly a userspace detail for Linux. We arbitrarily decided that the set of clone() parameters from fork() create a process, while the set of clone() parameters through pthread_create() create a thread. If you start tweaking the clone() parameters yourself, then the two become indistinguishable.

That's from Plan 9. There, you can fork, with various calls, sharing or not sharing code, data, stack, environment variables, and file descriptors.[1] Now that's in Linux. It leads to a model where programs are divided into connected processes with some shared memory. Android does things that way, I think.

[1] https://news.ycombinator.com/item?id=863939


Or rather the reverse, in Linux terminology. Only processes exist, some just happen to share the same virtual address space.

Threads and processes are semantically quite different in standard computer science terminology. A thread has an execution state, i.e. its set of processor register values. A process on the other hand is a management and isolation unit for resources like memory and handles.


> If you are IO bound, consider threads. This is almost the same as async / await.

Only in Python.

> if you are compute bound you need to think about processes.

Also only in Python.


If you're using threads then consider not using Python. Or, just consider not using Python.


Backend JS just spins up another container and/or lambda and if it's too slow and requires multiple CPUs in a single deployment, oh well, too bad.


That is of course a huge overhead, compared to how other languages solve the problem.


Backend JS does whatever the hecky you _want_ it to do.

The Cluster module has been around for a long time: https://nodejs.org/api/cluster.html

Tbf a lot of the time you're running it in a container and you allocate 1 vcpu there, only downside is maybe a little extra memory overhead. And for most lambdas I think they're suited to being single threaded (imo).


^ kid who only writes python condescending to the engineers actually solving hard problems LOL


The complaints around async/await vs threads to my mind have not been that one is more or less complex than the other. It is that it bifurcates the ecosystem and one of them ends up being a second class citizen causing friction when you choose the wrong one for your project.

While you can mix and match them it's hacky and inefficient when you need to. As it stands now the Rust ecosystem has decided that if you want to do anything involving IO you are stuck with an all async/await ecosystem. Since nearly everything you might want to do in Rust probably involves IO with very few exceptions that means for the most part you should probably ignore non-async libraries regardless of whether the rest of your application wants it to be async or not.

There is a hypothetical world where Rust used abstractions even more composable than async/await; it's async/await's composability that really wants everything else to be async/await too. If that had happened, then I think most of the complaints would have disappeared.


I agree with your diagnosis. It's what I concluded in my own Rust async blog post[0] (which are surely mandatory now). It's even worse than bifurcating the ecosystem, because even within async code it's almost always closely tied to the executor, usually Tokio. I talk about this as an extension to function colouring, adopting without.boats's three-colour proposition with blue (non-IO), green (blocking IO), and red (async IO). In the extended model it's really blue, green, red (Tokio), purple (async-std), orange (smol), etc.

I find that the sans-IO pattern is the best solution to this problem. Under this pattern you isolate all blue code and use inversion of control for I/O and time. This way you end up with the core protocol logic being unaware of IO and it becomes simple to wrap it in various forms of IO.

0: https://hugotunius.se/2024/03/08/on-async-rust.html
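
(A toy illustration of the sans-IO idea, not taken from the linked post; all the type and method names here are made up. The core only transforms bytes and events, so the same code can be driven by a blocking wrapper, a Tokio wrapper, or a test harness.)

    use std::time::Instant;

    pub enum Event {
        Transmit(Vec<u8>),  // bytes the caller should write to its socket
        Timeout(Instant),   // the core wants to be polled again by this time
        Message(String),    // a decoded application-level message
    }

    #[derive(Default)]
    pub struct ProtocolCore {
        inbox: Vec<u8>,
    }

    impl ProtocolCore {
        /// Feed bytes that the caller read from the network; no IO happens here.
        pub fn handle_input(&mut self, _now: Instant, data: &[u8]) {
            self.inbox.extend_from_slice(data);
        }

        /// Ask the core what should happen next; the caller performs the IO.
        pub fn poll_event(&mut self) -> Option<Event> {
            if self.inbox.is_empty() {
                None
            } else {
                let msg = String::from_utf8_lossy(&self.inbox).into_owned();
                self.inbox.clear();
                Some(Event::Message(msg))
            }
        }
    }

    fn main() {
        let mut core = ProtocolCore::default();
        core.handle_input(Instant::now(), b"hello");
        while let Some(Event::Message(m)) = core.poll_event() {
            println!("got: {m}");
        }
    }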


I love the fact that people outside the Python ecosystem are spreading the word about sans-IO. I think it should be the next iteration in coding using futures-based concurrency. I only wish it were more popular in the Python land as well.


The Haskell folks got there first.


Is that sans-IO pattern doing anything particularly novel, or is that basically just a subset of what any functional programmer does anyway?

Don't get me wrong, I'm fully on board with isolating IO. But why not go the slight extra step and just make it completely pure? You've already done the hard part of purity. The rest is easy.

Then you get all those nice benefits of being generic over async and sync, but can also memoize and parallelize freely, and all the other benefits of purity.


It's been a while since my Haskell days, but I think the key difference is whether you are abstracting over the IO or whether the IO sits outside the pure code entirely. When you abstract over IO you have blue (pure) code that contains generic red, green, purple, or orange code. With sans-IO you invert this, so the non-blue code is driving things forward by calling into blue code.

Rust, in particular, does not support abstracting over syncness at the moment, although there's work happening there. Even if you add support for that you also then need to abstract over the executor in use. My fear is that this will be too leaky to be useful, but we'll see. For now sans-IO is the best option in Rust.


This is a cool pattern, thanks for the share.


No problem, here are some examples of it:

* Quinn (a QUIC implementation; in particular `quinn-proto` is sans-IO, whereas the outer crate, `quinn`, is Tokio-based)[0]

* str0m (a WebRTC implementation that I work on; it's an alternative to `webrtc-rs`. We don't have any IO-aware wrappers, the user is expected to provide that themselves atm)[1]

0: https://github.com/quinn-rs/quinn

1: https://github.com/algesten/str0m/


> Since nearly everything you might want to do in Rust probably involves IO with very few exceptions that means for the most part you should probably ignore non-async libraries regardless of whether the rest of your application wants it to be async or not.

Only if you have two libraries to choose from and they are otherwise identical, which is rare. Using blocking code in async applications is not as seamless as it should be, but it's not hard. Instead of writing `foo()` you write `tokio::task::spawn_blocking(foo).await`. It will run the blocking code on a separate thread and return a future that will resolve once that thread is done.
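
(A minimal sketch of that, assuming Tokio; `expensive_sync_work` is a made-up stand-in for any blocking call.)

    fn expensive_sync_work() -> u64 {
        std::thread::sleep(std::time::Duration::from_millis(100)); // pretend work
        42
    }

    #[tokio::main]
    async fn main() {
        // Runs on Tokio's dedicated blocking-thread pool so it doesn't stall the
        // async worker threads; the await resolves once that thread finishes.
        let result = tokio::task::spawn_blocking(expensive_sync_work)
            .await
            .expect("blocking task panicked");
        println!("{result}");
    }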


That assumes you are using Tokio. As another poster said not only does the ecosystem fragment along the async/non async lines but along the runtime lines. Async is an extremely leaky abstraction. You are in a way making my point for me. If you want to avoid painful refactoring you should basically always start out tokio-async and shim in non async code as needed because going the other way is going to hurt.


`spawn_blocking` is a single function and it is not complicated, it shouldn't be too hard for other runtimes to do the same.

The end application does have to choose a runtime anyway and will have to stick with it because this area isn't standardized yet. This problem mostly affects the part of the ecosystem that wants to put complicated concurrency logic into libraries.


`spawn_blocking` should be part of a core executor interface that all executors must provide.


And of course most libraries don't even need IO because the application can do it for them, so it only makes sense for them to be async if they're computationally heavy enough to cause problems for the runtime.


>As it stands now the Rust ecosystem has decided that if you want to do anything involving IO you are stuck with an all async/await ecosystem.

I mean, C# works basically the same way; even though there are non-async options for IO, using the async options basically forces you to be async all the way back to Main(). There are ways to safely call async methods from sync methods, but they make debugging infinitely harder.


Well, yes. That doesn't mean it's not annoying though. It happens in every language that provides syntactic support for the distinction between async/await and non async. It's, I think, core to the syntactic and semantic abstractions that were popularized by Javascript.


There are a lot of points not covered. For example:

- async/await runs in the context of one thread, so there is no need for locks or synchronization. Unless one runs async/await on multiple threads to actually utilize CPU cores; then locks and synchronization are necessary again. This complexity may be hidden in some external code. For example, instead of synchronizing access to a single database connection it is much easier to open one database connection per async task. However, such an approach may affect performance, especially with SQLite and Postgres. (See the sketch after this list.)

- error propagation in async/await is not obvious, especially when one tries to group up async tasks. Happy Eyeballs is a classic example.

- since network I/O was mentioned, backpressure should also be mentioned. The CPython implementation of async/await notoriously lacks network backpressure, causing some problems.
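
(A hedged sketch of the first point, assuming Tokio and a made-up `Connection` type: sharing one connection on a multi-threaded runtime needs an async lock, while one connection per task avoids the lock at the cost of more connections.)

    use std::sync::Arc;
    use tokio::sync::Mutex;

    struct Connection;

    impl Connection {
        async fn query(&mut self, sql: &str) -> String {
            format!("result of {sql}") // placeholder for real database IO
        }
    }

    async fn shared_connection() {
        let conn = Arc::new(Mutex::new(Connection));
        let mut handles = Vec::new();
        for i in 0..4 {
            let conn = Arc::clone(&conn);
            handles.push(tokio::spawn(async move {
                // Tasks serialize on the lock; only one query runs at a time.
                let mut guard = conn.lock().await;
                guard.query(&format!("SELECT {i}")).await
            }));
        }
        for h in handles {
            println!("{}", h.await.unwrap());
        }
    }

    async fn connection_per_task() {
        let mut handles = Vec::new();
        for i in 0..4 {
            handles.push(tokio::spawn(async move {
                // No lock needed, but now we hold four connections instead of one.
                let mut conn = Connection;
                conn.query(&format!("SELECT {i}")).await
            }));
        }
        for h in handles {
            println!("{}", h.await.unwrap());
        }
    }

    #[tokio::main]
    async fn main() {
        shared_connection().await;
        connection_per_task().await;
    }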


I have lots of issues with async/await, but this is my primary beef with async/await:

Remember the Gang of Four book "Design Patterns"? It was basically a cookbook on how to work around the deficiencies of (mostly) C++. Yet everybody applied those patterns inside languages that didn't have those deficiencies.

Rust can run multiple threads just fine--it's not Javascript. As such, it didn't have to use async/await. It could have tried any of a bunch of different solutions. Rust is a systems language, after all.

However, async/await was necessary in order to shove Rust down the throats of the Javascript programmers who didn't know anything else. Quoting without.boats:

https://without.boats/blog/why-async-rust/

> I drove at async/await with the diligent fervor of the assumption that Rust’s survival depended on this feature.

Whether async/await was even a good fit for Rust technically was of no consequence. Javascript programmers were used to async/await so Rust was going to have async/await so Rust could be jammed down the throats of the Javascript network services programmers--technical consequences be damned.


Async/await was invented for C#, another multithreaded language. It was not designed to work around a lack of true parallelism. It is instead designed to make it easier to interact with async IO without having to resort to manually managed thread pools. It basically codifies at the language level a very common pattern for writing concurrent code.

It is true though that async/await has a significant advantage compared to fibers that is related to single-threaded code: it makes it very easy to add good concurrency support on a single thread, especially in languages which support both. In C#, it was particularly useful for executing concurrent operations from the single GUI thread of WPF or WinForms, or from parts of the app which interact with COM. This used the single-threaded SynchronizationContext installed by the GUI framework, which schedules continuations back on the current thread, so it's safe to run GUI updates or COM interactions from a Task, while also using any other async/await code, since awaited continuations inherit that context.


Yeah, Microsoft looked at callback hell, realized that they had seen this one before, dipped into the design docs for F# and lifted out the syntactic sugar of monads. And it worked fine. But really, async/await is literally callbacks. The keyword await just wraps the rest of the function in a lambda and stuffs it in a callback. It's fully just syntactic sugar. It's a great way of simplifying how callback hell is written, but it's still callback hell in the end. Where having everything run in callbacks makes sense, it makes sense. Where it doesn't it doesn't. At some point you will start using threads, because your use case calls for threads instead of callbacks.


Most compilers don't just wrap the rest of the function into a lambda but build a finite state machine with each await point being a state transition. It's a little bit more than just "syntactic sugar" for "callbacks". In most compilers it is most directly like the "generator" approach to building iterators (*function/yield is ancient async/await).

I think the iterator pattern in general is a really useful reference to keep in mind. Of course async/await doesn't replace threads just like iterators don't replace lists/arrays. There are some algorithms you can more efficiently write as iterators rather than sequences of lists/arrays. There are some algorithms you can more efficiently write as direct list/array manipulation and avoid the overhead of starting iterator finite state machines. Iterator methods are generally deeply composable and direct list/array manipulation requires a lot more coordination to compose. All of those things work together to build the whole data pipeline you need for your app.

So too, async/await makes it really easy to write some algorithms in a complex concurrent environment. That async/await runs in threads and runs with threads. It doesn't eliminate all thinking about threads. async/await is generally deeply composable and direct thread manipulation needs more work to coordinate. In large systems you probably still need to think about both how you are composing your async/await "pipelines" and also how your threads are coordinated. The benefits of composition such as race/await-all/schedulers/and more are generally worth the extra complexity and overhead (mental and computation space/time), which is why the pattern has become so common so quickly. Just like you can win big with nicely composed stacks of iterator functions. (Or RegEx or Observables or any of the other many cases where designing complex state machines both complicates how the system works and eventually simplifies developer experience with added composability.)
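
(Roughly the shape of that, hand-written: each await point becomes a state, and "resuming" means matching on the current state. A simplified illustration, not what any particular compiler literally emits.)

    use std::task::Poll;

    enum DownloadAndParse {
        Start { url: String },
        WaitingForBody { bytes_so_far: Vec<u8> },
        Done,
    }

    impl DownloadAndParse {
        fn resume(&mut self, input: Option<Vec<u8>>) -> Poll<String> {
            match self {
                DownloadAndParse::Start { url } => {
                    println!("issuing request to {url}");
                    *self = DownloadAndParse::WaitingForBody { bytes_so_far: Vec::new() };
                    Poll::Pending // first "await": suspend until the body arrives
                }
                DownloadAndParse::WaitingForBody { bytes_so_far } => match &input {
                    None => Poll::Pending, // still parked at the same await
                    Some(chunk) => {
                        bytes_so_far.extend_from_slice(chunk);
                        let text = String::from_utf8_lossy(bytes_so_far).into_owned();
                        *self = DownloadAndParse::Done;
                        Poll::Ready(text)
                    }
                },
                DownloadAndParse::Done => panic!("resumed after completion"),
            }
        }
    }

    fn main() {
        let mut task = DownloadAndParse::Start { url: "https://example.com".into() };
        assert!(matches!(task.resume(None), Poll::Pending));
        if let Poll::Ready(text) = task.resume(Some(b"hello".to_vec())) {
            println!("{text}");
        }
    }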


Eh, that's true, and that's a convenient way of doing intermediate representation, since its very machine-friendly. But really, finite state machines are just callbacks, just as generators can be treated as just callbacks. There is no real logical difference, and it is their historical origin, even for generators which is just a neat syntax for what could have been done back in the day with a more explicit OO solution.

It does provide a more conceptual way of thinking about what those old callbacks would have meant though, which opens up thinking about scheduling them. Still, it's not something I'd rather do, if I need an asynchronous iterator I'll write one but if I need to start scheduling tasks then I'm using threads and leaving it to someone smarter than me.


I generally don't agree with the direction withoutboats went with asynchricity but you are reading in a whole lot more into that sentence than is really there. It is very clear (based on his writing, in this and other articles) that he went with the solution because he thinks it is the right one, on a technical level.

I don't agree, but making it sound like it was about marketing the language to JavaScript people is just wrong.


> was about marketing the language to JavaScript people is just wrong.

No, it seems very right to me. Rust, despite being a "systems language", was not satisfied with the market size of systems programming, and they really needed all those millions of JS programmers to make the language a big success.


This is a lie. Async/await was developed to support systems that need to use non-blocking IO for performance reasons, not to appeal to JS programmers.


Threads have a cost. Context switching between them at the kernel level has a cost. There are some workloads that gain performance by multiplexing requests on a thread. Java virtual threads, golang goroutines, and dotnet async/await (which is multi threaded like Rust+tokio) all moved this way for _performance_ reasons not for ergonomic or political ones.

It's also worth pointing out that async/await was not originally a JavaScript thing. It's in many languages now but was first introduced in C#. So by your logic Rust introduced it so it could be "jammed down the throats" of all the dotnet devs..


> all moved this way for _performance_ reasons

They did NOT.

Async performance is quite often (I would even go so far as to say "generally") worse than single threaded performance in both latency AND throughput under most loads that programmers ever see.

Most of the complications of async are much like C#:

1) Async allows a more ergonomic way to deal with a prima donna GUI that must be the main thread and that you must not block. This has nothing to do with "performance"--it is a limitation of the GUI toolkit/Javascript VM/etc..

2) Async adds unavoidable latency overhead and everybody hits this issue.

3) Async nominally allows throughput scaling. Most programmers never gain enough throughput to offset the lost latency performance.


1) it offers a more ergonomic way for concurrency in general. `await Task.WhenAll(tasks);` is (in my opinion) more ergonomic than spinning up a thread pool in any language that supports both.

2) yes, there is a small performance overhead for continuations. Everything is a tradeoff. Nobody is advocating for using async/await for HFT, or in low-level languages like C or Zig. We're talking nanoseconds here; for a typical web API request that's in the tens of ms, that's a drop in the ocean.

3) I wouldn't say it's nominal! I'd argue most non-trivial web workloads would benefit from this increase in throughput. Pre-fork webservers like gunicorn can consume considerably more resources to serve the same traffic than an async stack such as uvicorn+FastAPI (to use Python as an example).

> Most of the complications of async are much like C#

Not sure where you're going with this analogy but as someone who's written back-end web services in basically every language (other than lisp, no hate though), C#/dotnet core is a pretty great stack. If you haven't tried it in a while you should give it a shot.


Eh. Async and, to a lesser extent, green threads are the only solutions to slowloris HTTP attacks. I suppose your other option is to use a thread pool in your server - but then you need to hide your web server behind nginx to keep it safe. (And nginx is safe because it internally uses async IO.)

Async is also usually wildly faster for networked services than blocking IO + thread pools. Look at some of the winners of the techempower benchmarks. All of the top results use some form of non blocking IO. (Though a few honourable mentions use go - with presumably a green thread per request):

https://www.techempower.com/benchmarks/

I’ve also never seen Python or Ruby get anywhere near the performance of nodejs (or C#) as a web server. A lot of the difference is probably how well tuned v8 and .net are, but I’m sure the async-everywhere nature of javascript makes a huge difference.


Async's perfect use case is proxies though- get a request, go through a small decision tree, dispatch the I/O to the kernel. You don't want proxies doing complex logic or computation, the stuff that creates bottlenecks in the cooperative multithreading.


Most API's (rest, graphql or otherwise) are effectively a proxy. Like you say, if you don't have complex logic and you're effectively mapping an HTTP request to a query, then your API code is just juggling incoming and outgoing responses and this evented/cooperative approach is very effective.


Where does the unavoidable latency overhead come from?

Do you have some benchmarks available?


The comment you are responding to is not wrong about higher async overhead, but it is wrong about everything else, either out of lack of experience with the language or out of being confused about what it is that Task<T> and ValueTask<T> solve.

All asynchronous methods (as in, the ones that have the async keyword prefixed to them) are turned into state machines, where, to live across an await, the method's variables that persist across it need to be lifted to a state machine struct, which then often (but not always) needs to be boxed aka heap allocated. All this makes the cost of what would have otherwise been just a couple of method calls way more significant - a single await like this can cost 50ns vs 2ns spent on calling methods.

There is also the matter of heap allocations for state machine boxes - C# is generally good when it comes to avoiding them for (value)tasks that complete synchronously and for hot async paths that complete asynchronously through pooling them, but badly written code can incur unwanted overhead by spamming async methods with await points where it could have been just forwarding a task instead. Years of bad practices arising from low-skill enterprise dev fields do not help this either, with only the switch to OSS and the more recent culture shift, aided by better out-of-box analyzers, somewhat turning the tide.

This, however, does not stop C#'s task system from being extremely useful for achieving lowest ceremony concurrency across all programming languages (yes, it is less effort than whatever Go or Elixir zealots would have you believe) where you can interleave, compose and aggregate task-returning methods to trivially parallelize/fork/join parts of existing logic leading to massive code productivity improvement. Want to fire off request and do something else? Call .GetStringAsync but don't await it and go back to it later with await when you do need the result - the request will be likely done by then. Instant parallelism.

With that said, Rust's approach to futures and async is a bit different: whereas in C# each async method is its own task, in Rust the entire call graph is a single task with many nested futures, where the size of the sum of all stack frames is known statically. Hence you can't perform recursive calls within async there - you can only create a new (usually heap-allocated) future, which gives you what effectively looks like a linked list of task nodes, as there is no infinite recursion in calculating their sizes. This generally has lower overhead and works extremely well even in no-std no-alloc scenarios where cooperative multi-tasking is realized through a single bare-metal executor, which is a massive user experience upgrade in embedded land. .NET OTOH is working on its own project to massively reduce async overhead, but once the finished experiment sees integration in dotnet/runtime itself, you can expect more posts on this orange site about it.
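
(To make the Rust recursion point concrete, a small hedged example: the compiler must know the future's size up front, so a recursive async call has to go through a boxed future. Tokio here is only used to drive the example.)

    use std::future::Future;
    use std::pin::Pin;

    // A plain `async fn countdown(n: u32)` that awaited itself would not
    // compile: its future type would have to contain itself. Boxing breaks
    // the cycle, at the cost of one heap allocation per level.
    fn countdown(n: u32) -> Pin<Box<dyn Future<Output = ()>>> {
        Box::pin(async move {
            if n > 0 {
                countdown(n - 1).await; // each level is a separate allocation
            }
        })
    }

    #[tokio::main]
    async fn main() {
        countdown(3).await;
    }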


> .NET OTOH is working on its own project to massively reduce async overhead

Where can I read more about that?


Initial experiment issue: https://github.com/dotnet/runtime/issues/94620

Experiment results write-up: https://github.com/dotnet/runtimelab/blob/e69dda51c7d796b812...

TLDR: The green threads experiment was a failure as it found (expected and obvious) issues that Java applications are now getting to enjoy, joining their Go colleagues, while also requiring breaking changes and offering few advantages over the existing model. It did, however, give inspiration to a subsequent re-examination of the current async/await implementation and whether it can be improved by moving state machine generation and execution away from IL completely and into the runtime. It was a massive success as evidenced by preliminary overhead estimations in the results.


The tl;dr that I got when I read these a few months ago was that C# relies too much on FFI, which makes implementing green threads hard, and on top of that it would require a huge effort to rewrite a lot of stuff to fit the green thread model. Java and Go don't have these challenges since Go shipped with a huge standard library and Java's ecosystem is all written in Java since it never had good FFI until recently.


Surely you're not claiming that .NET's standard library is not extensive and not written in C#.

If you do, consider giving .NET a try and reading the linked content if you're interested - it might sway your opinion towards more positive outlook :)


> Surely you're not claiming that .NET's standard library is not extensive and not written in C#.

I'm claiming that MSFT seems to really care about P/Invoke and FFI performance and it was one of the leading reasons for them not to choose green threads. So there has to be something in .NET or C# or WinForms or whatever that is influencing the decision.

I'm also claiming that this isn't a concern for Java. 99.9% of the time you don't go over FFI, and that's what led the OpenJDK team to choose virtual threads.

> If you do, consider giving .NET a try

I’d love to, but dealing with async/await is a pain :)


You’ve never used it, so how can you know?


How do you know I've never used it? Do you have a crystal ball?


> So by your logic Rust introduced it so it could be "jammed down the throats" of all the dotnet devs..

You're missing his point. His point is that the most popular language, which has the largest number of programmers, forced the hand of the Rust devs.

His point is not that the first language had this feature; it's that the most programmers used this feature, and that was due to the most popular programming language having this feature.


That Rust needed async/await to be palatable to JS devs would only be a problem if we think async/await is not needed in Rust, because it is only useful to work around limitations of JS (single-threaded execution, in this case). If instead async/await is a good feature in its own right (even if not critical), then JS forcing Rust's hand would be at best an annoyance.

And the idea that async/await was only added to JS to work around its limitations is simply wrong. So the OP is overall wrong: async/await is not an example of someone taking something that only makes sense in one language and using it another language for familiarity.


> So the OP is overall wrong: async/await is not an example of someone taking something that only makes sense in one language and using it another language for familiarity.

I don't really understand the counter argument here.

My reading of the argument[1] is that "popularity amongst developers forced the Rust devs' hands in adding async". If this is the argument, then a counter-argument of "it never (or only) made sense in the popular language (either)" is a non sequitur.

IOW, if it wasn't added due to technical reasons (which is the original argument, IIRC), then explaining technical reasons for/against isn't a counter argument.

[1] i.e. Maybe I am reading it wrong?


You are not reading the claim wrong, but the claim is a lie. We did not add async/await to Rust because it was popular but because it was the right solution for Rust. If you actually read my post that this liar linked to, you will find a detailed explanation of the technical history behind the decision.


You are not reading it wrong, and your statements are accurate.

My broader point is that the possibility of there being a "technically better" construct was simply not in scope for Rust. In order for Rust to capture Javascript programmers, async/await was the only construct that could possibly be considered.

And, to be fair, it worked. Rust's growth has been almost completely on the back of network services programming.


This comment is a lie.


That is his claim, but he is lying.


I would damn this, if Async/Await wasn't a good enough (TM) solution for certain problems where Threads are NOT good enough.

Remember: there is a reason why Async/Await was created B E F O R E JavaScript was used for more than sprinkling a few fancy effects on some otherwise static webpages


Strong disagree.

> Rust can run multiple threads just fine--it's not Javascript. As such, it didn't have to use async/await. It could have tried any of a bunch of different solutions. Rust is a systems language, after all.

it allows you to have semantic concurrency where there are no threads available. Like, you know, on microcontrollers without an (RT)OS, where such a systems programming language is a godsend.

seriously, using async/await on embedded makes so much sense.


> Rust can run multiple threads just fine

Rust is also used in environments which don't support threads. Embedded, bare metal, etc.


async/await is just a different concurrency paradigm with different strengths and weaknesses than threads. Rust has support for threaded concurrency as well though the ecosystem for it is a lot less mature.


Every word you've written is false, slanderous and idiotic. You are quoting a post in which I explain at length why async/await was the right fit for Rust technically. You are either illiterate or malignant.

Despite your evident ignorance, there are many network services that are not written in JavaScript. In fact, there are many that are written in C or C++. This is the addressable market of async Rust. Appealing to JavaScript users was not in any way a motivating factor for the development of async/await in Rust. Not at all!


Threads are much much slower than async/await.


Async/await, just like threads, is a concurrency mechanism and also always requires locks when accessing shared memory. Where does your statement come from?


If you perform single threaded async in Rust, you can drop down to the cheap single threaded RefCell rather than the expensive multithreaded Mutex/RwLock
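
(A sketch of that, assuming a current-thread Tokio runtime with a LocalSet; with `spawn_local` the tasks don't need to be Send, so Rc<RefCell<...>> stands in for Arc<Mutex<...>>.)

    use std::cell::RefCell;
    use std::rc::Rc;

    fn main() {
        let rt = tokio::runtime::Builder::new_current_thread()
            .enable_all()
            .build()
            .unwrap();
        let local = tokio::task::LocalSet::new();

        local.block_on(&rt, async {
            let counter = Rc::new(RefCell::new(0u64)); // no Mutex, no atomics
            let mut handles = Vec::new();
            for _ in 0..10 {
                let counter = Rc::clone(&counter);
                handles.push(tokio::task::spawn_local(async move {
                    // Everything runs on one thread, so this borrow can't race,
                    // as long as it isn't held across an await.
                    *counter.borrow_mut() += 1;
                }));
            }
            for h in handles {
                h.await.unwrap();
            }
            println!("{}", counter.borrow());
        });
    }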


That's one example of a lock you might eliminate, but there are plenty of other cases where it's impossible to eliminate locks even while single threaded.

Consider, for example, something like this (not real rust, I'm rusty there)

    lock {
      a = foo();
      b = io(a).await;
      c = bar(b);
    }
Eliminating this lock is unsafe because a, b, and c are expected to be updated in tandem. If you remove the lock, then by the time you reach c, a and b may have changed under your feet in an unexpected way because of that await.
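
(One hedged way the pseudocode above could look in real Rust: tokio::sync::Mutex, unlike the std one, is designed to be held across an await, so a, b and c stay consistent for the whole critical section. The `io` function and the types are made up.)

    use std::sync::Arc;
    use tokio::sync::Mutex;

    #[derive(Default)]
    struct State {
        a: u32,
        b: u32,
        c: u32,
    }

    async fn io(a: u32) -> u32 {
        tokio::time::sleep(std::time::Duration::from_millis(10)).await; // fake IO
        a * 2
    }

    async fn update(state: Arc<Mutex<State>>) {
        // The guard lives across the await, so no other task can observe or
        // modify a half-updated State in the meantime.
        let mut guard = state.lock().await;
        guard.a = 1;
        guard.b = io(guard.a).await;
        guard.c = guard.b + 1;
    }

    #[tokio::main]
    async fn main() {
        let state = Arc::new(Mutex::new(State::default()));
        let tasks: Vec<_> = (0..4)
            .map(|_| tokio::spawn(update(Arc::clone(&state))))
            .collect();
        for t in tasks {
            t.await.unwrap();
        }
    }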


Yeah but this problem goes away entirely if you just don’t await within a critical region like that.

I’ve been using nodejs for a decade or so now. Nodejs can also suffer from exactly this problem. In all that time, I think I’ve only reached for a JS locking primitive once.


There is no problem here with the critical region. The problem would be removing the critical region because "there's just one thread".

This is incorrect code

      a = foo();
      b = io(a).await;
      c = bar(b);
Without the lock, `a` can mutate before `b` is done executing which can mess with whether or not `c` is correct. The problem is if you have 2 independent variables that need to be updated in tandem.

Where this might show up. Imagine you have 2 elements on the screen, a span which indicates the contents and a div with the contents.

If your code looks like this

    mySpan.innerText = "Loading ${foo}";
    myDiv.innerText = load(foo).await;
    mySpan.innerText = "";
You now have incorrect code if 2 concurrent loads happen. It could be the original foo, it could be a second foo. There's no way to correctly determine what the content of `myDiv` is from an end user perspective as it depends entirely on what finished last and when. You don't even know if loading is still happening.


I absolutely agree that that code looks buggy. Of course it is - if you just blindly mix view and model logic like that, you’re going to have a bad day. How many different states can the system be in? If multiple concurrent loads can be in progress at the same time, the answer is lots.

But personally I wouldn’t solve it with a lock. I’d solve it by making the state machine more explicit and giving it a little bit of distance from the view logic. If you don’t want multiple loads to happen at once, add an is_loading variable or something to track the loading state. When in the loading state, ignore subsequent load operations.


> add an is_loading variable or something to track the loading state.

Which is definitionally a mutex AKA a lock. However, it's not a lock you are blocking on but rather one that you are trying and leaving.

I know it doesn't look like a traditional lock, but in a language like javascript or python it's a valid locking mechanism. For javascript that's because of the single thread execution model a boolean variable is guaranteed to be consistently set for multiple concurrent actions.

That is to say, you are thinking about concurrency issues, you just aren't thinking about them in concurrency terms.

Here's the Java equivalent to that concept

https://docs.oracle.com/javase/8/docs/api/java/util/concurre...


Yeah I agree. The one time I wrote a lock in javascript it worked like you were hinting at. You could await() the lock's release, and if multiple bits of code were all waiting for the lock, they would acquire it in turn.

But again, I really think in UI code it makes a lot more sense to be clear about what the state is, model it explicitly and make the view a "pure" expression of that state. In the code above:

- The state is 0 or more promises loading data.

- The state is implicit. Ie, the code doesn't list the set of loading promises which are being awaited at any point in time. Its not obvious that there is a collection going on.

- The state is probably wrong. The developer probably wants either 0 or 1 loading states. (Or maybe a queue of them). Because the state hasn't been modelled explicitly, it probably hasn't been considered enough

- The view is updated incorrectly based on the state. If 2 loads happen at the same time, then 1 finishes, the UI removes the "loading..." indicator from the UI. Correct view logic should ensure that the UI is deterministic based on the internal state. 1 in-progress load should result in the UI saying "loading...".

Its a great example. With code like this I think you should always carefully and explicitly consider all of the states of your system, and how the state should change based on user action. Then all UI code can flow naturally from that.

A lock might be a good tool. But without thinking about how you want the program to behave, we have no way to tell. And once you know how you want your program to behave, I find locks to be usually unnecessary.


I think a lot of this type of problem goes away with immutable data and being more careful with side effects (for example, firing them all at once at the end rather than dispersed through the calculation)


> Where does your statement come from?

This is how async/await works in Node (which is single-threaded) so most developers think this is how it works in every technology.


Even in Node, if you perform asynchronous operations on a shared resource, you need synchronization mechanisms to prevent interleaving of async functions.

There has been more than one occasion when I "fixed" a system in NodeJS just by wrapping some complex async function up in a mutex.


This lacks quite a bit of nuance. In Node you are guaranteed that synchronous code between two awaits will run to completion before another task (that could access your state) from the event loop gets a turn; with multi-threaded concurrency you could be preempted between any two machine instructions. So while you _do_ have to serialize access to shared IO resources, you do _not_ have to serialize access to memory (just add the connection to the hashset, no locks).

What you usually see with JS for concurrency of shared IO resources in practice is that they are "owned" by the closure of a flow of async execution and rarely available to other flows. This architecture often obviates the need to lock on the shared resource at all as the natural serialization orchestrated by the string of state machines already naturally accomplishes this. This pattern was even quite common in the CPS style before async/await.

For example, one of the first things an app needs to do before talking to a DB is to get a connection, which is often retrieved by pulling from a pool; acquiring the reservation requires no lock, and by virtue of the connection being exclusively closed over in the async query code, it also needs no locking. When the query is done, the connection can be returned to the pool sans locking.

The place where I found synchronization most useful was in acquiring resources that are unavailable. Interestingly, an async flow waiting on a signal for a shared resource resembles a channel in golang in how it shifts the state and execution to the other flow when a pooled resource is available.

All this to say, yeah I'm one of the huge fans of node that finds rust's take on default concurrency painfully over complicated. I really wish there was an event-loop async/await that was able to eschew most of the sync, send, lifetime insanity. While I am very comfortable with locks-required multithreaded concurrency as well, I honestly find little use for it and would much prefer to scale by process than thread to preserve the simplicity of single-threaded IO-bound concurrency.


> So while you _do_ have to serialize access to shared IO resources, you do _not_ have to serialize access to memory (just add the connection to the hashset, no locks).

No, this can still be required. Nothing stops a developer from setting up a partially completed data structure and then suspending in the middle, allowing arbitrary re-entrancy that will then see the half-finished change exposed in the heap.

This sort of bug is especially nasty exactly because developers often think it can't happen and don't plan ahead for it. Then one day someone comes along and decides they need to do an async call in the middle of code that was previously entirely synchronous, adds it and suddenly you've lost data integrity guarantees without realizing it. Race conditions appear and devs don't understand it because they've been taught that it can't happen if you don't have threads!


> So while you _do_ have to serialize access to shared IO resources, you do _not_ have to serialize access to memory

Yes, in Node you don't get the usual data races like in C++, but data-structure races can be just as dangerous. E.g. modifying the same array/object from two interleaved async functions was a common source of bugs in the systems I've referred to.

Of course, you can always rely on your code being synchronous and thus not needing a lock, but if you're doing anything asynchronous and you want a guarantee that your data will not be mutated from another async function, you need a lock, just like in ordinary threads.

One thing I deeply dislike about Node is how it convinces programmers that async/await is special, different from threading, and doesn't need any synchronisation mechanisms because of some Node-specific implementation details. This is fundamentally wrong and teaches wrong practices when it comes to concurrency.


But single-threaded async/await _is_ special and different from multi-threaded concurrency. Placing it in the same basket and prescribing the same method of use is fundamentally wrong and fails to teach the magic of idiomatic lock free async javascript.

I'm honestly having a difficult time creating a steel man js sample that exhibits data races unless I write weird C-like constructs and ignore closures and async flows to pass and mutate multi-element variables by reference deep into the call stack. This just isn't how js is written.

When you think about async/await in terms of shepherding data flows it becomes pretty easy to do lock free async/await with guaranteed serialization sans locks.


> I'm honestly having a difficult time creating a steel man js sample that exhibits data races

I can give you a real-life example I've encountered:

    const CACHE_EXPIRY = 1000; // Cache expiry time in milliseconds

    let cache = {}; // Shared cache object

    function getFromCache(key) {
      const cachedData = cache[key];
      if (cachedData && Date.now() - cachedData.timestamp < CACHE_EXPIRY) {
        return cachedData.data;
      }
      return null; // Cache entry expired or not found
    }

    function updateCache(key, data) {
      cache[key] = {
        data,
        timestamp: Date.now(),
      };
    }

    var mockFetchCount = 0;

    // simulate web request shorter than cache time
    async function mockFetch(url) {
      await new Promise(resolve => setTimeout(resolve, 100));
      mockFetchCount += 1;
      return `result from ${url}`;
    }

    async function fetchDataAndUpdateCache(key) {
      const cachedData = getFromCache(key);
      if (cachedData) {
        return cachedData;
      }

      // Simulate fetching data from an external source
      const newData = await mockFetch(`https://example.com/data/${key}`); // Placeholder fetch

      updateCache(key, newData);
      return newData;
    }

    // Race condition:
    (async () => {
      const key = 'myData';

      // Fetch data twice in a sequence - OK
      await fetchDataAndUpdateCache(key);
      await fetchDataAndUpdateCache(key);
      console.log('mockFetchCount should be 1:', mockFetchCount);

      // Reset counter and wait cache expiry
      mockFetchCount = 0;
      await new Promise(resolve => setTimeout(resolve, CACHE_EXPIRY));

      // Fetch data twice concurrently - we executed fetch twice!
      await Promise.all([fetchDataAndUpdateCache(key), fetchDataAndUpdateCache(key)]);
      console.log('mockFetchCount should be 1:', mockFetchCount);
    })();

This is what happens when you convince programmers that concurrency is not a problem in JavaScript. Even though this cache works for sequential fetching and will pass trivial testing, as soon as you have concurrent fetching, the program will execute multiple fetches in parallel. If the server implements some rate-limiting, or is simply not capable of handling too many parallel connections, you're going to have a really bad time.

Now, out of curiosity, how would you implement this kind of cache in idiomatic, lock-free javascript?


> how would you implement this kind of cache in idiomatic, lock-free javascript?

The simplest way is to cache the Promise<data> instead of waiting until you have the data:

    -async function fetchDataAndUpdateCache(key: string) {
    +function fetchDataAndUpdateCache(key: string) {
       const cachedData = getFromCache(key);
       if (cachedData) {
         return cachedData;
       }

       // Simulate fetching data from an external source
     -const newData = await mockFetch(`https://example.com/data/${key}`); // Placeholder fetch
     +const newData = mockFetch(`https://example.com/data/${key}`); // Placeholder fetch

       updateCache(key, newData);
       return newData;
     }
From this the correct behavior flows naturally; the API of fetchDataAndUpdateCache() is exactly the same (it still returns a Promise<result>), but it’s not itself async so you can tell at a glance that its internal operation is atomic. (This does mildly change the behavior in that the expiry is now from the start of the request instead of the end; if this is critical to you, you can put some code in `updateCache()` like `data.then(() => cache[key].timestamp = Date.now()).catch(() => delete cache[key])` or whatever the exact behavior you want is.)

I‘m not even sure what it would mean to “add a lock” to this code; I guess you could add another map of promises that you’ll resolve when the data is fetched and await on those before updating the cache, but unless you’re really exposing the guts of the cache to your callers that’d achieve exactly the same effect but with a lot more code.


Ok, that's pretty neat. Using Promises themselves in the cache instead of values to share the source of data itself.

While that approach has the limitation that you cannot read the data from inside fetchDataAndUpdateCache (e.g. to perform caching based on some property of the data), that goes beyond the scope of my example.

> I‘m not even sure what it would mean to “add a lock” to this code

It means the same as in any other language, just with a different implementation:

    class Mutex {
        locked = false
        next = []

        async lock() {
            if (this.locked) {
                await new Promise(resolve => this.next.push(resolve));
            } else {
                this.locked = true;
            }
        }

        unlock() {
            if (this.next.length > 0) {
                this.next.shift()();
            } else {
                this.locked = false;
            }
        }
    }
I'd have a separate map of keys-to-locks that I'd use to lock the whole fetchDataAndUpdateCache function on each particular key.


Don't forget to fung futures that are fungible for the same key.

ETA: I appreciate the time you took to make the example, also I changed the extension to `mjs` so the async IIFE isn't needed.

  const CACHE_EXPIRY = 1000; // Cache expiry time in milliseconds
  
  let cache = {}; // Shared cache object
  let futurecache = {}; // Shared cache of future values
  
  function getFromCache(key) {
    const cachedData = cache[key];
    if (cachedData && Date.now() - cachedData.timestamp < CACHE_EXPIRY) {
      return cachedData.data;
    }
    return null; // Cache entry expired or not found
  }
  
  function updateCache(key, data) {
    cache[key] = {
      data,
      timestamp: Date.now(),
    };
  }
  
  var mockFetchCount = 0;
  
  // simulate web request shorter than cache time
  async function mockFetch(url) {
    await new Promise(resolve => setTimeout(resolve, 100));
    mockFetchCount += 1;
    return `result from ${url}`;
  }
  
  async function fetchDataAndUpdateCache(key) {
    // maybe its value is cached already
    const cachedData = getFromCache(key);
    if (cachedData) {
      return cachedData;
    }
  
    // maybe its value is already being fetched
    const future = futurecache[key];
    if(future) {
      return future;
    }
  
    // Simulate fetching data from an external source
    const futureData = mockFetch(`https://example.com/data/${key}`); // Placeholder fetch
    futurecache[key] = futureData;
  
    const newData = await futureData;
    delete futurecache[key];
  
    updateCache(key, newData);
    return newData;
  }
  
  const key = 'myData';
  
  // Fetch data twice in a sequence - OK
  await fetchDataAndUpdateCache(key);
  await fetchDataAndUpdateCache(key);
  console.log('mockFetchCount should be 1:', mockFetchCount);
  
  // Reset counter and wait cache expiry
  mockFetchCount = 0;
  await new Promise(resolve => setTimeout(resolve, CACHE_EXPIRY));
  
  // Fetch 100 times concurrently - fetch should execute only once
  await Promise.all([...Array(100)].map(() => fetchDataAndUpdateCache(key)));
  console.log('mockFetchCount should be 1:', mockFetchCount);


I see, this piece of code seems to be crucial:

    // maybe its value is already being fetched
    const future = futurecache[key];
    if(future) {
      return future;
    }
It indeed fixes the problem in a JS lock-free way.

Note that, as wolfgang42 has shown in a sibling comment, the original cache map isn't necessary if you're using a future map, since the futures already contain the result:

    async function fetchDataAndUpdateCache(key) {
        // maybe its value is cached already
        const cachedData = getFromCache(key);
        if (cachedData) {
          return cachedData;
        }

        // Simulate fetching data from an external source
        const newDataFuture = mockFetch(`https://example.com/data/${key}`); // Placeholder fetch

        updateCache(key, newDataFuture);
        return newDataFuture;
    }
---

But note that this kind of problem is much easier to fix than to actually diagnose.

My hypothesis is that the lax attitude of Node programmers towards concurrency is what causes subtle bugs like these to happen in the first place.

Python, for example, also has single-threaded async concurrency like Node, but unlike Node it also has all the standard synchronization primitives also implemented in asyncio: https://docs.python.org/3/library/asyncio-sync.html


Wolfgang's optimization is very nice; I also found it interesting that he treats a non-async function that returns a promise as a signal that it's "atomic". I don't particularly like typed JS, so it would be less visible to me.

Absolutely agree on the observability of such things. One area I think shows some promise, though the tooling lags a bit, is in async context[0] flow analysis.

One area I have actually used it so far is in tracking down code that is starving the event loop with too much sync work, but I think some visualization/diagnostics around this data would be awesome.

If we view Promises/Futures as just ends of a string of a continued computation, whose resumption is gated by some piece of information, the points where you can weave these ends together are where the async context tracking happens and lets you follow a whole "thread" of state machines that make up the flow.

Thinking of it this way, I think, also makes it more obvious how data between these flows is partitioned in a way that it can be manipulated without locking.

As for the node dev's lax attitude, I would probably be more aggressive and say it's an overall lack of formal knowledge of how computing and data flow work. As an SE in DevOps a lot of my job is to make software work for people that don't know how computers, let alone platforms, work.

[0]: https://nodejs.org/api/async_context.html


async can be scarier for locks since a block of code might depend on having exclusive access, and since there wasn't an await, it got it. Once you add an await in the middle, the code breaks. Threading at least makes you codify what actually needs exclusive access.
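
A small Rust-flavoured sketch of that failure mode (names invented for illustration, and a single-threaded executor assumed): the invariant "count and sum are updated together" silently stops holding once an await lands between the two updates, because another task can run at the suspension point.

    use std::cell::RefCell;

    thread_local! {
        // intended invariant: count and sum are always updated together
        static TOTALS: RefCell<(u64, u64)> = RefCell::new((0, 0));
    }

    async fn record(value: u64) {
        TOTALS.with(|t| t.borrow_mut().0 += 1); // count bumped...
        some_io().await; // ...suspension point: other tasks now see count != entries summed
        TOTALS.with(|t| t.borrow_mut().1 += value); // ...sum only catches up here
    }

    async fn some_io() {
        // stand-in for any real await point (a read, a sleep, ...)
        futures::future::ready(()).await;
    }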

async also signs you up for managing your own thread scheduling. If you have a lot of IO and short CPU-bound code, this can be OK. If you have (or occasionally have) CPU-bound code, you'll find yourself playing scheduler.


Yeah once your app gets to be sufficiently complex you will find yourself needing mutexes after all. Async/await makes the easy parts of concurrency easy but the hard parts are still hard.


> backpressure should also be mentioned

I ran into this when I joined a team using nodejs. Misc services would just ABEND. Coming from Java, I was surprised by this oversight. It was tough explaining my fix to the team. (They had other great skills, which I didn't have.)

> error propagation in async/await is not obvious

I'll never use async/await by choice. Solo project, ...maybe. But working with others, using libraries, trying to get everyone on the same page? No way.

--

I haven't used (language level) structured concurrency in anger yet, but I'm placing my bets on Java's Loom Project. Best as I can tell, it'll moot the debate.


> async/await runs in context of one thread,

Not in Rust.


There is a single thread executor crate you can use for that case if it’s what you desire, FWIW.


Yes of course, but the async/await semantics are not designed only to be single threaded. Typically promises can be resumed on any executor thread, and the language is designed to reflect that.


This is completely wrong. You gotta learn about Send and Sync in Rust before you speak.

Rust makes no assumptions and is explicitly designed to support both single and multi threaded executors. You can have non-Send Futures.
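
As a minimal sketch (assuming the `futures` crate): holding an Rc across an .await makes a future non-Send, so it can't be handed to a work-stealing multi-threaded executor, but a single-threaded executor runs it fine.

    use std::rc::Rc;

    // Rc is !Send, and it is held across an await point, so this future is !Send.
    async fn not_send() -> usize {
        let shared = Rc::new(vec![1, 2, 3]);
        futures::future::ready(()).await; // suspension point while the Rc is live
        shared.len()
    }

    fn main() {
        // A single-threaded executor doesn't require Send, so this compiles and runs.
        let n = futures::executor::block_on(not_send());
        println!("{n}");
    }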


I'm fully aware of this, thanks @iknowstuff.

>>>>> Typically promises are designed...

I'm merely saying Rust async is not restricted to single threaded like many other languages design their async to be, because most people coming from Node are going to assume async is always single threaded.

Most people who write their promise implementations make them Send so they work with Tokio or Async-Std.

Relax, my guy. The shitty tone isn't necessary.

EDIT: Ah, your entire history is you just arguing with people. Got it.


Issues with the article:

1. Only one example is given (web server), solved incorrectly for threads. I will elaborate below.

2. The question is framed as if people specifically want OS threads instead of async/await.

But programmers want threads conceptually, semantically. Write sequential logic and don't use strange annotations like "async". In other words, if async/await is so good, why not make all functions in the language implicitly async, and instead of "await" just write normal function calls? Then you will suddenly be programming in threads.

OS threads are expensive due to the statically allocated stack, and we don't want that. We want cheap threads that can be run in the millions on a single CPU. But without the clumsy "async/await" words. (The `wait` word remains in its classic sense: when you wait for an event, for another thread to complete, etc - a blocking operation of waiting. But we don't want it for function invocations).

Back to #1 - the web server example.

When the timeout is implemented in the async/await variant of the solution, using `driver.race(timeout).await`, what happens to the client socket after the `race` signals the timeout error? Does the socket remain open, still connected to the client - essentially leaked?

The timeout solution for threaded version may look almost the same, as it looks for async/await: `threaded_race(client_thread, timeout).wait`. This threaded_race function uses a timer to track a timeout in parallel with the thread, and when the timeout is reached it calls `client_thread.interrupt()` - the Java way. (The `Thread.interrupt()`, if thread is not blocked, simply sets a flag; and if the thread is blocked in an IO call, this call throws an InterruptedException. That's a checked exception, so compiler forces programmer to wrap the `client.read_to_end(&mut data)` into try / catch or declare the exception in the `handle_client`. So programmer will not forget to close the client socket).


> When the timeout is implemented in the async/await variant of the solution, using `driver.race(timeout).await`, what happens to the client socket after the `race` signals the timeout error?

Any internal race() values will be `Drop`ped and driver itself will remain (although Rust will complain that you are not handling the Result if you type it 'as is'); if a new socket was created local to the future it will be cleaned up.
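
A rough sketch of that cleanup, using tokio::time::timeout in place of race() (tokio and the names here are assumptions, not the article's code): if the timeout wins, the inner future is dropped, which drops the TcpStream and closes the socket.

    use std::time::Duration;
    use tokio::io::AsyncReadExt;
    use tokio::net::TcpStream;

    // Hypothetical handler: the socket lives inside the future's state.
    async fn handle_client(mut socket: TcpStream) -> std::io::Result<Vec<u8>> {
        let mut data = Vec::new();
        socket.read_to_end(&mut data).await?;
        Ok(data)
    }

    #[tokio::main]
    async fn main() -> std::io::Result<()> {
        let socket = TcpStream::connect("127.0.0.1:8080").await?;
        // If the 5 seconds elapse first, the handle_client future is dropped,
        // which drops the TcpStream and closes the fd - nothing leaks.
        match tokio::time::timeout(Duration::from_secs(5), handle_client(socket)).await {
            Ok(Ok(data)) => println!("read {} bytes", data.len()),
            Ok(Err(e)) => eprintln!("io error: {e}"),
            Err(_elapsed) => eprintln!("timed out; socket dropped"),
        }
        Ok(())
    }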

The niceness of futures (in Rust) is that all the behavior around them can be defined. While "all functions are blocking", as you state in a sibling comment, Rust allows you to specify when to defer execution to the next task in the task queue, meaning it will poll tasks arbitrarily quickly with an explicitly held state (the Future struct). This makes it both very fast (compared to threads, which need to sleep() in order to defer) and easy to reason about.

Java's Thread.interrupt is also just a sleep loop, which is fine for most applications to be fair. Rust is a systems language: you can't have that in embedded systems, and it's not desirable for kernels or low-latency applications.


> Java's Thread.interrupt is also just a sleep loop

You probably mean that Java's socket reading under the hood may start a non-blocking IO operation on the socket, and then run a loop, which can react on Thread.interrupt() (which, in turn, will basically be setting a flag).

But that's an implementation detail, and it does not need to be implemented that way.

It can be implemented the same way as async/await. When a thread calls socket reading, the runtime system will take the current thread's continuation off execution and use the CPU to execute the next task in the queue. (That's how Java's new virtual threads are implemented.)

Threads and async/await are basically the same thing.

So why not drop this special word `async`?


> So why not drop this special word `async`?

You can drop the special word in Rust; it's just sugar for 'returns a poll-able function with state'. However, threads and async/await are not the same.

You can implement concurrency any way you like; you can run it in separate processes or separate nodes if you are willing to put in the work, but that does not mean they are equivalent for most purposes.

Threads are almost always implemented preemptively while async is typically cooperative. Threads are heavy/costly in time and memory, while async is almost zero-cost. Threads are handed over to the kernel scheduler, while async is entirely controlled by the program('s executor).

Purely from a merit perspective threads are simply a different trade-off. Just like multi-processing and distributed actor model is.


> Threads are almost always implemented preemptively while async is typically cooperative. Threads are heavy/costly in time and memory, while async is almost zero-cost. Threads are handed over to the kernel scheduler, while async is entirely controlled by the program('s executor).

Keyword here being almost. See Project Loom.


@f_devd, cooperative vs preemptive is a good point.

(That threads are heavy or should be scheduled by OS is not required by the nature of the threads).

But preemptive is strictly better (safer at least) than cooperative, right? Otherwise, one accidental endless loop, and this code occupies the executor, depriving all other futures of execution.

@gpderetta, I think Project Loom will need to become preemptive, otherwise the virtual threads can not be used as a drop-in replacement for native threads - we will have deadlocks in virtual threads where they don't happen in native threads.


Preemptive is safer for liveness since it avoids 'starvation' (one task's poll taking too long); however, in practice it is almost always more expensive in memory and time due to the implicit state.

In async, only the values required to do a poll need to be held (often only references), while for threads the entire stack & registers needs to be stored at all times, since at any moment it could be interrupted and it will need to know where to continue from. And since it needs to save/overwrite all registers at each context switch (+ scheduler/kernel handling), it takes more time overall.
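
As a rough illustration of that point (assuming the `futures` crate just for the await point), the compiler-generated future only keeps what lives across .await points, and its size is known statically:

    async fn tiny() -> u8 {
        let x: u8 = 7; // only `x` must survive across the await below
        futures::future::ready(()).await;
        x
    }

    fn main() {
        // On the order of the captured locals plus a discriminant,
        // not a preallocated multi-kilobyte thread stack.
        println!("future size: {} bytes", std::mem::size_of_val(&tiny()));
    }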

In general threads are a good option if you can afford the overhead, but assuming threads as the default can significantly hinder performance (or make it nearly impossible to run at all) in the environments Rust needs to support.


@f_devd, I think you are mistaken.

Not that I want to discourage anyone from using async/await. I am glad async/await solves people problems, especially when people do not have a ready to use alternative as my perfect ideal threads.

But just to reduce the number of people who are mistaken in the Internet :)

I think the only real problem that makes threads really expensive for embedded systems is statically allocated large stack. If stack size is managed dynamically, it can be small thus allowing many threads. The other expenses should be tolerable. Embedded systems don't require high computational throughput, I think.

All implementation approaches used for async/await can be used for threads, and vice versa, because they are basically the same thing.

> In async, only the values required to do a poll need to be held (often only references), while for threads the entire stack & registers needs to be stored at all times, since at any moment it could be interrupted and it will need to know where to continue from.

Well, it seems the opposite - the approach you attribute to threads can be more efficient here. If an async function, when blocked, holds in its Future state record only the part of the local vars and parameters needed to continue execution, the function needs to copy them from the stack. And that's redundant copying and memory allocation for Future state records. Note, this happens at every element of the function call chain, so the Future state records act as stack frames. And this stack copying is most likely done in individual assignments, var by var.

And I am afraid this allocation and copying can happen every time the async function blocks. Reusing Future state records may be non-trivial, given that the next time, the top-level async function we are await'ing for may block in some other internal branch.

Compared to saving the stack which is just saving two registers: stack base and stack pointer.

> And since it needs to save/overwrite all registers at each context switch (+ scheduler/kernel handling), it takes more time overall.

Saving registers is cheap. Also, there is no magic: when the next async function is activated by the async function scheduler, it uses the registers as it wants, so register values of previously blocked async function need to be saved somehow - this happens when the most nested function copies its local vars to the Future state record.

Speaking of preemption requiring the kernel - not necessarily. It can be done in user space. A thread can yield control to the scheduler when it invokes a blocking function (as Java virtual threads currently do). In addition to that, other preemption points can be used - function calls, allocations, maybe loop boundaries. This approach lies in between cooperative threading and full preemption.

If we consider preemption by timer interrupts: first, it only happens if the thread hasn't yet yielded control by calling a blocking function. Second, if preemption by timer happens, the kernel can pass control to the user-space scheduler in the application runtime instead of applying the kernel's heavyweight scheduler (is the kernel scheduler really more heavyweight?).

Moreover, I've just searched for user space interrupts, and it looks like new processors provide such a feature. The first link in search currently is https://lwn.net/Articles/871113/. Green threads scheduling is mentioned as one of the use cases.

So, in short, I don't see why threads would be inherently less performant than async/await.


I think you might be confusing Runtime, OS and bare-metal primitives. Java virtual threads are possible because there is always the runtime which code will return to, and since it's already executing in a VM the concept of Stack/Heap Store/Loads don't really matter for performance.

> Compared to saving the stack which is just saving two registers: stack base and stack pointer.

In embedded you might not have a stack base, just a stack pointer, this means in order to switch to a different stack you need to copy 2 stacks. (I might be wrong here; I know some processors have linear stacks, but this might be more uncommon).

On bare metal this dynamic changes significantly, in order to "switch contexts" with preemption the following steps are needed (omitting the kernel switch ops):

- Receive interrupt

- Mask interrupts

- Store registers to heap

- Store stack to heap

- Calculate next preemption time (scheduler)

- Set interrupt for next preemption time

- Load stack from heap

- Load registers from heap

- Unmask interrupts

- Continue execution using program counter

While for async/await everything already in place on the stack/heap so a context switch is:

- Call Future.poll function

- If Poll::Ready, make parent task new Future and (if it exists) call it

- If Poll::Pending, go to next Future in Waker queue

Async/await (in rust) is without a runtime, and without copies or register stores/loads; it can be implemented on any cpu. On embedded, tasks can also decide how they want to be woken, so if you want to do low-power operation you can make an interrupt which calls `wake(future)` and it will only poll that task after the interrupt has hit, meaning any time the Waker queue is empty it knows it can sleep with interrupts enabled.
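
A toy sketch of that poll loop (assuming the `futures` crate for a no-op waker); a real executor parks or waits for an interrupt instead of spinning, but the "context switch" really is just a call into poll():

    use std::future::Future;
    use std::pin::pin;
    use std::task::{Context, Poll};

    // Busy-polls a single future to completion; no registers or stacks are
    // saved anywhere - the future's own state struct is the whole "context".
    fn busy_block_on<F: Future>(fut: F) -> F::Output {
        let mut fut = pin!(fut);
        let waker = futures::task::noop_waker();
        let mut cx = Context::from_waker(&waker);
        loop {
            match fut.as_mut().poll(&mut cx) {
                Poll::Ready(out) => return out,
                Poll::Pending => std::hint::spin_loop(),
            }
        }
    }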

> so register values of previously blocked async function need to be saved somehow

The difference is that we know exactly which values are needed instead of not knowing what we need from the stack/registers.

User-space interrupts would make it easier to do preemption in user-space but this is yet another feature you can't make assumptions about (especially since there has been only a single gen of processors which support it).


Yes, of course a non-cooperative switch is more expensive than a cooperative one. But the thread model does not require preemption or even time-slice scheduling.

But with async/await cooperative switch is the only option.


I'm unfamiliar with a bare-metal thread model that doesn't do preemption outside of a Runtime. I imagine you'd need to effectively inject code to do a cooperative switch, as there aren't many ways for a CPU to exit its current 'task' outside of an interrupt (preemption) or a defer call (coroutines/async). For Runtimes it usually also means you effectively have a cooperative switch, but it's hidden away in runtime code.

Do you have an example?


@f_devd, I realized that my main objection to async/await does not apply to Rust.

Thank you for staying in the discussion long enough for me to realize that completely.

I dislike async/await in Javascript because async functions can not be called synchronously from normal functions. The calling function and all its callers and all their callers need to be turned async.

In Rust, since we can simply do `executor::block_on(some_async_function())`, my objection goes away - all primitives remain fully composable. Async functions can call usual functions and vice versa.
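
A minimal sketch of that composability (assuming the `futures` crate for block_on):

    fn plain_helper(x: u32) -> u32 {
        x * 2
    }

    async fn async_work(x: u32) -> u32 {
        plain_helper(x) + 1 // async calling a normal function: just a call
    }

    fn main() {
        // a normal function driving an async one to completion
        let result = futures::executor::block_on(async_work(20));
        assert_eq!(result, 41);
    }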

So my first comment was to some extent a "knee-jerk reaction".

As we started to discuss thread preemption cost, I will provide some responses below. In short, I believe it can be on par with async/await.

=================================================

> I think you might be confusing Runtime, OS and bare-metal primitives.

I am not confusing them; I consider all those cases down to what happens at the CPU level.

> Java virtual threads are possible because there is always the runtime which code will return to, and since it's already executing in a VM the concept of Stack/Heap Store/Loads don't really matter for performance.

They remain applicable, as at the lowest level the VM / Runtime is executed by a CPU.

> Async/await (in rust) is without a runtime,

Rust Executor is a kind of runtime, IMHO.

> and without copies or register stores/loads;

The CPU register values are still saved to memory when async function returns Poll::Pending, so that the intermediate computation results are not lost and when polled again the function continues its execution correctly. (On the level of Rust source code, the register saving corresponds to assignment of local variables of the most nested async function to the fields of the generated anonymous future).

==============================================

> In embedded you might not have a stack base, just a stack pointer, this means in order to switch to a different stack you need to copy 2 stacks. (I might be wrong here; I know some processors have linear stacks, but this might be more uncommon).

If the CPU does not have a stack base (stack segment register), saving of the stack pointer is enough to switch to another stack.

In practice, I think, even CPUs with stack segment register, most often only need to save stack pointer for context switch - all stacks of the process can live in the same segment, and even for different processes the OS can arrange the segments to have the equal segment selector. I know that switching to kernel mode usually involves changing stack segment register in addition to the stack pointer (as the kernel stack segment has different protection level).

==============================================

> On bare metal this dynamic changes significantly, in order to "switch contexts" with preemption the following steps are needed (omitting the kernel switch ops): [...] While for async/await everything already in place on the stack/heap so a context switch is: [..]

The operations you listed for bare metal are very cheap; some items in the list are just a single CPU instruction. (Also, I think timer interrupts are configured once for a periodic interval and don't need to be recalculated and set on every context switch.)

If one expands the "go to next Future in Waker queue" item you listed for async/await in the same level of detail that you did for bare metal, the resulting list may be even longer than the bare metal list.

==============================================

The majority of the context switch cost at the CPU level comes when we switch to a different process, so that a new virtual memory mapping table needs to be loaded into the CPU (and correspondingly, the cached mappings in the TLB need to be reset and new ones computed during execution in the new context), and from the need to load different descriptor tables.

Nothing of that applies to in-process green thread context switches.


Java can afford that. M:N threads come with a heavy runtime. Java already has a heavy runtime, so what is a smidgen more flab?

Source: https://github.com/rust-lang/rfcs/blob/master/text/0230-remo...


So it seems that the biggest issue was having a single IO interface forcing overhead on both green and native threads and forcing runtime dispatching.

It seems to me that the best would have been to have the two libraries evolve separately and capture the common subset in a trait (possibly using dynamic impl when type erasure is tolerable), so that you can write generic code that can work with both or specialized code to take advantage of specific features.

As it stands now, sync and async are effectively separated anyway and it is currently impossible to write generic code that handles both.


> But programmers want threads conceptually, semantically. Write sequential logic and don't use strange annotations like "async". In other words, if async/await is so good, why not make all functions in the language implicitly async, and instead of "await" just use normal function calls? Then you will suddenly be programming in threads.

Some programmers do, but many want exactly the opposite as well. Most of the time I don't care if it's an OS blocking syscall or a non-blocking one, but I do care about understanding the control flow of the program I'm reading and see where there's waiting time and how to make them run concurrently.

In fact, I'd kill to have a blocking/block keyword pair whenever I'm working with blocking functions, because they can surreptitiously slow down everything without you paying attention (I can't count how many pieces of software I've seen with blocking syscalls in the UI thread, leading to frustratingly slow apps!).


This is a really common comment to see on HN threads about async/await vs fibers/virtual threads.

What you're asking for is performance to be represented statically in the type system. "Blocking" is not a useful concept for this. As avodonosov is pointing out, nothing stops a syscall being incredibly fast and for a regular function that doesn't talk to the kernel at all being incredibly slow. The former won't matter for UI responsiveness, the latter will.

This isn't a theoretical concern. Historically a slow class of functions involved reading/writing to the file system, but in some cases now you'll find that doing so is basically free and you'll struggle to keep the storage device saturated without a lot of work on multi-threading. Fast NVMe SSDs like found in enterprise storage products or MacBooks are a good example of this.

There are no languages that reify performance in the type system, partly because it would mean that optimizing a function might break the callers, which doesn't make sense, and partly because the performance of a function can vary wildly depending on the parameters it's given.

Async/await is therefore basically a giant hack and confuses people about performance. It also makes maintenance difficult. If you start with a function that isn't async and suddenly realize that you need to read something from disk during its execution, even if it only happens sometimes or when the user passes in a specific option, you are forced by the runtime to mark the function async which is then viral throughout the codebase, making it a breaking change - and for what? The performance impact of reading the file could easily be lost in the noise on modern hardware and operating systems.

The right way to handle this is the Java approach (by pron, who is posting in this thread). You give the developer threads and make it cheap to have lots of them. Now break down tasks into these cheap threads and let the runtime/OS figure out if it's profitable to release the thread stack or not. They're the best placed to do it because it's a totally dynamic decision that can vary on a case-by-case basis.


> It also makes maintenance difficult. If you start with a function that isn't async and suddenly realize that you need to read something from disk during its execution, even if it only happens sometimes or when the user passes in a specific option, you are forced by the runtime to mark the function async which is then viral throughout the codebase, making it a breaking change - and for what? The performance impact of reading the file could easily be lost in the noise on modern hardware and operating systems.

You'll typically have an idea of whether or not a function performs IO from the start. Changing that after the fact violates the users' conceptual model and expectation of it, even if all existing code happens to keep working.


If you want to go full Haskell on the problem for purity-related reasons, by all means be my guest. I strongly approve.

However, unless you're in such a language, warping my entire architecture around that objection does not provide a good cost-benefit tradeoff. I've got a lot of fish to fry and a lot of them are bigger than this in practice. Heck, there's still plenty of programmers who will consider it an unambiguous feature that they can add IO to anything they want or need to, and consider it a huge negative when they can't, and a lot of programmers who don't practice IO isolation and don't even conceive of "this function is guaranteed to not do any IO/be impure" as a property a function can have.


In any system that uses mmap or swap it's a meaningless distinction anyway (which is obviously nearly all of them outside of embedded RTOS). Accessing even something like the stack can trigger implicit/automatic IO of arbitrary complexity, so the concept of a function that doesn't do IO is meaningless to begin with. Async/await isn't justified by any kind of interesting type theory, it exists to work around limitations in runtimes and language designs.


This argument is dull; nothing in programming can do anything perfectly: is catching exceptions useless because the program can be killed by the OS? Is static typing pointless because cosmic rays can make your data ill-formed anyway?

All abstractions are leaky, and the OS and hardware's behaviors are always going to surface in ways that you cannot model in your programming language, no matter how low-level you want to go (asm itself is a poor abstraction on top of how the CPU actually works), but that doesn't make them useless.

Async/await is a way to communicate intent between developers on the same project, and between a dependency and its dependent. Exactly like static types, error as return values and non-nullable types. And like all of them while it doesn't prevent all bugs, it definitely helps.

The fact that it makes straightforward to implement the best possible performance is just the cherry on top.


> You'll typically have an idea of whether or not a function performs IO from the start.

I think GP's point is: why does that matter? Much writing on Async/Await roughly correlates IO with "slow". GP rightly points out that "slow" is imprecise, changes, means different things to different people and/or use cases.

I completely get the intuition: "there's lag in the [UI|server|...], what's slowing it down?". But the reality is that trying to formalise "slow" in the type system is nigh on impossible - because "slow" for one use case is perfectly acceptable for another.


While absolute slowness depends on lots of factors, the relative slowness of things doesn't so much. Whatever the app or the device, accessing a register is always going to be faster than random places in RAM, which is always going to be faster than fetching something on disk and even moreso if we talk about fetching stuff over the network. No matter how hardware progresses, latency hierarchy is doomed to stay.

That doesn't mean it's the only factor of slowness, and that async/await solves all issues, but it's a tool that helps, a lot, to fight against very common sources of performance bugs (like how the borrow checker is useful when it protects against the nastiest class of memory vulnerabilities, even if it cannot solve all security issues).

Because the situation where “my program is stupidly waiting for some IO even though I don't even need the result right now and I could do something in the meantime” is something that happens a lot.


> Whatever the app or the device, accessing a register is always going to be faster than random places in RAM, which is always going to be faster than fetching something on disk and even moreso if we talk about fetching stuff over the network.

The network is special: the time it takes to fetch something over the network can be arbitrarily large, or even infinite (this can also apply to disk when running over networked filesystems), while for registers/RAM/disk (as long as it's a local disk which is not failing) the time it takes is bounded. That's the reason why async/await is so popular when dealing with the network.


PCIe is a network. USB is a network. There is no such thing as a resource with a guaranteed response time.


Even if you ignore performance completely, IO is unreliable. IO is unpredictable. IO should be scrutinized.


When using SSD or eNVM, why would local IO be more unreliable/unpredictable than local RAM?


Don't assume your process is the only process. You have to share resources like storage and RAM with everything else on the system. Just a single, simple Java app can gobble up all available RAM if you don't tell it otherwise.


Exactly. Which means other processes running on the same server can cause latency on disk access but also on RAM, which was the point I was trying to make :)


There are plenty of language ecosystems where there's no particular expectations up front about whether or when a library will do IO. Consider any library that introduces some sort of config file or registry keys, or an OS where a function that was once purely in-process is extracted to a sandboxed extra process.


> There are plenty of language ecosystems where there's no particular expectations up front about whether or when a library will do IO.

There are languages that don't enforce the expectation on a type level, but that doesn't mean that people don't have expectations.

> Consider any library that introduces some sort of config file or registry keys

Yeah, please don't do this behind my back. Load during init, and ask for permission first (by making me call something like Config::load() if I want to respect it).

> or an OS where a function that was once purely in-process is extracted to a sandboxed extra process.

Slightly more reasonable, but this still introduces a lot of considerations that the application developer needs to be aware of (how should the library find its helper binary? what if the sandboxing mechanism fails or isn't available?).


For the sandbox example I was thinking of desktop operating systems where things like file IO can become brokered without apps being aware of it. So the API doesn't change, but the implementation introduces IPC where previously there wasn't any. In practice it works fine.


> There are no languages that reify performance in the type system,

Async/await is a way of partially doing... just that, but without having to indicate what is "blocking", and if an async function blocks, well, you'll be unhappy, so don't do that. For a great deal of things this is plenty good enough, but from a computer science perspective it's deeply unsatisfying because one would want the type system to prevent making such mistakes.

At least with async/await an executor could start more threads when an async thread makes a known-blocking call, thus putting a band-aid on the problem.
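
A sketch of that band-aid as it exists today (assuming tokio; the file path is just an example): known-blocking work is shunted onto a dedicated blocking pool so the async worker threads stay free.

    // Hypothetical helper: read a file with the blocking std API from async code.
    async fn load_hosts() -> std::io::Result<String> {
        tokio::task::spawn_blocking(|| std::fs::read_to_string("/etc/hosts"))
            .await
            .expect("blocking task panicked")
    }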

Perhaps the compiler could reason about the complexity of code (e.g., recursion and nested loops w/ large or unclear bounds -> accidentally quadratic -> slow -> "blocking") and decide if a pure function is "blocking" by dint of being slow. File I/O libraries could check if the underlying devices are local and fast vs. remote and slow, and then file I/O could always be async but complete before returning when the I/O is thought to be fast. This all feels likely to cause more problems than it solves.

If green threads turn out not to be good enough then it's easier to accept the async/await compromise and reason correctly about blocking vs. not.


A function being marked async tells you nothing in particular about the performance of that operation. It could be anything from seconds to milliseconds, and routinely is. E.g., elliptic curve cryptography operations that are plenty fast enough to execute on a UI thread animating at 60fps are nonetheless marked async on the web, whilst attaching a giant document fragment to the live DOM - which might trigger very intensive rerendering calculations - isn't.

Nothing stops you from starting more threads when you run low in a scenario when there are only threads also. That's how the JVM ForkJoinPool works. If your threads end up all blocked, more are started automatically.


You can't encode everything about performance in the type system, but that doesn't mean you cannot do it at all: having a type system that allows you to control memory layout and allocation is what makes C++ and Rust faster than most languages. And regarding what you say about storage access: storage bandwidth is now high, but latency when accessing an SSD is still much higher than accessing RAM, and network is even worse. And it will always be the case no matter what progress hardware makes, because of the speed of light.

Saying that async/await doesn't help with all performance issues is like saying Rust doesn't prevent all bugs: the statement is technically correct, but that doesn't make it interesting.

> Async/await is therefore basically a giant hack and confuses people about performance. It also makes maintenance difficult.

Many developers have embraced the async/await model with delight, because it instead makes maintenance easier by making the intent of the code more explicit.

It's been trendy on HN to bash async/await, but you are missing the most crucial point about software engineering: code is written for humans and is read much more than written. Async/await may be slightly more tedious to write (it's highly context dependent though; when you have concurrent tasks to execute or need cancellation, it becomes much easier with futures).

> The right way to handle this is the Java approach (by pron, who is posting in this thread)

No it's not, and Mr Pressler has repeatedly shown that he misses the social and communication aspects, so it's not entirely surprising.


> It's been trendy on HN to bash async/await, but you are missing the most crucial point about software engineering: code is written for humans and is read much more than written.

Readability can be in the eye of the beholder. Some of us find it easier to read concurrent code using goroutines and channels in Go, or processes and messages in Erlang, than async/await in Rust.


I don't believe you. And you know why? Because what the async syntax permits is just a superset of the capabilities of goroutines: any goroutine-based code can be rewritten into async/await without changing anything of the structure; in practice all it would change is that it would add .await at yield points. Yes, async Rust code in fact resembles a lot the code you're used to, with “tasks” (which are functionally goroutines) and channels (just have a look at tokio's documentation[1] to see what it looks like).
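
For a concrete sense of how close the shapes are, a small sketch (assuming tokio) of the tasks-and-channels style; structurally it is the same program you would write with goroutines, with .await added at the yield points.

    use tokio::sync::mpsc;

    #[tokio::main]
    async fn main() {
        let (tx, mut rx) = mpsc::channel(16);

        for id in 0..3 {
            let tx = tx.clone();
            // tokio::spawn plays the role of `go func() { ... }()`
            tokio::spawn(async move {
                tx.send(format!("hello from task {id}")).await.unwrap();
            });
        }
        drop(tx); // close the channel so rx.recv() eventually returns None

        while let Some(msg) = rx.recv().await {
            println!("{msg}");
        }
    }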

But for some situations, in addition to the goroutine style, async/await allows you to have a much more straightforward implementation (cancellation, timeout, error handling, etc.). Even then, there's nothing stopping you from using channels and select for these use cases when using Rust async/await. (In practice people don't, because nobody sane would use channels and select to implement a timeout instead of tokio::timeout, but you can.)

Saying that goroutines and channels are easier to read than async/await Rust is like saying your car can drive slower than mine, and it just reveals that you haven't actually ever read async Rust code and you're just imagining how bad it must be because of some prejudice of yours.

[1]: https://tokio.rs/tokio/tutorial/channels


You're writing that "code is written for humans" (very true) and that async/await is more readable, and when you receive feedback from other humans they find async/await slightly less readable than fibers/green threads, you dismiss the feedback as unfaithful. That's quite often the problem with arguments about readability: they devolve into personal opinions. Perhaps you're right, perhaps if I'd spend much more time reading async/await code, perhaps I'll eventually find it as readable as fibers code. But that's not the case today. Others in this discussion have shared a similar feedback. To be clear, that readability "issue" is not a big deal for me, just something that I perceive as a small obstacle.


> You're writing that "code is written for humans" (very true) and that async/await is more readable, and when you receive feedback from other humans they find async/await slightly less readable than fibers/green threads, you dismiss the feedback as unfaithful.

Yes, because it is. Async await can express exactly the same thing as goroutines without any alteration in structures, so by definition it cannot be “less readable” because the code is exactly the same. But, at the same time, for specific topics where goroutines code is objectively[1] not optimal, async/await offers additional tools.

Literally anything that is straightforward with goroutines is going to be straightforward with async/await, because you are just going to write the exact same code (really, go ahead and paste any Go code here and you'll see that the async Rust translation is going to be identical to a thread-based translation, with only a small amount of “async” and “await” keywords added here and there).

> That's quite often the problem with arguments about readability: they devolve into personal opinions.

The problem here is that you're having an opinion on something you don't know about, and you're talking purely out of prejudice. And of course it's a big “personal opinion” problem.

> Perhaps you're right, perhaps if I'd spend much more time reading async/await code, perhaps I'll eventually find it as readable as fibers code. But that's not the case today.

That's not my point at all! My point is that FOR PRETTY MUCH EVERYTHING IT'S GOING TO BE EXACTLY THE SAME CODE, just with materialized yield points! And only for some stuff that is hard and tedious to write with goroutines (think cancellation, timeouts), you'd be able to use a different syntax with async/await (that is, it's going to be a function call instead of having to use channels + select).

And that's why async/await can be “easier to read” while not being “harder to read”, simply because it has the expressing power to express the exact same things, and the power to express a few other things in a better way.

[1]: (yes I say “objectively”, because writing 20 lines of concurrent code with channels and select for something that can be a simple function call is arguably inferior).


> Async await can express exactly the same thing as goroutines without any alteration in structures, so by definition it cannot be “less readable” because the code is exactly the same.

I agree that in many cases the "structure" is exactly the same, but the code is not, otherwise we wouldn't have to add `async` and `await` and adjust types for async.

> with only a small amount of “async” and “await” keywords added here and there

It's not just about the async and await keywords. It's also about adjusting the types, thinking differently about parallelism (not concurrency), and a few other things.

> My point is that FOR PRETTY MUCH EVERYTHING IT'S GOING TO BE EXACTLY THE SAME CODE, just with materialized yield points!

Then that's not the same code. This is syntactically and semantically different. And that's exactly what makes async/await interesting and useful. The screaming case wasn't necessary.

> it has the expressing power

Increased expressive power doesn't always help with increased readability. To continue that discussion in a constructive way, we would need to agree on a definition of readability and how to assess it objectively, because I'm not sure we're talking about the same thing. But I lost interest in doing so considering the general tone of the comments.


> It's not just about the async and await keywords. It's also about adjusting the types, thinking differently about parallelism (not concurrency), and a few other things.

No, you're overthinking it. The way concurrency works with async/await is exactly the same as the way it works with green threads[1]. It's a bit different from what happens with OS threads, but at the same time the difference between OS threads and green threads doesn't seem particularly impactful for you, since you are happy to ignore it when talking about goroutines.

> Then that's not the same code. This is syntactically and semantically different. And that's exactly what makes async/await interesting and useful.

It's syntactically a bit different, but not semantically, at least not in a meaningful way: Rust async vs JS async is more semantically different than old Go[2] vs Rust, and old Go was more different from today's Go than it was from Rust. The syntax difference is what makes it the most different, because it allows for terser constructs in a few cases (this is where the readability benefits kick in, but it's limited in scope). But that's basically the same difference as Rust error handling vs Go's (it works functionally the same, but Rust's approach allows for the terser `?` syntax that was added a few years after 1.0).

> The screaming case wasn't necessary.

Sorry about that, it wasn't about screaming and more about working around the lack of bold emphasis in HN comment formatting but I see how it can be misinterpreted as aggressive.

> Increased expressive power doesn't always help with increased readability.

If language A has more expressing power than language B, it means that the same developer can express things in language A the same way they would in language B. That is, their code would be no less readable when written in language A than if it was written in language B. Of course that says nothing about cultural differences between subgroups of developers and how some people in one language could write code that is less readable than other people in another language, but this has nothing to do with the language itself. (Bad developers write unreadable code in any language, and Go is a good testament that limiting the expressing power of a language isn't a good way to improve readability across the board, as Go's leadership eventually recognized).

> To continue that discussion in a constructive way, we would need to agree on a definition of readability and how to assess it objectively

That's not something we can easily define formally, but we could work from code samples, like I suggested above.

[1] at least for cooperatively scheduled green threads, which is what Go did for almost a decade. If you go back to the time when Go had segmented stacks, then it's literally the exact same semantics in every way, including memory layout.

[2] by “old Go” I mean Go before they introduced a preemptive scheduler, which happened a few years ago.


But all functions are blocking.

   fn foo() { bar(1, 2); }
   fn bar(a: i32, b: i32) -> i32 { a + b }
Here bar is a blocking function.


The difference is in quantities. bar blocks for nanoseconds; the blocking the GP talks about affects the end user, which means it's in seconds.


No they aren't, and that's exactly my point.

Most functions aren't doing any syscall at all, and as such they aren't either blocking or non-blocking.

Now because of path dependency and because we've been using blocking functions like regular functions, we're accustomed to think that blocking is “normal”, but that's actually a source of bugs as I mentioned before. In reality, async functions are more “normal” than regular functions: they don't do anything fancy, they just return a value when you call them, and what they return is a future/promise. In fact you don't even need to use any async annotation for a function to be async in Rust; this is an async function:

    use std::future::Future;

    fn toto() -> impl Future<Output = String> {
        unimplemented!();
    }

The async keyword exists simply so that the compiler knows it has to desugar the awaits inside the function into a state machine. But since Rust has async blocks, it doesn't even need async on functions at all: the information you need comes from the return type, which is a future.
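
For a concrete picture (a rough sketch with made-up names, using only std), the `async fn` form and a plain fn returning `impl Future` built from an async block end up equivalent:

    use std::future::Future;

    // `async fn` form: the compiler builds the returned state machine.
    async fn greet() -> String {
        String::from("hello")
    }

    // "Desugared" form: a regular fn whose return type carries all the
    // information; the async block supplies the state machine.
    fn greet_desugared() -> impl Future<Output = String> {
        async { String::from("hello") }
    }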

Blocking functions, on the contrary, are utterly bizarre. In fact, you cannot make one yourself, you must either call another blocking function[1] or do a system call on your own using inline assembly. Blocking functions are the anomaly, but many people miss that because they've lived with them long enough to accept them as normal.

[1] because blockingness is contagious, unlike asynchronousness, which must be propagated manually. Yes, ironically, people criticizing async/await get this one backwards too.


"makes certain syscalls" is a highly unconventional definition of "blocking" that excludes functions that spin wait until they can pop a message from a queue.

If your upcoming systems language uses a capabilities system to prevent the user from inadvertently doing things that may block for a long time, like calling open(2) or accessing any memory that is not statically proven to not cause a page fault, I look forward to using it. I hope that these capabilities are designed so that the resulting code is more composable than Rust code. For example, it would be nice to be able to use the Reader trait with implementations that source their bytes in various different ways, which is exactly what you cannot do in Rust.


Blocking syscalls are a well defined and well scoped class of problems, sure there are other situations where the flow stops and a keyword can't save you from everything.

Your reasoning is exactly like that of the folks who say “Rust doesn't solve all bugs” because it “just” solves the memory safety ones.


I may be more serious than you think. Having worked on applications in which blocking for multiple seconds on a "non-blocking syscall" or page fault is not okay, I think it would really be nice to be able to statically ensure that doesn't happen.


I'm not disputing that. In the general case I suspect this is going to be undecidable, and you'd need careful design to carve out a subset of the problem that is statically addressable (akin to what Rust did for memory safety, by restricting the expressiveness of the safe subset of the language).

For blocking syscalls alone there's not that much PL research to do though, and we could get the improvement practically for free; that's why I consider them to be different problems (also because I suspect they are much more prevalent, given how often I've encountered them, but that could be a bias on my side).


Any function can block if memory it accesses is swapped out.


bar blocks waiting for the CPU to add the numbers.


Nope it doesn't: in the final binary the bar function doesn't even exist anymore, as the optimizer inlined it, and CPUs have been using pipelining and speculative execution for decades now; they don't block on a single instruction. That's the problem with abstractions designed in the 70s: they don't map well to the actual hardware we have 50 years later…


I don't know what to tell you, but that is how sequential code works. Sure you can find some instruction level parallelism in the code and your optimizer may be able to do it across function boundaries, but that is mostly a happy accident. Meanwhile HDLs are the exact opposite. Parallel by default and you have to build sequential execution yourself. What is needed for both HLS and parallel programming is a parallel by default hybrid language that makes it easy to write both sequential and parallel code.


Except, unless you're using atomics or volatiles, you have no guarantees that the code you're writing sequentially is going to be executed that way…


Sure, unless it is the first time you are executing that line of code and you have to wait for the OS to slowly fault it in across a networked filesystem.


Make `a + b` `A * B` then, multiplication of two potentially huge matrices. Same argument still holds, but now it's blocking (still just performing addition, only an enormous number of times).


It's not blocking, it's doing actual work.

Blocking is the way the old programming paradigm deals with asynchronous actions, and it works by behaving the same way as when the computer is actually computing something, which is where the confusion comes from. But the two situations are conceptually very different: in one case we are idle (but don't see it), in the other we're busy doing actual work. Maybe in case 2 we could optimize the algorithm so that we spend less time, but that's not certain, whereas in case 1 there's something obvious to do to speed things up: do something else at the same time instead of waiting mindlessly. Having a function marked async gives you a pointer that you can actually run it concurrently with something else and expect a speedup, whereas with blocking syscalls there's no indication in the code that those two functions you're calling next to each other, with no data dependency between them, would gain a lot from being run concurrently by spawning two threads.
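
To make that concrete (a small sketch assuming Tokio, with hypothetical fetch_a/fetch_b functions): because the functions are async, the code itself advertises that they can be overlapped, and doing so is a one-line change at the call site:

    use std::time::Duration;

    async fn fetch_a() -> u32 { tokio::time::sleep(Duration::from_millis(100)).await; 1 }
    async fn fetch_b() -> u32 { tokio::time::sleep(Duration::from_millis(100)).await; 2 }

    #[tokio::main]
    async fn main() {
        // Sequential: total wait is roughly 200 ms.
        let (a, b) = (fetch_a().await, fetch_b().await);
        // Concurrent: roughly 100 ms, since there is no data dependency
        // between the two futures.
        let (c, d) = tokio::join!(fetch_a(), fetch_b());
        println!("{a} {b} {c} {d}");
    }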

BTW, if you want something that's more akin to blocking, but at a lower level, it's when the CPU has to load data from RAM: it's really blocked, doing nothing useful. Unfortunately that's not something you can make explicit in high-level languages (or at least, the design space hasn't been explored), so when these kinds of behavior matter to you, that's when you dive into assembly.


A "non-blocking function" always meant "this function will return before its work is done, and will finish that work in the background through threads/other processes/etc". All other functions are blocking by default, including that simple addition "bar" function above.


Your definition is at odds with (for instance) JavaScript's implementation of a non-blocking function though, as it will perform computation until the first await point before returning (unlike Rust futures, which are lazy, that is: they do no work before they are awaited on).

As I said before, most of what you call a “blocking function” is actually a “no opinion function”, but since in the idiosyncrasy of most programming languages blocking functions are called like “no opinion” ones, you are mixing them up. But it's not a fundamental rule. You could imagine a language where blocking functions (ones containing an underlying blocking syscall) are called with a block keyword and where regular functions are just called like functions. There's no relation between regular functions and blocking functions except the path dependency that led to this particular idiosyncrasy we live in; it is entirely contingent.


> Your definition is at odds with (for instance) JavaScript's implementation of a non-blocking function though, as it will perform computation until the first await point before returning

Yes, that's syntactic sugar for returning a promise. This pattern is something we've long called a non-blocking function in Javascript. The first part that's not in the promise is for setting it up.


If you define a non-blocking function to be what you decide is non-blocking, that's a bit of cheating don't you think? ;)

How about this function:

    async fn toto(input: u8) -> bool {
        if input % 2 == 0 {
            true
        } else {
            false
        }
    }
Is it a non-blocking one or not according to your criteria?


We were just talking about Javascript and now you're back to Rust? That same function with that same keyword acts differently in the two languages.

My best guess is you're defining it (mostly?) by the syntax while I'm defining it by how the function acts. By what I'm talking about, that's a non-blocking function in Rust, but written the exact same way in Javascript it's a blocking function.


> We were just talking about Javascript and now you're back to Rust? That same function with that same keyword acts differently in the two languages.

Yes, that's the point, and in both cases they are called non-blocking functions, despite behaving differently.

> My best guess is you're defining it (mostly?) by the syntax while I'm defining it by how the function acts

I'm defining them the way they are commonly referred to, whereas you're using an arbitrary criterion that is not even consistent, as you'll see below.

> By what I'm talking about, that's a non-blocking function in Rust, but written the exact same way in JavaScript it's a blocking function.

Gotcha!

If we get back to your definition above

> A "non-blocking function" always meant "this function will return before its work is done, and will finish that work in the background through threads/other processes/etc".

Then it should be a “blocking function” because what this function returns is basically a private enum with two variants and a poll method to unwrap them. There's nothing running in the background ever.

In fact, in Rust an async function never runs anything in the background: all it does is return a Future, which is a passive state machine. There's no magic in there, and it's not very different from a closure or an Option actually (in fact, in this example, it's practically the same as an Option). Then you can send this state machine to an executor that will do the actual work, polling it to completion. But this process doesn't necessarily happen in the background (you can even block_on the future to make the execution synchronous).

So in reality there are two kinds of functions: the ones that return immediately (async and regular functions) and the ones that block the execution and will return later on (the “blocking” functions). And among the functions that do not block, there are also two flavors: the ones that return results that are immediately ready, and the ones that return results that won't be ready for some time.
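
As a tiny illustration of that laziness (a sketch assuming the futures crate just for block_on):

    async fn toto(input: u8) -> bool {
        input % 2 == 0
    }

    fn main() {
        // Nothing has run yet: we only hold a passive state machine.
        let fut = toto(4);
        // Drive it to completion synchronously; no background work anywhere.
        let result = futures::executor::block_on(fut);
        assert!(result);
    }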


..you're referring to features of the future as if they're instead features of the function?


I don't think I understand what you're asking.


Blocking on the future isn't a feature of the function that you asked about, and in fact as a comparison to javascript if you expand the example to include an `await` on the returned promise (and include something that's actually async) it also blocks to wait for a result in the same thread. Those are features of the future/promise, not features of the function.


I'm still unsure of what you mean. Of course blocking on the future is a property of the function! The future/promise can exist everywhere, including in non-async functions. The key difference with async functions is that you can block the execution flow on await, and only they can do that. Futures/promises cannot by themselves.

But calling await on a future doesn't necessarily mean it will in fact block the flow of execution; it can also return the value instantly (which means it doesn't fit your previous definition of a non-blocking function), but it's still a special function that needs to be handled like any other non-blocking function (you can't just call it to get the result, you need to “unwrap” the future by calling await on it).

What I pointed out is that your definition of the split between non-blocking and blocking functions is inconsistent, and the reason is that you fundamentally get the non-blocking/blocking difference backwards. And that's not your fault actually: the syntax used in pretty much every language leads to this confusion if you don't take a step back and think about it.


> In other words, if async/await is so good, why not make all functions in the language implicitly async, and instead of "await" just write normal function calls? Then you will suddenly be programming in threads.

It has been tried various times in the last decades. You want to search for "RPC". All attempts at trying to unify sync and async have failed, because there is a big semantic difference between running code within a thread, between threads, or even between computers. Trying to abstract over that will eventually be insufficient. So better learn how to do it properly from the beginning.


I think you've got some of this in your own reply, but ... I feel like Erlang has gone all in on "if async is good, why not make everything async". "Everything" in Erlang is built on top of async message passing, or the appearance thereof. Erlang hasn't taken over the world, but I think it's still successful; chat services descended from ejabberd have taken over the world; RabbitMQ seems pretty popular, too. OTOH, the system as a whole only works because Erlang can be effectively preemptive in green threads because of the nature of the language. Another thing to note is that you can build the feeling of synchronous calling by sending a request and immediately waiting for a response, but it's very hard to go the other way. If you build your RPC system on the basis of synchronous calls, it's going to be painful --- sometimes you want to start many calls and then wait for the responses together, and that gets real messy if you have to spawn threads/tasks every time.


I'm not very familiar with Erlang, but from my understanding, Erlang actually does have this very distinction - you either run local code or you interact with other actors. And here the big distinction gets quite clear: once you shoot a message out, you don't know what will happen afterwards. Both you or the other actor might crash and/or send other messages etc.

So Erlang does not try to hide it, instead, it asks the developer to embrace it and it's one of its strength.

That being said, I think that actors are a great way to model a system from the birds-perspective, but it's not so great to handle concurrency within a single actor. I wish Erlang would improve here.


Actors are a building block of concurrency. IMHO, it doesn't make sense to have concurrency within an actor, other than maybe instruction-level concurrency. But that's very out of scope for Erlang. BEAM code does compile (JIT) to native code on amd64 and arm64, but the JIT is optimized for speed, since it happens at code load time; it's not a profiling/optimizing JIT like Java's HotSpot. There's no register scheduler like you'd need to achieve concurrency, all the beam ops end up using the same registers (more or less), although your processor may be able to do magic with register renaming and out-of-order operations in general.

If you want instruction level concurrency, you should probably be looking into writing your compute heavy code sections as Native Implemented Functions (NIFs). Let Erlang wrangle your data across the wire, and then manipulate it as you need in C or Rust or assembly.


> IMHO, it doesn't make sense to have concurrency within an actor, other than maybe instruction level concurrency

I think it makes sense to have that, including managing the communication with other actors. Things like "I'll send the message, and if I don't hear back within x minutes, I'll send this other message".

Actors are very powerful and a great tool to have at your disposal, but often they are too powerful for the job and then it can be better to fall back to a more "low level" or "local" type of concurrency management.

At least that's how I feel. In my opinion you need both, and while you can get the job done with just one of them (or even none), it's far from being optimal.

Also, what you mention about NIFs is good for a very specific usecase (high performance / parallelism) but concurrency has a broader scope.


> Things like "I'll send the message, and if I don't hear back within x minutes, I'll send this other message".

I assume you don't want to wait with an x-minute timeout (and meanwhile not do anything). You can manage this in three ways really:

a) you could spawn an actor to send the message and wait for a response and then take the fallback action.

b) you could keep a list (or other structure, whatever) of outstanding messages and timeouts, and prune the list if you get a response, or otherwise periodically check if there's a timeout to process.

c) set a timer and do the thing when you get the timer expiration message, or cancel the timer if you get a response. (which is conceptually done by sending a message to the timer server actor, which will send you a timer handle immediately and a timer expired message later; there is a timer server you can use through the timer module, but erlang:send_after/[2,3] or erlang:start_timer/[3,4] are more efficient, because the runtime provides a lot of native timer functionality as needed for timeouts and what not anyway)

Setting up something to 'automatically' do something later raises the question of how the Actor's state is managed concurrently, and the thing that makes Actors simple is being able to answer that the Actor always does exactly one thing at a time, and that the Actor cannot be interrupted, although it can be killed in an orderly fashion at any time, at least in theory. Sometimes the requirement for an orderly death means an operation in progress must finish before the process can be killed.


Exactly. Now, a) is unnecessarily powerful. I don't want to manage my own list as in b), but other than that b) sounds fine, and c) is also fine, though does it need an actor in the background? No.

In other words, having a well built concept for these cases is important. At least that's my take. You might say "I'll just use actors and be fine", but for me it's not sufficient.


Oh, and just to add onto it: I think async/await is not really the best solution to tackle these semantic differences. I prefer the green-thread-IO approach, which feels a bit heavier but leads to a true understanding of how to combine and control logic in a concurrent/parallel setting. Async/await is great to add to languages that already have something like promises and want to improve syntax in an easy way, so it has its place, but I think it was not the best choice for Rust.


> In other words, if async/await is so good, why not make all functions in the language implicitly async, and instead of "await" just write normal function calls?

IIRC withoutboats said in one of the posts that the true answer is compatibility with C.


There's also writing your code with poll() and select(), which is its own thing.


well that's the great thing with async rust: you write with poll and select without writing poll and select. let the computer and the compiler get this detail out of my way (seriously I don't want to do the fd interest list myself).

and I can still write conceptually similar select code using the, well, select! macro provided by most async runtimes to do the same on a select list of futures. better separation, easier to read, and overall it boils down to the same thing.
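
For instance (a minimal sketch assuming Tokio; the branches and timings are made up), selecting over two futures looks like this, with the runtime doing the readiness bookkeeping underneath:

    use std::time::Duration;
    use tokio::time::sleep;

    #[tokio::main]
    async fn main() {
        // Whichever future finishes first "wins"; the other is simply dropped.
        tokio::select! {
            _ = sleep(Duration::from_millis(10)) => println!("fast branch"),
            _ = sleep(Duration::from_secs(10)) => println!("slow branch"),
        }
    }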


>In other words, if async/await is so good, why not make all functions in the language implicitly async, and instead of "await" just write normal function calls? Then you will suddenly be programming in threads.

Why not go one step further and invent "Parallel Rust"? And by parallel I mean it. Just a nice little keyword "parallel {}" where every statement inside the parallel block is executed in parallel, the same way it is done in HDLs. Rust's borrow checker should be able to ensure parallel code is safe. Of course one problem with this strategy is that we don't exactly have processors that are designed to spawn and process micro-threads. You would need to go back all the way to Sun's SPARC architecture for that and then extend it with the concept of a tree based stack so that multiple threads can share the same stack.


> Just a nice little keyword "parallel {}" where every statement inside the parallel block is executed in parallel, the same way it is done in HDLs. Rust's borrow checker should be able to ensure parallel code is safe.

The rayon crate lets you do something quite similar.
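
For example (a small sketch assuming the rayon crate), a data-parallel iterator and rayon::join get close to that "everything in this block runs in parallel" feel, with the borrow checker keeping it safe:

    use rayon::prelude::*;

    fn main() {
        let nums: Vec<u64> = (1..=1_000_000).collect();

        // The iterator is split across rayon's thread pool automatically.
        let sum_of_squares: u64 = nums.par_iter().map(|&n| n * n).sum();

        // `join` runs two closures, potentially in parallel, like a tiny
        // "parallel {}" block with exactly two statements.
        let (max, min) = rayon::join(|| nums.iter().max(), || nums.iter().min());

        println!("{sum_of_squares} {max:?} {min:?}");
    }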


I believe the answer is "that implies a runtime", and Rust as a whole is not willing to pull that up into a language requirement.

This is in contrast to Haskell, Go, dynamic scripting languages, and, frankly, nearly every other language on the market. Almost everything has a runtime nowadays, and while each individually may be fine they don't always play well together. It is important that as C rides into the sunset (optimistic and aspirational, sure, but I hope and believe also true) and C++ becomes an ever more complex choice to make for various reasons that we have a high-power very systems-oriented programming language that will make that choice, because someone needs to.


> It is important that [...] we have a high-power very systems-oriented programming language that will make that choice, because someone needs to.

This is exactly why I'm looking into Zig more.


That would be a good step forward, I support it :)

BTW, do we need the `parallel` keyword, or better to simply let all code be parallel by default?


Haskell has entered the chat…

However, almost all of the most popular programming languages are imperative. I assume most programmers prefer to think of their programs as a series of steps which execute in sequence.

Mind you, arguably excel is the most popular programming language in use today, and it has exactly this execution model.


You do not need to spawn threads/tasks eagerly. You can do it lazily on work-stealing. See cilk++.


Doesn't rayon have a syntax like that?


> OS threads are expensive due to statically allocated stack, and we don't want that. We want cheap threads, that can be run in millions on a single CPU. But without the clumsy "async/await" words.

Green threads ("cheap threads") are still expensive if you end up spreading a lot of per-client state on the stack. That's because with async/await and CPS you end up compressing the per-client state into a per-client data structure, and you end up having very few function call activation frames on the stack, all of which unwind before blocking in the executor.


> if async/await is so good, why not make all functions in the language implicitly async, and instead of "await" just write normal function calls?

You mean like Haskell?

The answer is that you need an incredibly good compiler to make this behave adequately, and even then, every once in a while you'll get the wrong behavior and need to rewrite your code in a weird way.


It's interesting to see an almost marketing-like campaign to save face for async/await. It is very clear from my experience that it was not only a technical mistake, it also cost the community dearly. Instead of focusing on language features that are actually useful, the Rust effort has been sidetracked by this mess. I'm still very hopeful for the language though, and it is the best thing we've got at the moment. I'm just worried that this whole fight will drag on forever. P.S the AsyncWrite/AsyncRead example looks reasonable, but in fact you can do the same thing with threads/fds as long as you restrict yourself to *nix.


I've used async in firmware before. It was a lifesaver. The generalizations you make are unfounded and are clearly biased toward a certain workload.


I personally agree that it is great that Rust as a language is able to function in an embedded environment. Someone needs to grasp that nettle. I started by writing "concede" in my first sentence there but it's not a concession. It's great, I celebrate it, and Rust is a fairly good choice for at least programmers working in that space. (Whether it's good for electrical engineers is another question, but that's a debate for another day.)

However, the entire Rust library ecosystem shouldn't bend itself around what is ultimately still a niche use case. Embedded uses are still a small fraction of Rust programs and that is unlikely to change anytime soon. I am confident the vast bulk of Rust programs and programmers, even including some people vigorously defending async/await, would find they are actually happier and more productive with threads, and that they would be completely incapable of finding any real, perceptible performance difference. Such exceptions as there may be, which I not only "don't deny" exist but insist exist, are welcome to pay the price of async/await and I celebrate that they have that choice.

But as it stands now, async/await may be the biggest current premature optimization in common use. Though "I am writing a web site that will experience upwards of several dozen hits per minute, should I use the web framework that benches at 1,000,000 reqs/sec or 2,000,000 reqs/sec?" is stiff competition.


> But as it stands now, async/await may be the biggest current premature optimization in common use.

To be fair, isn't the entire point of OP's essay that async/await is useful specifically for reasons that aren't performance? Rather, it is that async/await is arguably more expressive, composable, and correct than threads for certain workloads.

And I have to say I agree with OP here, given what I've experienced in codebases at work: doing what we currently do with async instead with threads would result in not-insubstantial pain.


I disagree with the original essay comprehensively. async/await is less composable (threads automatically compose by their nature; it is so easy it is almost invisible), a tie on expressiveness (both thread advocates and async advocates play the same game here, where they'll use a nice high-level abstraction on "their" side and compare it to raw use of the primitives on the other; both are perfectly capable of high-level libraries making the various use cases easy), and I would say async being more "correct" is generally not a claim that makes much sense to me either way. The correctness/incorrectness comes from things other than the runtime model.

Basically, async/await for historical reasons grew a lot of propaganda around how they "solved" problems with threads. But that is an accidental later post-hoc rationalization of their utility, which I consider for the most part just wrong. The real reason async-await took off is that it was the only solution for certain runtimes that couldn't handle threading. That's fine on its own terms, but is probably the definitive example of the dangers of taking a cost/benefit balance from one language and applying it to other languages without updating the costs and the benefits (gotta do both!). If I already have threads, the solution to their problems is to use the actor model, never take more than one mutex at a time, share memory by communicating instead of communicating by sharing memory, and on the higher effort end, Rust's lifetime annotations or immutable data. I would never have dreamed of trying to solve the problems with async/await.


Honestly that's a fair reply, thanks for the additional nuance to your argument. Upvoted.


I'd love the detail on this, what did it save you from and how did you ensure your firmware does not, say, hang?


Having to implement my own scheduler for otherwise synchronous network, OLED, serial and USB drivers on the same device, as well as getting automatic power state management when the executor ran out of arrived promises.

And a watchdog timer, like always. There's no amount of careful code that absolves you from using a watchdog timer.

For anyone curious, Embassy is the runtime/framework I used. Really well built.


That sounds kind of amazing. Working low level without an OS sounds like exactly the kind of place that Rust's concurrency primitives and tight checking would really be handy. Doing it in straight up C is complicated, and becomes increasingly so with every asynchronous device you have to deal with. Add another developer or two into the mix, and it can turn into a buggy mess rather quickly.

Unless you pull in an embedded OS, one usually ends up with a poor man's scheduler being run out of the main loop. Being able to do that with the Rust compiler looking over your shoulder sounds like it could be a rather massive benefit.


The way to do it in C isn't all that different, is it? You just have explicit state machines for each thing. Yes you have to call thing_process() in the main loop at regular intervals (and probably have each return an am_busy state to determine if you should sleep or not). It's more code but it's easy enough to reason about and probably easier to inspect in a debugger.
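
Roughly like this (an illustrative sketch, written in Rust for brevity but structurally the same in C; all names are made up):

    #[derive(Clone, Copy)]
    enum UartState { Idle, Sending { remaining: usize } }

    struct UartDriver { state: UartState }

    impl UartDriver {
        // One step of the state machine; returns true while work is in flight.
        fn process(&mut self) -> bool {
            match self.state {
                UartState::Idle => false,
                UartState::Sending { remaining } => {
                    // Pretend we pushed one byte out this iteration.
                    self.state = if remaining <= 1 {
                        UartState::Idle
                    } else {
                        UartState::Sending { remaining: remaining - 1 }
                    };
                    true
                }
            }
        }
    }

    fn main() {
        let mut uart = UartDriver { state: UartState::Sending { remaining: 3 } };
        loop {
            let busy = uart.process(); // in real firmware: poll every driver here
            if !busy { break; }        // ...and sleep/wfi instead of breaking
        }
    }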


Yep, the underlying mechanics have to do the same thing - just swept under a different rug. I imagine the (potential) advantage as being similar to when we had to do the same thing with JavaScript before promises came along. You would make async calls that would use callbacks for re-entry, and then you would need to pull context out from someplace and run your state machine.

Being able to write chains of asynchronous logic linearly is rather nice, especially if it's complicated. The tradeoff is that your main loop and re-entry code is now sitting behind some async scheduler, and - as you mention - will be more opaque and potentially harder to debug.


You're making a lot of assumptions. Debugging this thing was easy. Try it before you knock it.


thanks. looked that up. for the curious: https://embassy.dev/


> Instead of focusing on language features that are actually useful, the Rust effort has been sidetracked by this mess.

I don't know if you are correct or not (I am not very familiar with Rust) but empirically 9/10 Rust discussions I see nowadays on HN/reddit do revolve around async. It kinda sucks for me because I don't care about async at all, and I am interested in reading stuff about Rust.


> empirically 9/10 Rust discussions I see nowadays on HN/reddit do revolve around async. It kinda sucks for me because I don't care about async at all, and I am interested in reading stuff about Rust.

100% yes. I really feel bad precisely for people in your situation. But, there’s a good reason why you see this async spam on HN:

> (I am not very familiar with Rust)

Once you get past the initial basics (which are amazing, with pattern matching, sum types, traits, RAII, even borrowing can be quite fun), you’ll want to do something “real world”, which typically involves some form of networked IO. That’s when the nightmare begins.

So there’s traditional threaded IO (mentioned in the post), which lets you defer the problem a bit (maybe you can stay on the main thread if eg you’re building a CLI). But every crate that does IO needs to pick a side (async or sync). And so the lib you want to use may require it, which means you need to use async too. There are two ways of doing the same thing - which are API incompatible - meaning if you start with sync and need async later - get ready for a refactor.

Now, you (and the crates you’re using) also have to pick a faction within async, ie which executor to use. I’ve been OOTL a while but I think it’s mostly settled on Tokio these days(?), which probably is for the best. There are even sub-choices about Send-ness (for single-vs-multithreaded executors) and such that also impact API-compatibility.

In either case, a helluva lot of people with simple use-cases absolutely need to worry about async. This is problematic for three main reasons: (1) these choices are front-loaded and hard to reverse, (2) async is more complex to use, debug and understand, and (3) it in practice constrains the use of Rust best-practice features like static borrowing.

I don’t think anyone questions the strive for insane performance that asynchronous IO (io_uring, epoll etc) can unlock together with low-allocation runtimes. However, async was supposed to be an ergonomic way to deliver that, ideally without splitting the ecosystem into camps. Otherwise, perhaps just do manual event looping with custom state machines, which doesn’t need any special language features.


Thank you for posting this. Async vs sync is a tough design decision, and the ecosystem of crates in async gets further partitioned by the runtime. Tokio seems to be the leader, but making the async runtime a second-class consideration has just made the refactor risk much higher.

I like async, and think it makes a lot of sense in many use cases, but it also feels like a poisoned pill, where it’s all or nothing.


Maybe it's my lack of experience, but I find it much easier to wrap my head around threads than async/await. Yes, with threads there is more "infrastructure" required, but it's straightforward and easy to reason about (for me). With async/await I really don't fully understand what's going on behind the scenes.

Granted, in my job the needs for concurrency/parallelism tend to be very simple and limited.


> With async/await I really don't fully understand what's going on behind the scenes.

But what makes you think you understand what's going on behind the scenes with threads?

You don't need to understand anything more when using async/await than when using threads, and it works almost the same way: calling await blocks the “thread” of execution the exact same way a blocking call to an IO function does. And if you don't await on the promise/future, then it behaves as if you spawned a thread (in most languages, including JavaScript; in Rust it does nothing unless you explicitly “spawn” the future). Sometimes you need to join that thread (by calling await on the future) and sometimes you don't and you leave it to its business, exactly like threads!
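
A rough side-by-side of that analogy (a sketch assuming Tokio): spawning a task and awaiting its handle mirrors spawning a thread and joining it:

    use std::time::Duration;

    #[tokio::main]
    async fn main() {
        // Analogous to std::thread::spawn: the task runs concurrently...
        let handle = tokio::spawn(async {
            tokio::time::sleep(Duration::from_millis(50)).await;
            42
        });

        // ...and awaiting the JoinHandle is the async analogue of join().
        let value = handle.await.unwrap();
        assert_eq!(value, 42);
    }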

It puzzles me when developers are afraid of the complexity of async; it's entirely in your head: if every junior JavaScript developer can get used to it in a few days, so can you.


Imo this is because threads are a good abstraction. Not perfect, and quite inflexible, but powerful and simple. I would argue green threads are equally simple, too.

Async is implemented differently in different languages. Eg in JS it’s a decent lightweight sugar over callbacks, ie you can pretty much “manually” lower your code from async => promises => callbacks. It also helps that JS is single threaded.

Doing the same in Rust would require you to implement your own scheduler/runtime, and self-referencing “stackless” state machines. It’s orders of magnitude more complex.


Hard agree. In my view, that's Rust #1 problem in terms of developer experience.


Honestly: just ignore them. Just start using Rust! It's a lovely, useful language. HN/reddit are not representative most of the people out there in the real world writing Rust code that solves problems. I am not saying their concerns are invalid, but there is a tendency on these forums to form a self-reinforcing collective opinion that is overfit to the type of people who like to spend time on these forums. Reality is almost always richer and more complicated.


It's true for general programming focused Reddits, but eh, those will always try to poke fun at languages from the extreme perspective. E.g. they are a problem for people new to a language but not for others that have gone through learning the language extensively.

Remember, there are only two types of languages - languages no one uses and languages people complain about. When was the last time you heard Brainfuck doesn't give you good tools to access files?

On /r/rust Reddit, most talk is about slow compile/build times.


If you think that threads are faster than poll(), I would like to know in what use case that happens, because I have never once encountered this in my life.


It's not a technical mistake, it's a brilliant solution for when you need ultra-low-latency async code. The mistake is pushing it for the vast majority of use-cases where this isn't needed.


The reason it's pushed everywhere is because it only works well if the ecosystem uses it. If the larger rust ecosystem used sync code in most places, async await in rust would be unusable by large swaths of the community.


I think it wouldn't be so painful even as it's pervasive if the ergonomics were far better. Unfortunately, things are still far off on that front. Static dispatch for async functions recently landed in stable, though without a good way to bound the return type. Things like a better pinning API, async drop, interoperability and standardization, dynamic dispatch for async functions, and structured concurrency are open problems, some moving along right now. It'll be a process spanning years.


It's not pushed, it's pulled by the hype. The Rust community got onboard that train and started to async all the things without regard to actual need or consequences.


Do you have any evidence to back up the claim that the async efforts have taken away from other useful language features?

Also, lots of major rust projects depend on async for their design characteristics, not just for the significant performance improvements over the thread-based alternatives. These benefits are easy to see in just about any major IO-bound workload. I think the widespread adoption of async in major crates (by smart people solving real world problems) is a strong indicator that async is a language feature that is "actually useful".

The fight is mostly on hackernews and reddit, and mostly in the form of people who don't need async being upset it exists, because all the crates they use for IO want async now. I understand that it isn't fun when that happens, and there are clearly some real problems with async that they are still solving. It isn't perfect. But it feels like the split over async that is apparent in forum discussions just isn't nearly as wide or dramatic in actual projects.


> It's interesting to see an almost marketing-like campaign to save face for async/await.

It's not, it's just that Rust people want to explain the choices that led them here, and the HN/Reddit crowd is all about async controversy, so more Rust people blog about it, so more HN/Reddit focus on it. Like any good controversy, it's a self-reinforcing cycle.

> it also cost the community dearly

Citation needed. Having async/await opened a door to new contributors as well, it was a REALLY requested feature, and it made stuff like Embassy possible.

And it made stuff like effects and/or keyword generics a more requested feature.


I have to respectfully disagree. People are allowed to like stuff, it doesn't make it a marketing campaign or a conspiracy. This negativity on async, as if it's just a settled debate that async is a failure and therefore Rust is a failure, feels self-reinforcing.

I've written a fair amount of Rust both with threads and with async and you know what? Async is useful, very often for exactly the reasons the OP mentions, not necessarily for performance. I don't like async for theoretical reasons; I like it because it works. We use async extensively at my job for a lot of lower-level messaging and machine-control code and it works really well. Having well-defined interfaces for Future and Stream that you can pass around is nice. Having a robust mechanism for timers is nice. Cancellation is nice. Tokio is actually really solid (I can't speak to the other executors). I see this "async sucks" meme all over the various programming forums and it feels like a bunch of people going "this thing, that isn't perfect, is therefore trash". Are we not better than this?

This is not to say that async doesn't have issues. Of course it does. I'm not going to enumerate them here, but they exist and we should work to fix them. There are plenty of legitimate criticisms that have been made and can continue to be made about Rust's concurrency story, but async hasn't "cost the community dearly". People who wouldn't use Rust otherwise are using async every single day to solve real problems for real users with concurrent code, which is quite a bit more than can be said for all kinds of other theoretical different implementations Rust could have gone with, but didn't.


The big one not mentioned is cancellation. It's very easy to cancel any future. OTOH cancellation with threads is a messy whack-a-mole problem, and a forced thread abort can't be reliable due to a risk of leaving locks locked.

In Rust's async model, it's possible to add timeouts to all futures externally. You don't need every leaf I/O function to support a timeout option, and you don't need to pass that timeout through the entire call stack.

Combined with use of Drop guards (which is Rust's best practice for managing in-progress state), it makes cancellation of even large and complex operations easy and reliable.
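
For example (a minimal sketch assuming Tokio; slow_operation stands in for any future), the timeout wraps the future from the outside, with no timeout parameter threaded through the call stack:

    use std::time::Duration;
    use tokio::time::{sleep, timeout};

    #[tokio::main]
    async fn main() {
        let slow_operation = async {
            sleep(Duration::from_secs(60)).await;
            "done"
        };

        match timeout(Duration::from_secs(1), slow_operation).await {
            Ok(v) => println!("finished: {v}"),
            // On timeout the inner future is dropped, i.e. cancelled.
            Err(_) => println!("timed out"),
        }
    }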


It's not easy to cancel any future. It's easy to *pretend* to cancel any future. E.g. if you cancel (drop) anything that uses spawn_blocking, it will just continue to run in the background without you being aware of it. If you cancel any async fs operation that is implemented in terms of a threadpool, it will also continue to run.

This all can lead to very hard to understand bugs - e.g. "why does my service fail because a file is still in use, while I'm sure nothing uses the file anymore"


Yes, if you have a blocking thread running then you have to use the classic threaded methods for cancelling it, like periodically checking a boolean. This can compose nicely with Futures if they flip the boolean on Drop.
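
Something along these lines (an illustrative sketch, not tied to any particular executor; the guard type and names are made up): a guard flips an atomic flag on Drop, and the blocking worker checks that flag between chunks of work:

    use std::sync::Arc;
    use std::sync::atomic::{AtomicBool, Ordering};
    use std::thread;
    use std::time::Duration;

    // Flips the flag when dropped, e.g. when the owning future is cancelled.
    struct CancelOnDrop(Arc<AtomicBool>);

    impl Drop for CancelOnDrop {
        fn drop(&mut self) {
            self.0.store(true, Ordering::Relaxed);
        }
    }

    fn main() {
        let cancelled = Arc::new(AtomicBool::new(false));
        let flag = cancelled.clone();

        let worker = thread::spawn(move || {
            // Classic threaded cancellation: check the flag between work items.
            while !flag.load(Ordering::Relaxed) {
                thread::sleep(Duration::from_millis(10)); // one chunk of "work"
            }
        });

        // In async code this guard would live inside the future, so dropping
        // (cancelling) the future flips the flag automatically.
        let guard = CancelOnDrop(cancelled);
        drop(guard);

        worker.join().unwrap();
    }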

I’ve also used custom executors that can tolerate long-blocking code in async, and then an occasional yield.await can cancel compute-bound code.


If you implemented async futures, you could have also instead implemented cancelable threads. The problem is fairly isomorphic. System calls are hard, but if you make an identical system call in a thread or an async future, then you have exactly the same cancellation problem.


I don't get your distinction. Async/await is just syntax sugar on top of standard syscalls and design patterns, so of course it's possible to reimplement it without the syntax sugar.

But when you have a standard futures API and a run-time, you don't have to reinvent it yourself, plus you get a standard interface for composing tasks, instead of each project and library handling completion, cancellation, and timeouts in its own way.


I don't follow how threads are hard to cancel.

Set some shared state (like a "keep running" flag) that all threads have access to.

In their work loop, they check this flag. If it has been set to false, they return, and the thread is joined. Done.


So you make an HTTP request to some server and it takes 60 seconds to respond.

Externally you set that flag 1 second into the HTTP request. Your program has to wait for 59 seconds before it finally has a chance at cancelling, even though you added a bunch of boilerplate to supposedly make cancellation possible.


If the server takes 60 seconds to respond, and you need responses on the order of 1 second, I'd say that is the problem - not threads.


Cancellation is not worth worrying over in my experience. If an op is no longer useful, then it is good enough if that information eventually becomes visible to whatever function is invoked on behalf of the op, but don't bother checking for it unless you are about to do something very expensive, like start an RPC.


It's been incredibly important for me in both high-traffic network back-ends, as well as in GUI apps.

When writing complex servers it's very problematic to have requests piling up waiting on something unresponsive (some other API, microservice, borked DNS, database having a bad time, etc.). Sometimes clients can be stuck waiting forever, eventually causing the server to run out of file descriptors or RAM. Everything needs timeouts and circuit breakers.

Poorly implemented cancellation that leaves some work running can create pathological situations that eat all CPU and RAM. If some data takes too long to retrieve, and you time out the request without stopping processing, the client will retry, asking for that huge slow thing again, piling up another and another and another huge task that doesn't get cancelled, making the problem worse with each retry.

Often threading is mixed with callbacks for returning results. The un-cancelled callbacks firing after the other part of the application aborted an operation can cause race conditions, by messing up some state or being misattributed to another operation.


Right, this is compatible with what I said and meant. Timeouts that fire while the op is asleep, waiting on something: good, practical to implement. Cancellations that try to stop an op that's running on the CPU: hard, not useful.


I think a better question is "why choose async/await over fibers?". Yes, I know that Rust had green threads in the pre-1.0 days and it was intentionally removed, but there are different approaches for implementing fiber-based concurrency, including those which do not require a fat runtime built-in into the language.

If I understand the article correctly, it mostly lauds the ability to drop futures at any moment. Yes, you can not do a similar thing with threads for obvious reasons (well, technically, you can, but it's extremely unsafe). But this ability comes at a HUGE cost. Not only can you not use stack-based arrays with completion-based executors like io-uring or execute sub-tasks on different executor threads, but it also introduces certain subtle footguns and reliability issues (e.g. see [0]), which become very unpleasant surprises after writing sync Rust.

My opinion is that cancellation of tasks fundamentally should be cooperative and uncooperative cancellation is more of a misfeature, which is convenient at the surface level, but has deep issues underneath.

Also, praising the composability of async/await sounds... strange. Its viral nature makes it anything but composable (with the current version of Rust, without a proper effect system). For example, try to use an async closure with the map methods from std. What about using the standard io::Read/Write traits?
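
A small example of that friction (a sketch assuming Tokio just for the executor; `double` is a made-up function): std's combinators know nothing about futures, so you end up juggling the wrapper by hand:

    async fn double(x: u32) -> u32 {
        x * 2
    }

    #[tokio::main]
    async fn main() {
        let v = Some(3u32);

        // Option::map happily accepts the async fn, but what comes back is an
        // Option of a future that still has to be awaited separately.
        let result = match v.map(double) {
            Some(fut) => Some(fut.await),
            None => None,
        };

        assert_eq!(result, Some(6));
    }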

[0]: https://smallcultfollowing.com/babysteps/blog/2022/06/13/asy...


For rust, fibers (as a user-space, cooperative concurrency abstraction) would mandate a lot of design choices, such as whether stacks should be implemented using spaghetti stacks or require some sort of process-level memory mapping library, or even if they were just limited to a fixed size stack.

All three of these approaches would cause issues when interacting with code in another language with a different ABI. It can get really complicated, for example, when C code gets called from one fiber and wants to then resume another.

One of the benefits of async/await is the 'await' keyword itself. The explicit wait-points give you the ability to actually reason about the interactions of a concurrent program.

Yielding fibers are a bit like the 'goto' of the concurrency world - whenever you call a method, you don't know if as a side effect it may cause your processing to pause, and if when it continues the state of the world has changed. The need to be defensive when interfacing with the outside world means fibers tend to be better for tasks which run in isolation and communicate by completion.

Green threads, fibers and coroutines all share the same set of problems here, but really user space cooperative concurrency is just shuffling papers on a desk in terms of solving the hard parts of concurrency. Rust async/await leaves things more explicit, but as a result doesn't hide certain side effects other mechanisms do.


> you don't know if as a side effect it may cause your processing to pause, and if when it continues the state of the world has changed

This may be true in JS (or Haskell), but not in Rust, where you already have multithreading (and unrestricted side-effects), and so other code may always be interleaved. So this argument is irrelevant in languages that offer both async/await and threads.

Furthermore, the argument is weak to begin with because the difference is merely in the default choice. With threads, the default is that interleaving may happen anywhere unless excluded, while with async/await it's the other way around. The threading approach is more composable, maintainable, and safer, because the important property is that of non-interference, which threads state explicitly. Any subroutine specifies where it does not tolerate contention, regardless of the other subroutines it calls. In the async/await model, adding a yield point to a previously non-yielding subroutine requires examining the assumptions of its callers. Threads' default, requiring an explicit statement of the desired property of non-interference, is the better one.


I never fully understood the FFI issue. When calling an FFI function not known to be coroutine-safe, you would switch from the coroutine stack to the original thread stack and back. This need not be more expensive than a couple of instructions on the way in and out.

Interestingly, reenabling frame pointers was in the news recently, which would add a similar amount of overhead to every function call. That was considered a more than acceptable tradeoff.


The main problem with fibers/goroutines and FFI is that one of the benefits of fibers is that each fiber starts with a very small stack (usually just a few kB), unlike native threads, which usually start with a much larger stack (usually expressed in MB). The problem is that the code must be prepared to grow the stack if necessary, which is not compatible with the C FFI. That's one of the reasons why Go's FFI to C, for example, is slower than Rust's.


Sure, if you are using split stacks, goroutine code is inherently slower. But for FFI you would switch to the main thread stack that is contiguous, so you won't pay any split stack cost there.


Go abandoned split stacks years ago due to the "hot split problem", and is now using contiguous stacks that are grown when necessary via stack copying. Go switches to the system stack when calling C code. There is some overhead (a few tens of ns) due to that switch, compared to languages like Rust or Zig which don't need to switch the stack.


The issue is that the FFI might use C thread-local storage and end up assuming 1-1 threading between the language and C, but if you're using green threads then that won't be the case.


In my opinion, by default fibers should use "full" stacks, i.e. a reasonable amount of unpopulated memory pages (e.g. 2 MiB) with guard page. Effectively, the same stack which we use for threads. It should eliminate all issues about interfacing with external code. But it obviously has performance implications, especially for very small tasks.

Further, on top of this we can then develop spawning tasks which would use the parent's stack. It would require certain language development to allow computing a maximum stack usage bound for functions. Obviously, such computation would mean that programmers have to accept additional restrictions on their code (such as disallowing recursion, alloca, and calling external functions without attributed stack usage), but compilers already routinely compute stack usage of functions, so for pure Rust code it should be doable.

>It can get really complicated, for example, when C code gets called from one fiber and wants to then resume another.

It's a weird example. How would a C library know about fiber runtime used in Rust?

>Yielding fibers are a bit like the 'goto' of the concurrency world - whenever you call a method, you don't know if as a side effect it may cause your processing to pause, and if when it continues the state of the world has changed.

I find this argument funny. Why don't you have the same issue with preemptive multitasking? We live with exactly this "issue" in the threading world just fine. Even worse, we can not even rely on "critical sections", thread's execution can be preempted at ANY moment.

As for `await` keyword, in almost all cases I find it nothing more than a visual noise. It does not provide any practically useful information for programmer. How often did you wonder when writing threading-based code about whether function does any IO or not?


> It’s a weird example. How would a C library know about the fiber runtime used in Rust.

Well, if your green thread were launched on an arbitrary free OS thread (work stealing), then for example your TLS variables would be very wrong when you resume execution. Does it break all FFI? No. But it can cause issues for some FFI in a way that async/await cannot.

> I find this argument funny. Why don't you have the same issue with preemptive multitasking? We live with exactly this "issue" in the threading world just fine. Even worse, we can not even rely on "critical sections", thread's execution can be preempted at ANY moment.

It’s not about critical sections as much. Since the author referenced goto, I think the point is that it gets harder to reason about control flow within your own code. Whether or not that’s true is debatable, since there’s not really any implementation of green threads for Rust. It does seem to work well enough for Go, but Go has a dedicated keyword for creating a green thread, which eases readability.

> As for `await` keyword, in almost all cases I find it nothing more than a visual noise. It does not provide any practically useful information for programmer. How often did you wonder when writing threading-based code about whether function does any IO or not?

Agree to disagree. It provides very clear demarcation of which lines are possible suspension points which is important when trying to figure out where “non interruptible” operations need to be written for things to work as intended.


Obviously you would not use operating system TLS variables when your code does not correspond to operating system threads.

They're just globals, anyway - why are we on Hacker News discussing the best kind of globals? Avoid them and things will go better.


Not sure why you’re starting a totally unrelated debate. If you’re pulling in a library via FFI, you have no control over what that library has done. You’d have to audit the source code to figure out if they’ve done anything that would be incompatible with fibers. And TLS is but one example. You’d have to audit for all kinds of OS thread usage (e.g. if it uses the current thread ID as an index into a hashmap or something). It may not be common, but Go’s experience is that there’s some issue and the external ecosystem isn’t going to bend itself over backwards to support fibers. And that’s assuming that these are solved problems within your own language ecosystem which may not be the case either when you’re supporting multiple paradigms.


You've obviously never worked with fibers if you think these are obvious. These problems are well documented and observed empirically in the field.


> In my opinion, by default fibers should use "full" stacks, i.e. a reasonable amount of unpopulated memory pages (e.g. 2 MiB) with guard page.

That's a disaster. If you're writing a server that needs to serve 10K clients concurrently then that's 20GiB of RAM just for the stacks, plus you'll probably want guard pages, and so MMU games every time you set up or tear down a fiber.

The problem with threads is the stacks, the context switches, the cache pressure, the total memory footprint. A fiber that has all those problems but just doesn't have an OS schedulable entity to back it barely improves the situation relative to threads.

Dealing with slow I/O is a spectrum. On one end you have threads, and on the other end you have continuation passing style. In the middle you have fibers/green threads (closer to threads) and async/await (closer to CPS). If you want to get closer to the middle than threads then you want spaghetti stack green threads.


> by default fibers should use "full" stacks, i.e. a reasonable amount of unpopulated memory pages (e.g. 2 MiB) with guard page

Then wouldn't we lose the main benefit of fibers (small stacks leading to low memory usage in the presence of a very large number of concurrent tasks) compared to native threads (the other main benefit being user-space scheduling)? Or perhaps you're thinking of using fibers configured with a very small stack for highly concurrent tasks (like serving network requests) and delegating tasks requiring C FFI to a pool of fibers with a "full" stack?


> For rust, fibers (as a user-space, cooperative concurrency abstraction) would mandate a lot of design choices, such as whether stacks should be implemented using spaghetti stacks or require some sort of process-level memory mapping library, or even if they were just limited to a fixed size stack.

> All three of these approaches would cause issues when interacting with code in another language with a different ABI. It can get really complicated, for example, when C code gets called from one fiber and wants to then resume another.

Java (in OpenJDK 21) is doing it. To be fair, Java had no other choice because there is so much sequential code written in Java, but also the Java language and bytecode compilation make it easy to implement spaghetti stacks transparently. Given those two things it's obviously a good idea to go with green threads. The same might not apply to other languages.

My personal preference is for async/await, but it's true that its ecosystem-bifurcating virality is a bit of a problem.


Java also has the benefit that most of the Java ecosystem is in Java. This makes it easy to avoid the ffi problem since you never leave the VM.


How do fibers solve your cancellation problem? Aren't they more or less equivalent?

(I find fiber-based code hard to follow because you're effectively forced to reason operationally. Keeping track of in-progress threads in your head is much harder than keeping track of to-be-completed values, at least for me)


With fibers you send a cancellation signal to a task, and on its next IO operation (or, more generally, yield) it will get a cancellation error code, with the ability to get the true result of the IO operation, if there is any. Note that it does not mean that the task will sleep until the IO operation gets completed; the cancellation signal causes any ongoing IO to "complete" immediately where possible (e.g. IIRC disk IO can not be cancelled).

It then becomes the responsibility of the task to handle this signal. It may finish immediately (e.g. by bubbling up the "cancellation" error), finish some critical section first and do some cleanup IO, or even outright ignore the signal.

With futures you just drop the task's future (i.e. its persistent stack), maybe with some synchronous cleanup, and that's it; you don't give the task a chance to say a word about its cancellation. A hypothetical async Drop could help here (though you would have to rely on async drop guards extensively instead of processing "cancellation errors"), but adding it to Rust is far from easy and AFAIK there are certain fundamental issues with it.
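For illustration, a minimal sketch of that drop-based cancellation (assuming the tokio crate with the rt, time and macros features; timeout() simply drops the inner future once the deadline passes):

    use std::time::Duration;

    #[tokio::main(flavor = "current_thread")]
    async fn main() {
        let slow = async {
            // If the timeout fires first, this future is simply dropped while
            // suspended here; only synchronous Drop impls run, the task gets no say.
            tokio::time::sleep(Duration::from_secs(10)).await;
            "finished"
        };
        match tokio::time::timeout(Duration::from_millis(50), slow).await {
            Ok(v) => println!("completed: {v}"),
            Err(_) => println!("cancelled by dropping the future"),
        }
    }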

With io-uring sending cancellation signals is quite straightforward (though you need to account for different possibilities, such as task being currently executed on a separate executor thread, or its CQE being already in completion queue), but with epoll, unfortunately, it's... less pleasant.


> Hypothetical async Drop could help here (though you would have to rely on async drop guards extensively instead of processing "cancellation errors"), but adding it to Rust is far from easy and AFAIK there are certain fundamental issues with it.

Wouldn't fiber cancellation be equivalent and have equivalent implementation difficulties? You say you just send a signal to the task, but in practice picking up and running the task to trigger its cancellation error handling is going to look the same as running a future's async drop, isn't it?


Firstly, Rust does not have async Drop and it's unlikely to be added in the foreseeable future. Secondly, cancellation signals are a more general technique than async Drop, i.e. you can implement the latter on top of the former, but not the other way around. For example, with async Drop you can not ignore a cancellation event (unless you copy the code of your whole task into the Drop impl). Some may say that's a good thing, but it's an obvious example of cancellation signals being more powerful than a hypothetical async Drop.

As for implementation difficulties, I don't think so. For async Drop you need to mess with some fundamental parts of the Rust language (since Futures are "just types"), while fiber-based concurrency, in a certain sense, is transparent for compiler and implementation complexity is moved to executors.

If you are asking about how it would look in user code, then, yes, they would be somewhat similar. With cancellation signals you would call something like `let res = task_handle.cancel_join();`, while with async Drop you would use `drop(task_future)`. Note that the former also allows getting the result from a cancelled task, another example of its greater flexibility.


> Keeping track of in-progress threads in your head is much harder than keeping track of to-be-completed values, at least for me

I think that's true for everybody. Our minds barely handle state for sequential code - the explosion of complexity of multiple state-modifying threads is almost impossible to follow.

There are ways to convert "keeping track of in-progress threads" into "keeping track of to-be-completed values" - in particular, Go uses channels as a communication mechanism which explicitly does the latter, while abstracting away the former.


> There are ways to convert "keeping track of in-progress threads" into "keeping track of to-be-completed values" - in particular, Go uses channels as a communication mechanism which explicitly does the latter, while abstracting away the former.

I find the Go style pretty impossible to follow - you have to keep track of which in-progress threads are waiting on which lines, because what will happen when you send to a given channel depends on what was waiting to receive from that channel, no? The only way of doing this stuff that I've ever found comprehensible is iteratees, where you reify the continuation step as a regular value that runs when you call it explicitly.


Because stackful fibers suck for low-level code. See Gor Nishanov's review for the C++ committee http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2018/p136... (linked from https://devblogs.microsoft.com/oldnewthing/20191011-00/?p=10... ). It even sums things up nicely: DO NOT USE FIBERS!


To be clear, not everybody agrees with Gor and although they still don't have much traction, stackful coroutines are still being proposed.

Most (but admittedly not all) of Gor's issues with stackful coroutines are due to current implementations being purely a library feature with no compiler support. Most of the issues (context switch overhead, thread-local issues) can be solved with compiler support.

The complaint about lack of support in some existing libraries is unfair: async/await doesn't have it either, and in fact adding support there would be significantly harder, as it would require a full rewrite. The takeaway here is not to try to make M:N fully transparent, not to avoid it completely.

The issue with stack usage is real though; split stacks are not a panacea and OS support would be required.

edit: for the other side of the coin, also from Microsoft, about issues with the async/await model in Midori (a high-performance single-address-space OS): https://joeduffyblog.com/2015/11/19/asynchronous-everything/

They ended up implementing the equivalent of stackful coroutines specifically for performance.


> The take away here is not to try to make M:N fully transparent

But the whole point of the 'Virtual processors' or 'Light-weight process' patterns as popularized by Golang and such is to try and make M:N transparent to user code.


I mean fully transparent to pre-existing code. Go had the benefit of having stackful coroutines from day one, but when retrofitting to an existing language, expecting the whole library ecosystem to work out of the box without changes is a very high bar.


That paper has some significant flaws:

In section 2.3.1, "Dangers of N:M model", the "danger" lies in using thread-local storage that was built for OS threads, without modification, for stackful fibers. The bottom line here should have been "don't do that", not that stackful fibers are dangerous. Obviously, any library that interacts with the implementation mechanism for concurrency must be built for the mechanism actually used, not a different one.

Section 2.3.2, "Hazards of 1:N model", again points out the dangers of making such blind assumptions, but its main point is that blocking APIs must block at the fiber level, not at the host thread level -- that is, it even proposes the solution for the problem mentioned, then totally ignores that solution. Java's approach (even though, IIRC, N:M) does exactly that: "rewrite it in Java". This is one place where I hope "rewrite it in Rust" becomes more than a meme in the future and actually becomes a foundation for stackful fibers.

Then there are a bunch of case studies that show that you cannot just sprinkle some stackful fibers over an existing codebase and hope for it to work. Surprise: You can't do that with async/await either. "What color is my function" etc.

I'm still hoping for a complete, clearly presented argument for why Java can do it and Rust cannot. (Just for example: "it needs a GC" could be the heart of such an argument).


> Just for example: "it needs a GC" could be the heart of such an argument

Rust can actually support high-performance concurrent GC, see https://github.com/chc4/samsara for an experimental implementation. But unlike other languages it gives you the option of not using it.


Somewhere there's a blog from the old Sun about how M:N threading was a disaster in the Solaris <10 C library and why it was ripped out in Solaris 10. I'm not sure how to find it anymore.


> I think a better question is "why choose async/await over fibers?

> there are different approaches for implementing fiber-based concurrency, including those which do not require a fat runtime built-in into the language

This keynote below is Ruby so the thread/GVL situation is different than for Rust, but is that the kind of thing you mean?

https://m.youtube.com/watch?v=qKQcUDEo-ZI

I think it makes a good case that async/await is infectious and awkward, and fibers (at least as implemented in Ruby) is quite simply a better paradigm.


> I think it makes a good case that async/await is infectious and awkward, and fibers (at least as implemented in Ruby) is quite simply a better paradigm.

As someone who has done fibers development in Ruby, I disagree.

CRuby has the disadvantage of a global interpreter lock. This means that parallelism can only be achieved via multiple processes. This is not the case in Rust, where we have true parallelism within a single process.

Second, this talk is not arguing for the use of fibers as much as it is arguing for using fibers to rig up a bespoke green-threads-like system for a specific web application server, and advocating for Ruby runtime features to lighten the code burden of doing so.

Ruby has a global interpreter lock, so even though it uses native threads only one of them can be executing ruby code at a time. Fibers have native stacks, so they have all the resource requirements of a thread sans context switching - but the limitations from the GIL actually mean you aren't _saving_ context switching by structuring your code to use fibers in typical (non "hello world" web server) usage.


> CRuby has the disadvantage of a global interpreter lock. This means that parallelism can only be achieved via multiple processes.

Even for Ruby code (and not native code running in a Ruby app) this is false since the introduction of Ractors, which are units within a single process that communicate without shared mutable (user) state, as the GVL is only “global” to a Ractor, not a process.

(Also worth noting, given the broader topic here, that Ruby has an available async implementation built on top of fibers which doesn't rely on special syntax and allows code to be blocking or non-blocking depending on context.)


I'm sorry but you seem to have missed the point I'm referring to, even though you have directly quoted it.

None of your paragraphs are relevant to async await vs fiber: async await requires you to put keywords in all sorts of unexpected places, fibers do not.

> CRuby has the disadvantage of a global interpreter lock. This means that parallelism can only be achieved via multiple processes.

I am very well cognisant of this fact, but this bears absolutely no relationship with async await vs fibers: one is sweeping callbacks on an event loop under leaky syntactic sugar, the other is cooperative coroutines; they're all on the same Ruby thread, the GVL simply does not intervene.

> this talk is not arguing for use of fibers as much as it is arguing as using fibers to rig up a bespoke green threads-like system for a specific web application server, and advocating for ruby runtime features to make the code burden on them of doing this lighter.

That sounds like a very cynical view of it. I believe there are arguments made in this talk that are entirely orthogonal to how Falcon benefits from fibers.

> Fibers have native stacks, so they have all the resource requirements of a thread

For starters fibers don't have the resource requirements of a thread because they're not backed by native OS threads, while CRuby creates an OS thread for each Ruby VM thread (until MaNy lands, but even with M:N they'd still be subject to the GVL).

Are you arguing for a stackless design like Stackless Python? Goroutines are stack-based too, and the benefit of coroutines (and an M:N design) is very apparent. Anyway, async/await is stack-based too, so I don't see how this is relevant: if you have 1M requests served at the same time, truly concurrently, you're going to have either 1M event callbacks or 1M fibers; that's 1 million stacks either way.

I seem to gather through what I read as a bitter tone that your experience with fibers was not that good. I appreciate another data point but I can't seem to reconcile it vs async await.

But really, the only reason I brought up threads is because the GVL makes threads less useful in CRuby than in Rust (or Go since I mentioned it).


> I am very well cognisant of this fact, but this bears absolutely no relationship with async await vs fibers

The ruby environment does not have async/await, and has a desire for experimentation with user mode concurrency systems due to the impact of GIL on the utility of native threads. Running under an interpreter with primarily heap-allocated objects and a patchable runtime also means that you have less impact from switching to such a model as an application developer.

There were discussions treating this as if it pertained to a wider debate on async/await vs fibers, specifically in Rust, which does properly utilize threads and does have async/await support. Since Rust also compiles to native code, it would need changes such as putting fiber suspension logic in all I/O calls between itself and the underlying OS to support developer-provided concurrency.


I only scrolled the video, but it sounds similar, yes. Though, implementation details would probably vary significantly, since Rust is a lower-level language.


> My opinion is that cancellation of tasks fundamentally should be cooperative and uncooperative cancellation is more of a misfeature, which is convenient at the surface level, but has deep issues underneath.

Cooperative cancellation can be pretty annoying with mathematical problems. You can have optimization algorithms calling rootfinding problems calling ODE integrators, and any of those can spin for a very long time, so you need to be threading cancellation tokens through everywhere, and the numerical frameworks generally don't support it. You can and should use iteration counts in all the algorithms, but once you're dealing with nested algorithms that only guarantees that your problem will stop sometime this year, not that it stops within 5 seconds.

With these problems I can promise that I'm just doing: lots of math, allocations with their associated page faults, no I/O, and writing strings to a standard library Queue object for logging that are handled back in the main thread that I'm never going to cancel (and whatever other features you think I might need -- which I haven't needed for years now -- I'd be happy to ship that information back to the main thread on a Queue). It feels like a problem that should be solvable in the 21st century without making me thread cancellation tokens everywhere and defensively code against spinning without checking the token (where I can make mistakes and cause bugs, which I guess you'll just blame me for).
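To make the burden concrete, here is a minimal sketch (plain std threads, with an AtomicBool standing in for a cancellation token; all names are illustrative) of what "threading the token through everywhere" looks like:

    use std::sync::Arc;
    use std::sync::atomic::{AtomicBool, Ordering};
    use std::{thread, time::Duration};

    // Every nested numerical routine has to accept and poll the flag,
    // because nothing will stop it from the outside.
    fn inner_iteration(cancel: &AtomicBool) -> Option<f64> {
        for i in 0..10_000_000u64 {
            if cancel.load(Ordering::Relaxed) {
                return None; // bubble the cancellation up by hand
            }
            let _ = (i as f64).sqrt(); // stand-in for real numeric work
        }
        Some(42.0)
    }

    fn outer_solver(cancel: &AtomicBool) -> Option<f64> {
        let mut acc = 0.0;
        for _ in 0..1_000 {
            acc += inner_iteration(cancel)?; // and so on, at every level
        }
        Some(acc)
    }

    fn main() {
        let cancel = Arc::new(AtomicBool::new(false));
        let worker = {
            let cancel = Arc::clone(&cancel);
            thread::spawn(move || outer_solver(&cancel))
        };
        thread::sleep(Duration::from_secs(5)); // our deadline
        cancel.store(true, Ordering::Relaxed);
        println!("{:?}", worker.join().unwrap());
    }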


but why do you want cooperative cancellation? this seems ideal for a thread pool, and if the allocated time is up, ask the OS to stop them. ideally the system would allocate the threads to dedicated CPU cores that have all kinds of interrupt handling disabled (so throughput is maximized)

if you want to do this without kernel support, then somehow the program needs to do the periodic checks. the JVM has this (safepoints), but there's no royal road for this.


> but why do you want cooperative cancellation?

I don't.

Previous commenter doesn't believe in uncooperative cancellation though and .NET seems to be removing all support for it.

I would love to have worker threads with all kinds of interrupts and I/O disabled on them and just be able to kill them if they wander off into a numerical field somewhere and get stuck in the mud.


The sole advantage of async/await over fibers is the possibility to achieve ultra-low-latency via the compiler converting the async/await into a state machine. This is important for Rust as a systems language, but if you don't need ultra-low-latency then something with a CSP model built on fibers, like Goroutines or the new Java coroutines, is much easier to reason about.


Fibers and async/await are backed by the same OS APIs, so they can achieve more or less the same latency. The main advantage of async/await (or, to be more precise, stackless coroutines) is that they require less memory for task stacks, since tasks can reuse the executor's stack for the non-persistent part of their stack (i.e. stack variables which do not cross yield points). It has very little to do with latency. At most you can argue that the executor's stack stays in CPU cache, which reduces the number of cache misses a bit.

Stackless coroutines also make it easier to use the parent's stack for its children's stacks. But IMO that's only because compilers currently do not have tools to communicate the maximum stack usage bound of a function to programming languages.
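A minimal sketch of the "only what crosses a yield point is persisted" behaviour (sizes are illustrative and compiler-dependent):

    use std::mem::size_of_val;

    async fn scratch_before_await() {
        // Not held across the .await, so it can live on the executor's stack
        // and does not need to be stored inside the future.
        let scratch = [0u8; 1024];
        let _ = scratch.len();
        std::future::ready(()).await;
    }

    async fn scratch_across_await() {
        // Held across the .await, so it becomes part of the future's state.
        let scratch = [0u8; 1024];
        std::future::ready(()).await;
        let _ = scratch.len();
    }

    fn main() {
        // Expect the second future to be roughly 1 KiB larger than the first.
        println!("before await: {} bytes", size_of_val(&scratch_before_await()));
        println!("across await: {} bytes", size_of_val(&scratch_across_await()));
    }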


>Fibers and async/await are backed by the same OS APIs, they can achieve more or less the same latency.

The key requirement for ultra-low-latency software is minimising/eliminating dynamic memory allocation, and stackless coroutines allow avoiding memory allocation. For managed coroutines on the other hand (e.g. goroutines, Java coroutines) as far as I'm aware it's impossible to have an implementation that doesn't do any dynamic memory allocation, or at least there aren't any such implementations in practice.


Yes, it's what I wrote about in the last paragraph. If you can compute maximum stack size of a function, then you can avoid dynamic allocation with fibers as well (you also could provide stack size manually, but it would break horribly if the provided number is wrong). You are right that such implementations do not exist right now, but I think it's technically feasible, as demonstrated by tools such as https://github.com/japaric/cargo-call-stack The main stumbling block here is FFI, historically shared libraries do not have any annotations about stack usage, so functions with bounded stack usage would not be able to use even libc.


> communicate maximum stack usage bound of functions

This would be useful in all sorts of deeply embedded code (as well as more exotic things, such as coding for GPU compute). Unfortunately it turns out to be infeasible when dealing with truly reentrant functions (e.g. any kind of recursion) or any use of FFI, dynamic dispatch etc. So it can only really be accomplished for near-'leaf' code, where stack usage is expected to be negligible anyway.


> Fibers and async/await are backed by the same OS APIs

async/await doesn't require any OS APIs, or even an OS at all.

You can write async rust that runs on a microcontroller and poll a future directly from an interrupt handler.

And there's a huge advantage to doing so, too: you can write out sequences of operations in a straightforward procedural form, and let the compiler do the work of turning that into a state machine with a minimal state representation, rather than doing that manually.
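A minimal hosted sketch of that "poll it yourself" pattern, using a hand-rolled no-op waker (on a real microcontroller the polling would happen from the interrupt handler, without std):

    use std::future::Future;
    use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

    // A waker that does nothing: enough to poll a future by hand whenever
    // some external event (e.g. an interrupt) says "try again".
    fn noop_waker() -> Waker {
        fn clone(_: *const ()) -> RawWaker { RawWaker::new(std::ptr::null(), &VTABLE) }
        fn noop(_: *const ()) {}
        static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
        unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
    }

    fn main() {
        // The compiler turns this async block into a state machine; each
        // .await is a potential suspension point.
        let mut fut = Box::pin(async {
            std::future::ready(1).await + std::future::ready(2).await
        });
        let waker = noop_waker();
        let mut cx = Context::from_waker(&waker);
        loop {
            match fut.as_mut().poll(&mut cx) {
                Poll::Ready(v) => { println!("result: {v}"); break; }
                Poll::Pending => { /* on embedded: wait for the next interrupt */ }
            }
        }
    }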


Sigh... It gets tiring to hear about embedded from async/await advocates as if it's a unique advantage of the model. Fibers and similar mechanisms are used routinely in embedded world as demonstrated by various RTOSes.

Fibers are built on yielding execution to someone else, which is implemented trivially on embedded targets. Arguably, in a certain sense, fibers are even better suited for embedded, since they allow preemption of a task by interrupts at any moment, with the interrupt being processed by another task, while with async/await you have to put an event into a queue and continue execution of the previously executing future.


Exactly. Proof by implementation: asio is a very well known and well regarded C++ event loop library and can be transparently used with old-school hand-written continuations, more modern future/promise, language based async/await coroutines and stackful coroutines (of the boost variety).

The event loop and io libraries are in practice the same for any solution you decide, everything else is just sugar on top and in principle you can mix and match as needed.


> The key requirement for ultra-low-latency software is minimising/eliminating dynamic memory allocation

First, that is only true in languages -- like C++ or Rust -- where dynamic memory allocation (and deallocation) is relatively costly. In a language like Java, the cost of heap allocation is comparable to stack allocation (it's a pointer bump).

Second, in the most common case of writing high throughput servers, the performance comes from Little's law and depends on having a large number of threads/coroutines. That means that all the data required for the concurrent tasks cannot fit in the CPU cache, and so switching in a task incurs a cache-miss, and so cannot be too low-latency.

The only use-cases where avoiding memory allocation could be useful and achieving very low latency is possible are when the number of threads/coroutines is very small, e.g. generators.

The questions, then, are which use-case you pick to guide the design, servers or generators, and what the costs of memory management are in your language.


>Second, in the most common case of writing high throughput servers,

High-throughput servers are not ultra-low-latency software; they prioritise throughput over latency. Ultra-low-latency software is stuff like audio processing, microcontrollers and HFT. There's a trade-off between throughput and latency.


Not here. You don't trade off latency, because you cannot reduce it below a cache miss per context switch anyway if your working set is not tiny. The point is that if you have lots of tasks then your latency has a lower bound (due to hardware limitations) regardless of the design.

In other words, if your server serves some amount of data that is larger than the CPU cache size and can be accessed at random, there is some latency that you have to pay, and so many micro-optimisations are simply ineffective even if you want to get the lowest latency possible. Incurring a cache miss and allocating memory (if your allocation is really fast) and even copying some data around isn't significantly slower than just incurring a cache miss and not doing those other things. They matter only when you don't incur a cache miss, and that happens when you have a very small number of tasks whose data fits in the cache (i.e. a generator use-case and not so much a server use-case).

Put in yet another way, some considerations only matter when the workload doesn't involve many cache misses, but a server workload virtually always incurs a cache-miss when serving a new request, even in servers that care mostly about latency. In general, in servers you're then working in the microsecond range, anyway, and so optimisations that operate at the nanosecond range are not useful.


Could you share an example of a fiber implementation not relying a fat runtime built in the language?


https://github.com/Xudong-Huang/may

The project has some serious restrictions and unsound footguns (e.g. around TLS), but otherwise it's usable enough. There are also a number of C/C++ libraries, but I can not comment on those.


Neat. Thanks for sharing!

Interestingly, may-minihttp is faring very well in the TechEmpower benchmark [1], for whatever those benchmarks are worth. The code is also surprisingly straightforward [2].

[1] https://www.techempower.com/benchmarks/

[2] https://github.com/TechEmpower/FrameworkBenchmarks/blob/mast...


For example https://github.com/creationix/libco

The required thing is mostly just to dump your registers on the stack and jump.


I do think implementations like that are not particularly useful though.

You want a runtime to handle and multiplex blocking calls - otherwise, if you perform any blocking calls (mostly I/O) in one fiber, you block everything - so what use are those fibers?


The answer is the same as in async Rust, right? "Don't do that."

If you wanted to use this for managing a bunch of I/O contexts per OS thread then you would need to bring an event loop and a bunch of functions that hook up your event loop to whatever asynchronous I/O facilities your OS provides. Sort of like in async Rust.


Another discussion where people don't get async/await, can't fathom why you would want a concurrency mechanism on a single thread and assume no one needs it.

UI programming, communication with the GPU, and cross runtime communication are good examples but I'm sure there are more.

Threads, green or otherwise, don't work for those cases but async/await does.


You can easily use threads with GUIs and I've written a bunch of GUI apps in the past that exploited threads quite effectively.


What popular GUI framework is multi threaded?


Depends what you mean by multi-threaded. There are plenty of frameworks that have some support for doing work on background threads.

Take JavaFX for example:

1. Rendering is done on a background thread in parallel with the app thread.

2. The class lib makes it easy to run background tasks that update the GUI to reflect progress and results.

3. You can construct widget trees on background threads and then pass them to the main UI thread only to attach them/make them visible.

With libs like ReactFX you can take event streams from the UI, map over them to a worker thread or pool of threads, and pass the results back to the UI thread.

A few years ago I did a video tutorial showing a few of these things, the code can be seen here:

https://github.com/mikehearn/KotlinFPWebinar/blob/master/src...

Basically this is a simple auto-complete widget. It takes a stream of values from a text field the user is typing in, reduces the frequency of changes (so if you type fast it doesn't constantly do work it's about to throw away) and then passes the heavy work of computing completions onto a background thread, then takes the results back onto the UI thread.

    val edits = EventStreams.valuesOf(edit.textProperty()).forgetful()
    val fewerEdits = edits.successionEnds(Duration.ofMillis(100)).filter { it.isNotBlank() }

    val inBackground = fewerEdits.threadBridgeFromFx(thread)
    val completions = inBackground.map { ngrams.complete(it) }
    val inForeground = completions.threadBridgeToFx(thread)

    val lastResult = inForeground.map {
        FXCollections.observableList(it)
    }.toBinding(FXCollections.emptyObservableList())
    completionsList.itemsProperty().bind(lastResult)
Now if you mean what UI libraries support writes to properties from arbitrary background threads without an explicit thread switch, I think the answer is only Win32 and Motif. But that's not a particularly important feature. JavaFX can catch the case if you access live UI from the wrong thread so such bugs are rarely an issue in practice. In particular it's easy to synchronize with the UI thread with a few simple helpers, e.g.:

    thread(name = "background") {
      while (true) {
        var speed = synchronizeWithUI { speedSlider.value }
        doSomeWork(speed)
      }
    }


I was referring to a UI system that did not differentiate a UI thread and could use things like pre-emptive scheduled tasks to access layout elements, etc.

Your example illustrates my point that you need to manually juggle threads if they are used directly. It ends up being the callback hell that async/await is the nice sugar for.


I don't see the callback hell. There aren't callbacks in the code samples I posted (the Rx-like chain is doing things that go well beyond callbacks).

Every UI toolkit has a concept of a UI thread because there has to be a single target for event notifications to be delivered to (keyboard/mouse/window), and they inherently have to be processed in sequence so they can't just be sprayed across multiple threads. However once you go beyond sequential event processing multi-threading becomes possible and as I said, JavaFX is a good example of that. Events like a window resize cause a reflow but then the actual work of generating the rendering commands is handled by a background thread (and architecturally it could be multiple render threads, that was just never done). Loading new UI can also be done on other threads. I think Blink also does some parallel rendering.

BTW, making property writes automatically thread-safe would be a trivial change to JavaFX but it's not really what you want. You need a concept of UI "transactions" to avoid the user seeing a UI that's half way through being modified by code. That's again not something specific to UI toolkits or threads vs async/await. It's inherent to the task. The "synchronizeWithUI" block in my code sample is doing that: you can think of it as committing some writes to the UI database.


How adequate is Swing (combined with JGoodies Binding or RxJava) compared to JavaFX in this regard?


I don't know JGoodies Binding so I can't help you with that, sorry. The concept is general. The code above uses Kotlin syntax for brevity but of course you can use Kotlin with Swing too.


It's definitely important to be able to manage a bunch of tasks explicitly in a single thread. If there are any other possible implementations of language features that result in the same binary and cause less trouble for the language's users, let's chat about those :)


One of the main benefits of async/await in Rust is that it can work in situations where you don't even have threads or dynamic memory. You can absolutely use it to write very concise code that's waiting on an interrupt on your microcontroller to have read some data coming in over I2C from some buffer. It's a higher level abstraction that allows your code to use concurrency (mostly) without having tons of interactions with the underlying runtime.

Every major piece of software that I have worked on has implemented this in one form or another (even in non-modern C++ where you don't have any coroutine concepts: Apple's Grand Central Dispatch, Intel's Threading Building Blocks, etc.). If you don't, then your business logic will either block on IO very inefficiently, have a gazillion threads that make development/debugging a living hell, be littered with implementation details of the underlying runtime, or some combination of all three.

If you don't use existing abstractions in the language (or through some library), you will end up building them yourselves, which is hard and probably overall inferior to widely used ones (if there are any). I have done so in the past, see https://github.com/goto-opensource/asyncly for C++.


I think the author is confusing two things here:

1. User-space threads / Green threads

2. Structured concurrency

The former is an advantage of async/await, but is not unique to it (see Go or Java's Loom for examples that involve no function coloring problem). And the latter can be implemented with both OS threads and green threads (see the structured concurrency JEP for Java [1]).

[1] https://openjdk.org/jeps/462


As someone who has struggled with thread-safe storage since threads started, the amount of code we carry around with us which looks OK but turns out not to be viable under threads is remarkably high. I bump into this in C and Python 3 quite a lot: you have to work harder to do anything which is not synchronous, no matter how you arrive at that place.

For long-lived work, it may well be that you lose nothing overall by forking a lot of heavyweight processes which each operate fast as a single execution state, if you have the cores, the CPU and the memory. Context-switching delay is very possibly not your main problem.

For example, I typically have 24 one-hour capture/log files, each 300M+ lines long, and I need to gunzip them, parse/grep out some stuff, calculate some aggregates and then recombine. It's a natural pipeline of 3 or 4 processes. The effort to code this into a single language and uplift it to run threads inside, where it's basically 24 forked pipe sequences, begs the question: what exactly is going to get faster, when I am probably resource-limited by the gunzip process at the front?

You think you can code faster than a grep DFA to select inputs? How sure are you that your in-memory structure is faster than awk? Really sure? I tested some. Radix tries are nice and have that log(n) length thing, but awk was just as fast (IP address "hashing" lookup costs).

(hint: more than one interpreted language simply forks gzip for you, to unzip an input stream)

If you can go to C, then it's possible forking heavyweight processes in C, with good IPC buffer selections or mmap is "good enough"


As someone who was big into threading back in the 2000s, with the experience gathered over time I think that with modern hardware resources we are better off with OS IPC, as it offers much better safety guarantees, especially in C-like languages.


Async/await has one property that is slightly more difficult to achieve with threads, but it isn't really discussed much and hasn't been used much to my knowledge: because context is captured fairly explicitly as records, it is easy to persist that context to durable storage and/or migrate it to other machines without any additional runtime machinery.

Achieving this with threads is possible but requires additional runtime support. Basically it requires scanning stack frames, which the garbage collector already does, so you're extending the GC to record the stack in a separate place.


As well, this functions as state compression by the developer. This is what makes async/await and CPS so much better than thread-per-client: smaller memory footprint!


async/await is syntax. Most of what people associate with it is actually the benefit of virtual threading. Asynchrony is how virtual threading is achieved in user space. Rust has direct native threading by default and can't do virtual threading without a runtime; async/await makes the use of a runtime explicit. It could be possible to have both virtual and OS threading without special syntax, but that would require a marker trait, algebraic effects or monads. Without those, you need to choose either user-level or OS-level threads as the default. If user-level threads are the default, you need a runtime by default (even if that runtime is only for concurrency) - Go made that the default. Again, you don't need async/await to have virtual threading.


Async/await is syntax, but it's not equivalent to virtual threads. It is special syntax for a certain pattern of writing concurrent (and possibly parallel) code: one in which you launch a concurrent operation specifically to get back one result. When you start an OS thread or a virtual thread, that thread can do anything. When you launch a task, it can only do one thing: return one result of the type you asked for.

Async/await is perfect for operations that map well onto this structure. For example, most IO reads and writes fit well into this model - you trigger the IO operation to get back a specific result, then do some other work while it's being prepared, and when you need the result, you block until it's available. Other common concurrent operations don't map well on this at all - for example, if you want to monitor a resource to do something every time its state changes, then this is not well modeled at all as a single operation with a unique result, and a virtual or real thread would be a much better option.

Also, having both virtual and OS threads doesn't need any special syntax. You just need two different functions for creating a thread - StartOSThread(func) and StartVirtualThread(func), and a similar split for other functions that directly interact with threads (join, cancel, etc), and a function for telling whether you are currently running in a virtual thread. Everything else stays the same. This is what Java is doing with Project Loom, I'm not speaking just in principle.

The huge difficulty with virtual threads is implementing all blocking operations (IO, synchronization primitives, waits, etc) such that they use async OS primitives and yield to the virtual thread scheduler, instead of actually blocking the OS thread running the virtual thread.


Aside from the fact that an async/await operation can have arbitrary side effects, returning values is not a prerogative of async/await. For example, in C++:

  #include <future>

  void foo() {
      // start the async operation (a new thread, or a task queued on a pool)
      auto future = std::async([] { /* do some stuff */ return 42; });

      // ... do some work concurrently ...

      // join the async operation and retrieve its result
      auto some_value = future.get();
      (void)some_value;
  }
Here std::async starts a thread (or potentially queues a task in a thread pool); please don't confuse async/await (i.e. stackless coroutines) with more general future/promise and structured concurrency patterns.

edit: note I'm not making any claims that std::async, std::future, std::promise are well designed.


Sure, async/await can have arbitrary side effects, just like functions, and threads can be used to return values - but the design pattern nudges you to a particular end. Basically the async/await version of some code looks like this:

  f = await getAsync();
While the thread-based version looks like this:

  var f;
  t = startThread(getAsync(&f));
  t.join();
 
This is a fundamentally different API. Of course you can implement one with the other, but that doesn't make them the same thing (just like objects and closures can be used to implement each other, but they are different nonetheless).

Of course, there are other concurrency APIs as well, such as the C++ futures you show. Those have other advantages, disadvantages, and workflows for which they fit best. The main difference is that get() on the future blocks the current OS thread if the value isn't available yet, while await doesn't. Thus, futures aren't very well suited to running many concurrent IO operations on the same thread.


You never "have" to await at the call site. There's very little stopping you from:

  fFuture = getAsync();
  // do other things
  f = await fFuture;
Most implementations of async/await Future/Task/Promise often have some sort of `all` combinator to multi-join as soon as all results arrive, even:

  fFuture = getFAsync();
  gFuture = getGAsync();
  hFuture = getHAsync();
  // other stuff
  (f, g, h) = await all(fFuture, gFuture, hFuture);
The syntax makes it really easy to do the simple step-by-step await at every call site, but it also doesn't stop you from writing more complex things. (Sometimes very much more complex things when you get into combinators like `all` and `race`.)


Oh, yes, I didn't think to mention that since I was focused on showing what pattern the syntax makes easiest.


To be 100% clear: the std::async example I gave uses threads. Don't confuse a specific OO API with the general concept. Even vintage pthreads allow returning values on thread join.

And of course you would run multiple concurrent operations on separate threads.

edit: and of course futures are completely orthogonal to threads vs async/await. You can use them with either.


You don't technically need two ways to start threads. That's how Java does it, and there's some technical reason for it that I always forget. There are edge cases where virtual and physical threads aren't completely interchangeable.


You need to know if you're opening an OS thread or a virtual thread if you intend to interact with the OS natively. For example, if you want to call a C library that expects to control and block its thread, you need to ensure you are running on a dedicated OS thread, otherwise you might block all available threads.


I feel like the last difficulty should become easier with the emergence of io_uring support for a wider variety of system calls. Would you agree?


But at the same time, the syntax itself is what matters to most of its users. OS threads can be fast enough for most workloads, but having the syntax show up where the IO is happening, easily running concurrently whatever can be run concurrently, and cancelling stuff after a timeout is the superpower of async/await.

Arguing about implementation performance is missing the forest for the trees. It's like comparing exceptions to error enums through that prism alone: sure, there are implementations where this or that is more efficient than the other, but the ability to express intent in the code is what makes these killer features.


> Why choose async/await over threads?

I assume the intended question is: why use async/await over thread-per-client designs?

Easy: because async/await allows you to more easily compress the memory footprint of client/request/work state because there is no large stack to allocate to each client/whatever.

One can still use threads (or processes) with async/await so as to use as many CPUs as possible while still benefiting from the state compression mentioned above.

State compression -or, rather, not exploding the state- is critical for performance considering how slow memory is nowadays.

Async/await, manual continuation passing style (CPS) -- these are great techniques for keeping per-client memory footprint way down. Green threads with small stacks that don't require guard pages to grow are less efficient, but still more efficient than threads.

Threads are inefficient because of the need to allocate them a large stack, and to play MMU games to set up those stacks and their guard pages.

Thread-per-client simply does not scale like async/await and CPS.


Thank you for this post, very interesting.

Parallelism and async and multithreading is my hobby.

I think nodejs and browsers made typeless but coloured async easily digested and understood by everyone. Promises for example.

The libuv event loop used by nodejs is powerful for single threaded IO scalability, but I am looking for something that is multicore and scalable.

I like thread-per-core architectures.

The code written to a given architecture carries a high overhead in terms of mental understandability for new colleagues and is a barrier to transforming it. Refactoring Postgres's multiprocess implementation or nginx's worker model would be a major effort and undertaking. Firefox did a lot of work with Electrolysis/Quantum to make it multiprocess.

Just recently I was trying to implement a lock-free work queue from which any thread can pick up work and "steal" work from other threads.

I have a multithreaded runtime I am building in C which is a "phaser": it does synchronization in bulk. So rather than mutexes, you have mailboxes, which are queues.


"Obviously, the embryonic web tried to solve this problem. The original solution was to introduce threading"

Are we calling 1970s mainframes "the embryonic web" now?


That sentence is super confusing. The web has never supported multi-threading.


Well, apparently they did consist of a lot of tangled threads...



Spiders would be proud of the people who did this. They definitely deserve to be called “web”.


My possibly unpopular opinion is that async/await is a mistake as a first class programming construct. That functionality should 100% be in libraries for two reasons: 1) that way it won't infect the actual language in any way and 2) it will be more difficult to use so people will only reach for it if they really need it.

Sync code and threads is the way to go for 99% of the cases where concurrency is needed. Rust handles most of the footguns of that combination anyway.


Couldn't disagree more. In my experience, single-threaded event-loop driven async/await should be used for every possible concurrency need, with the complexity of multi-threaded concurrency being reserved for the rare cases it's needed. As auto-scaled services and FaaS began to become popular, I've found most any need for multithreaded programming almost wholly unnecessary.


Not everything is a web backend... CPU bound workloads may be rare in your domain, but I would be careful with generalizations.


When every problem looks like a nail, the only thing you need is a hammer!


Not only that, but autoscaling with FaaS is an expensive, locked-in and complex-to-administer way to provide functionality.


It's true though that in most cases single-threaded event-loop concurrency is sufficient. I wasn't attempting to make a generalization, I was saying that in cases where the CPU-bound work _is_ more voluminous, I prefer process-parallelism over threaded parallelism for such workloads (of which FaaS is just one possible impl).


> It's true though that in most cases single-threaded event-loop concurrency is sufficient.

This still sounds very web-centric.

In general, concurrency and parallelism are really orthogonal concepts. When threads are used for concurrency, e.g. to handle multiple I/O bound tasks, they do indeed compete with coroutines (async). But when we want parallelism, coroutines are not even an option. That's why it irks me when people compare threads with coroutines without specifying the exact use case.

> I prefer process-parallelism over threaded parallelism for such workloads

Process-parallelism is just a special case of multithreading, and in many domains it's not even a realistic option.


But coroutines _are_ an option for parallelism, and an especially effective one in the I/O-bound world. The parallelism comes from the OS handling I/O scheduling for the coroutines instead of the application code.

The important difference between multithreading and multiprocess is that I can ignore synchronization that's not IPC in multiprocess models which makes the code much much simpler to implement and reason about.

I wouldn't even say this is web-centric; process-parallelism is a pretty common method of task dispatch in HPC compute topologies that has filtered down to smaller-scale multi-server compute clusters a la Kubernetes. In these cases, taking a process-centric message-passing approach can greatly simplify not only the code but also the architectural aspects of scheduling and scaling, which are quite a bit more difficult with multi-threaded processes or ones that mix I/O and CPU-bound work in the same process (often a cause of thread starvation issues in Node apps).


> But coroutines _are_ an option for parallelism, and an especially effective one in the I/O-bound world.

Parallelism means that code executes simultaneously (on different CPUs). Coroutines (on a single threaded executor) don't do that.

> The parallelism comes from the OS handling I/O scheduling for the coroutines instead of the application code.

Where exactly is the parallelism that is enabled by coroutines? Coroutines are nothing more than resumable functions.

When it comes to I/O performance, threads vs. async is really a false dichotomy. What we should be comparing is

1. threads + blocking I/O

2. non-blocking I/O (select, poll, epoll)

3. asynchronous I/O (io_uring on Linux, IOCP on Windows)

Coroutines may only provide syntactic sugar for 2. and 3.


> The important difference between multithreading and multiprocess is that I can ignore synchronization that's not IPC

Sure, if the data allows it and you can afford the memory and time overhead. There is no silver bullet! There are cases where you must work on in-process shared memory and process-parallelism wouldn't even be possible (think of multithreading in video games or DAWs). Note that if the data is properly partitioned, you might not need any locking, even with "normal" threads.


The problem is that in Rust the main way of doing async/await (Tokio) is multithreaded by default. So you get all the problems of multithreading, and additionally all the problems of async/await. The discourse would be very different if the default choice of executor had been thread-local.


Is this default really that big of a problem? Most people running Tokio are going to be running it on multicore hardware, and they'll get some additional throughput for free. If they're sufficiently resource constrained to the degree that they realize that a single thread is a better fit for their domain, they'll Google it and realize you can use Tokio's single-threaded runtime.

To be clear: it is a one-line change to make Tokio use the current (single) thread.

https://docs.rs/tokio-macros/latest/tokio_macros/attr.main.h...
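For reference, a minimal sketch of that one-line change (assuming the tokio crate with the macros and rt features enabled):

    // Select Tokio's single-threaded scheduler instead of the default
    // multi-threaded one; everything runs on the thread that calls main().
    #[tokio::main(flavor = "current_thread")]
    async fn main() {
        let handle = tokio::spawn(async { 40 + 2 });
        println!("{}", handle.await.unwrap());
    }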


The problem is most clearly visible in the difference between

https://docs.rs/tokio/1.36.0/tokio/runtime/struct.Runtime.ht...

https://docs.rs/async-executor/latest/async_executor/struct....

Those Send + 'static bounds have a really big impact. Multithreading is definitely not just "additional throughput for free", nor is the change from single-threaded to multi-threaded just one line of code.
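For example (a minimal sketch, assuming the tokio crate): an Rc held across an .await makes the future !Send, so the multi-threaded tokio::spawn with its Send + 'static bound would reject it, and you have to fall back to a single-threaded LocalSet with spawn_local:

    use std::rc::Rc;

    #[tokio::main(flavor = "current_thread")]
    async fn main() {
        let local = tokio::task::LocalSet::new();
        local
            .run_until(async {
                // This future is !Send because the Rc lives across an .await,
                // so it can only be spawned onto a single-threaded LocalSet.
                tokio::task::spawn_local(async {
                    let shared = Rc::new(5);
                    tokio::task::yield_now().await;
                    println!("still have {shared}");
                })
                .await
                .unwrap();
            })
            .await;
    }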


Yes, this is a big problem in my mind and something I mentioned in another comment. Not that multi-threaded async/await is wrong, but the conflation of two methods of concurrency causes a lot of problems you don't see in single-threaded event-loops archs.

In fact, it makes the claim in the other thread, that async/await in Rust was mainly forced on the community to capture JS devs, seem a bit unlikely, as the only thing the two have in common is the spelling of the keywords.


Sounds like a problem with Rust. Not a problem with async/await.


I'm actually more of a fan of the actor model than the single-threaded event-loop. Individual blobs of guaranteed synchronicity with message passing.


Perhaps an acceptable solution would be that all blocking things be async by convention, at least potentially. It's the conflict between sync and async that I find distasteful. The coloring problem.

Zig does almost something like this.


> A common refrain is that threads can do everything that async/await can, but simpler.

> OS threads don’t require any changes to the programming model, which makes it very easy to express concurrency.

I think these claims need a bit of justification, or else why write the article?


Rust async/await is as bad as C++ coroutines. They lack one of the most important language counterparts: async drop/destructor.


That's the main gripe imo: Missing features that are not optional for certain problems where the solution would otherwise make perfect sense.

I get the pragmatism of "better to ship something that is 80% finished now than wait some years for it to be 100% finished", but Rust's async/await was released in 2019 and the more time passes, the more it looks like some sharp edges are here to stay.


I am suspicious that Async, slowly taking over the library ecosystems, will be what drives me away from Rust. Over the past year, it's taken over embedded. I was previously worried about excessive use of generics, but this was, gradually over the past ~2 years replaced with Async.

The std-lib HTTP ecosystem has already gone almost full-async. I was able to spin up a simple web server without it using the rouille library recently. I chose this one because it was one of two options that wasn't Async.

I think I may be the only person who A: Enjoys rust for embedded targets (Eg Cortex-M), but doesn't enjoy Async. The easy path is to go with the flow and embrace Async in those domains (Embedded, web backends). I say: No. Threads, interrupts, DMA, etc are easier to write and maintain, and critically, don't split code into two flavors. You don't just add an Async library to your program: It attempts to take your program over. That is the dealbreaker.


> The easy path is to go with the flow and embrace Async in those domains (Embedded, web backends). I say: No.

Why are you making a principle of not using async though? It's just one particular feature of the language, there's nothing wrong in not being particularly enthusiast about it, but entirely refusing to use it sounds really weird to me.

Every JavaScript junior uses async right now; it's not as if it were a particularly difficult programming paradigm to learn. Sure, async in Rust is a bit more complicated than in JS because it's lower level, but it's not worse than anything else in Rust!

> Threads, interrupts, DMA, etc are easier to write and maintain,

Not really. I've no doubt that you master them better, but there's no reason to think you won't be able to be as proficient with async. And in fact, async is just strictly more powerful than sync code: you can do exactly what you're doing right now (just write await), plus you get a few features for free (concurrency, cancelation). There's a reason why it's popular actually!

> and critically, don't split code into two flavors

In practice, there isn't either. Not if you're writing an application; it's only when you're writing a library and you want to explicitly support the use case of people like you who have moral issues with async.


It can be frustrating to:

People who target wasm

People who target embedded

And superfluous to:

People who have compute bound problems

People with one incoming and one outgoing connection at all times.


> It can be frustrating to: People who target wasm

Having worked with wasm, I couldn't disagree more. Maybe it's been fixed now and we have proper threading support in wasm, but for a long time every thread-related function on wasm32-unknown-unknown panicked because the function was unimplemented, whereas async code just worked.

> People who target embedded

I don't have any embedded experience in Rust so I don't have an opinion, but at the same time I fail to see how it can be frustrating in that space: if you're targeting no-std you cannot use a dependency that isn't no-std itself, so you're unlikely to be bothered by non-embedded use of async. Now, if async takes over embedded, that's different, but at the same time it means it's probably not "frustrating" to the majority of people but in fact helpful.

> And superfluous to:

This is a very poor metric. I've never once used any of the math primitives in the std lib in 9 years of Rust, almost never the Unicode parts of the std, barely used atomics and never inline assembly; that doesn't mean these features do not belong in Rust, and I have no issue with them being there even though they are superfluous to me. The key difference is that I don't have any ideological detestation of them…


A bit late to reply, but if you see this, know that you made some good points. Detesting it is probably a bit much, but I do find the pattern cumbersome. Do you use tokio in wasm? I haven't, but I've seen issues for wasm support closed because of tokio; that could have been an excuse. Again, in embedded what I run into is that embassy is the standard executor, and is useful, but so many crates bring in tokio when they don't really need to that they are poisoned for embedded use. Lastly, it isn't just being superfluous: it is a dozen or more crates bringing download size and attack surface that don't solve any problem I ever actually have, or mostly any problem the crate author had. So really what you helped me realize is that it isn't async/await I dislike at all, it is just tokio being the de facto implementation of it.


I don't think tokio itself is going to be the issue for wasm or embedded, but if the crate you're importing uses tokio it usually means it's expecting to have access to an OS with file and network access, none of which is going to work in wasm or embedded in the first place.

But I agree with you that the Rust ecosystem (the part of the ecosystem that uses std, at least) being tied to tokio is not a good situation. In theory we could have crates that are agnostic about the executor, since they aren't supposed to execute anything, just expose futures, with the top-level crate driving the execution, but AFAIK the way tokio is designed currently makes it impossible to separate the concerns. The Rust team would like to change that, but it's not trivial and it would require collaboration from the tokio team, which is not a given.


Just tried changing my Python code to use async/await instead of the 2 threads I use, and it just complicated the code more. I had to create a thread anyway to deal with the tkinter event loop issue. And then I still have to check inside the functions whether the task is still active. But the worst part is converting the whole stack of functions to be async/await.


I think it's partly down to what you grew up with. If you're of the JS (or Python 3) generation then you're _probably_ more comfortable with async.

However, there are things where async seems to be more of a semantic fit for what you're doing.

For example, FastAPI is all async, and it makes sense. I started using it because it scaled better than anything else. They have done a nice job of making the interface as painless as possible. It almost doesn't feel like surprise goto.

I do a lot of stream processing, so for me threads are a better fit. It scales well enough, and should I need to, either going to multiprocessing (not great) or duplicating to a new stand-alone process is fairly simple. (Keeping everything message-based also helps.)

Async vs. threads is _almost_ like shop-bought coke vs. SodaStream. They are mostly the same, but have slightly different semantics.


I worked with a fairly large code base which leveraged threads to avoid having to use callbacks in a C++ code base. This allowed the engineers to use the more familiar linear programming style to make blocking network calls which would stall one thread, while others were unblocked to proceed as responses were received. The threads would interoperate with each other using a similar blocking rpc system.

But what ended up happening was that the system would tend to block unnecessarily across all threads, particularly in cross-thread communication. This was due to many external network calls being performed sequentially when they could have been performed in parallel, and likewise cross-thread communication constantly blocking on sequential calls. The end result was a system whose performance was fundamentally hobbled by threads that were constantly waiting on other threads or series of external calls, and it was very difficult to understand the nature of the Gordian knot of blocking behavior that was causing these issues.

The main problem with using threads is that the moment you introduce another CPU into the mix, you need synchronization primitives around your state. This means engineers will use fewer threads than necessary (at least in our case) to reduce the complexity of the synchronization work, which means you lose the advantage of asynchronous parallelism. Or at least that is what happened in this particular case.

The cost of engineering synchronization for async/await is zero, because this parallelism happens on a single thread. Since the CPU "work" to be done for async I/O is relatively small, this argues for single-threaded 'callback' style solutions where you maximize the amount of parallelism and decrease the amount of potential blocking, while minimizing the complexity of thread synchronization as much as possible. In cases where you want to leverage as many CPUs as possible, it's often the case that you can better benefit from CPU parallelism by simply forking your process onto multiple cores.
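For illustration, here's a minimal Rust sketch of the sequential-vs-concurrent difference within a single task; the functions `fetch_user` and `fetch_orders` are made up, and `futures::join!` is just one way to express it:

    use futures::join; // futures = "0.3"

    // Hypothetical async calls; imagine each one performs network I/O.
    async fn fetch_user(id: u32) -> String { format!("user-{id}") }
    async fn fetch_orders(id: u32) -> Vec<u32> { vec![id, id + 1] }

    async fn sequential(id: u32) -> (String, Vec<u32>) {
        // Each call suspends this task until it finishes: total latency is the sum.
        let user = fetch_user(id).await;
        let orders = fetch_orders(id).await;
        (user, orders)
    }

    async fn concurrent(id: u32) -> (String, Vec<u32>) {
        // Both futures are polled concurrently: total latency is roughly the max.
        join!(fetch_user(id), fetch_orders(id))
    }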


If you are doing anything complex you really want to use a polling loop. That's how video games, avionics systems, industrial equipment, etc. are programmed.


That's what async/await abstracts internally.
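Roughly, the hand-written equivalent is a set of small state machines polled every tick, the way a game or control loop would do it; a sketch with made-up states and byte counts:

    // One "download then save" job, written as an explicit state machine.
    #[derive(Clone, Copy)]
    enum Job {
        Downloading { received: usize },
        Saving { written: usize },
        Done,
    }

    impl Job {
        // Called once per tick; does a little work and returns immediately.
        fn poll(&mut self) {
            *self = match *self {
                Job::Downloading { received } if received + 1024 >= 4096 => Job::Saving { written: 0 },
                Job::Downloading { received } => Job::Downloading { received: received + 1024 },
                Job::Saving { written } if written + 1024 >= 4096 => Job::Done,
                Job::Saving { written } => Job::Saving { written: written + 1024 },
                Job::Done => Job::Done,
            };
        }
    }

    fn main() {
        let mut jobs = vec![Job::Downloading { received: 0 }, Job::Downloading { received: 2048 }];
        // The main loop polls every job each tick; async/await generates this shape for you.
        while !jobs.iter().all(|j| matches!(j, Job::Done)) {
            for job in &mut jobs {
                job.poll();
            }
        }
    }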


> "Smart programmers try to avoid complexity. So, they see the extra complexity in async/await and question why it is needed. This question is especially pertinent when considering that a reasonable alternative exists in OS threads."

Hm, yes, but is that complexity really avoided if it's in the language/runtime?

Sure, it's not in your code, it will probably have far more extensive testing, and maybe people thought about the problem for months while you're just trying to implement business feature #31514 rather than reinventing the wheel. But as someone who has been bitten by specific implementations like the one in Quarkus (not a language, granted), I must say:

Complexity is there to stay (and occasionally bite you), even if it's hidden!


I tend to agree, but an important note is that I've never seen an async/await system that didn't ALSO interact with OS threads. Async/await is not an alternative to OS threads but rather an additional layer.


I have heard about them in embedded systems where there are no OS threads, because there is no OS, but there are async tasks and a scheduler.


That just means there's an OS but it's one a bit like Win 3.1, with fully cooperative multitasking.


You mean like the async/await systems in the two most popular languages in the world - JavaScript and Python?

Async/await is definitely an alternative to user space threads, especially for IO bound tasks.


Some of the Python schedulers today already use pools of OS threads. There are definitely complications involving the GIL that make true multi-threading in Python ugly, but Python has done a lot to mitigate it in the last few years, and the rumor is that GIL removal will happen sooner than a lot of people expect.


> Hm, yes, but is that complexity really avoided if it's in the language/runtime?

Complexity can be abstracted away and this is still a worthwhile goal. Otherwise, we'd all still be studying Intel CPU documentation.


It can and it should, but I disagree with the sentiment that when it's abstracted "away" it's not there anymore.

You just don't see it.

Abstraction doesn't change any fact about how the system actually works.


To me, implementing an async API using algebraic effects is the end game. It gets rid of the need for an async keyword, and no monads required!


Async/await makes more sense in some languages than others.

For most PLs, algebraic effects look ideal. Just seems like we're figuring out how to talk about them to users and what the UX should be.


> Gets rid of the need of an async keyword

You still need an async effect handler though, which is effectively the same syntax-wise.


Because you want to avoid context switching cost. In fact, async/await/promise/future/Task is also closely related to fibers (as opposed to threads), at least on Windows, but fibers are provided at the OS level while async/promise/future/Task is provided by the language itself. You can switch to a different fiber without doing a kernel context switch, just like how you use a state machine to switch to a different job with async/promise/future/Task.


Context switching cost is mostly artificial. You save kernel/user transitions, but you can also have user-mode threads without transitions. What else are you saving by using coroutines instead of threads? You're not saving register swapping or cache warmup. You are saving stack allocations but those don't cost time.


What is the language agnostic answer to the same question?

I imagine something to do with memory usage or avoiding thread or thread pool starvation issues. Maybe performance too?


I don't have a lot of experience with async/await at high numbers of tasks, but I've run Erlang with millions of processes. It's a lot easier to run millions of Erlang processes on one machine than to run a million OS threads. I suspect async tasks would be similar; an OS thread needs its own stack, and that's going to use at least a page of memory, often much more. On the other hand, an async task or green thread might be able to use less.

If you're running real OS threads, I think task switching is going to be real context switches, which might mean spectre mitigations clear your cpu caches, but task switching can avoid that.

You may end up with more system calls with OS threads, because a runtime might be able to aggregate things a bit (blocking reads become kqueue/epoll/select). But maybe that's actually a wash, because you still need a read call once the FD is ready, and a real blocking read only makes a single call.


IMO the biggest reason to avoid threads is simply that it's ~impossible to write safe code using threads (e.g. without race conditions). Arguably with Rust's ownership system that's less true there than in other languages.


Asynchronous code has race conditions and synchronization issues too.

I pray for all the code written by people who think they didn’t need to learn about synchronization because they wrote asynchronous code.

And unfortunately I’ve come across and had to fix asynchronous code with race conditions.

You cannot escape learning about synchronization. Writing race-condition-free code is not hard.

What is actually hard is writing fast lock-free routines, but that’s more a parallelism problem that affects both threaded and asynchronous code. And most people will never need to reach that level of code optimization for their work.


Async-await is about concurrency, not parallelism. It can work in both a single-threaded and multi-threaded context, the latter exposing all the typical failure modes of multi-threaded code.

Also, Rust’s ownership model only prevents data races, that’s only the tip of the iceberg of race conditions, and I don’t think that any general model makes it possible to statically determine that any given multithreaded code is safe. Nonetheless, that’s the only way to speed up most kind of code, so possibly the benefits outweigh the cost in many cases.


You can write safe code using threads if you enforce that the only way threads can communicate is by sending messages to each other (via copying, not pointers). This is what Erlang does.
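A minimal sketch of that style in Rust with a std channel: the value is moved into the channel rather than shared through a pointer (though Rust won't stop you from sending an `Arc` if you choose to, so the "copy only" discipline is a convention here, not something the compiler enforces):

    use std::sync::mpsc;
    use std::thread;

    fn main() {
        let (tx, rx) = mpsc::channel::<String>();

        // The worker owns its state and only hears about the world via messages.
        let worker = thread::spawn(move || {
            for msg in rx {
                println!("worker got: {msg}");
            }
        });

        for i in 0..3 {
            // The String is moved into the channel; nothing is shared.
            tx.send(format!("job {i}")).unwrap();
        }
        drop(tx); // closing the last sender ends the worker's loop

        worker.join().unwrap();
    }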


No, you can get races and deadlocks in a pure actor system as well. It's actually easier in my experience to end up with problems in actors. I tried writing an app in that style once and had to back off that design and introduce some traditional shared memory multi-threading on some codepaths.

There are no shortcuts, no silver bullets when it comes to concurrency. Programmers have to learn about multi-threading and the ways it can go wrong because it's a fundamental thing, not a mere feature you can design your way out of.


> There are no shortcuts, no silver bullets when it comes to concurrency.

But you can identify the opposite of the silver bullet, which is shared mutable state, and then hurry away from it as quickly as possible.

Rust defaulted to avoiding sharing and Haskell defaulted to avoiding mutability.

For application code, I've yet to see a better concurrency story than 'use pure functions where possible, or use STM when you absolutely need shared mutation'.


> No, you can get races and deadlocks in a pure actor system as well.

You can't get data races, which is what Rust prevents. Rust's async doesn't prevent deadlocks or other kinds of races.
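For instance, this compiles without complaint but can deadlock at runtime: two threads take the same pair of locks in opposite order (a deliberately broken sketch):

    use std::sync::{Arc, Mutex};
    use std::thread;

    fn main() {
        let a = Arc::new(Mutex::new(0));
        let b = Arc::new(Mutex::new(0));

        let (a1, b1) = (Arc::clone(&a), Arc::clone(&b));
        let t1 = thread::spawn(move || {
            let _ga = a1.lock().unwrap(); // lock A first...
            let _gb = b1.lock().unwrap(); // ...then B
        });

        let (a2, b2) = (Arc::clone(&a), Arc::clone(&b));
        let t2 = thread::spawn(move || {
            let _gb = b2.lock().unwrap(); // lock B first...
            let _ga = a2.lock().unwrap(); // ...then A: classic lock-order inversion
        });

        t1.join().unwrap();
        t2.join().unwrap();
    }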


> You can write safe code using threads if you enforce that the only way threads can communicate is by sending messages to each other (via copying, not pointers).

At which point they're not really threads, they're more, well, processes.

> This is what Erlang does.

Exactly.


> IMO the biggest reason to avoid threads is simply that it's ~impossible to write safe code using threads (e.g. without race conditions).

Javascript has race conditions too, even with no threads involved.


> I recall that the thing that made me stick with Rust is the Iterator trait. It blew my mind that you could make something an Iterator, apply a handful of different combinators, then pass the resulting Iterator into any function that took an Iterator.

What languages did the author use before? Is this any different from the interface pattern seen in many other languages?


In the past I would have preferred managing threads manually, because I did it that way for years.

But as async/await has matured, especially with stackless coroutine implementations, writing async/await code is more pleasant than handling threads.

Handmade futures can have some pitfalls, as this blog post describes: https://blog.waynest.com/2022/12/async-cancellation/

However, I don't like Rust's trend of making the async runtime global and implicit, especially with #[xxx::main]. The runtime should be specified explicitly, just as a Netty channel must explicitly bind to an EventLoopGroup. Java's CompletableFuture without a concrete executor already causes plenty of issues.


I have to say, I am not convinced by this article that async composes better. The nice thing about green threads/fibers is you can make concurrency an internal detail: a function might spawn threads internally, or block, but the caller is free to use it as any other normal function, including passing it to a map() or filter() combinator.

By contrast, async forces the caller to acknowledge that it's not a regular function, and async functions don't compose at all with normal code. You have to write async-only versions of map(), filter(), and any other combinators.

Maybe async composes better with other async, but with threads, you can just compose with any other existing code.
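For example, with the `futures` crate the async combinators have slightly different shapes than their `Iterator` counterparts; a sketch assuming `futures = "0.3"`:

    use futures::{executor::block_on, future, stream, StreamExt};

    fn main() {
        // Sync: ordinary Iterator combinators.
        let evens: Vec<i32> = (0..10).filter(|x| x % 2 == 0).map(|x| x * 10).collect();

        // Async: the Stream versions. filter's closure must return a future,
        // and collect() is itself a future that has to be driven to completion.
        let evens_async = block_on(
            stream::iter(0..10)
                .filter(|x| future::ready(x % 2 == 0))
                .map(|x| x * 10)
                .collect::<Vec<i32>>(),
        );

        assert_eq!(evens, evens_async);
    }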


Threads are not useful for I/O-bound code. Either way the requests get serialized into a single queue within the network driver or a disk driver. Threads actually make things worse by 1) issuing multiple out-of-order requests to the disk, and 2) wasting time and memory switching thread context back and forth. For CPU-bound tasks it makes sense to create multiple threads, but only up to the number of actual cores available, otherwise it makes things worse again through cache thrashing. Thread pool(s) + message queues is what worked best for me.


Any semi-modern network card or SSD provides multiqueue support in hardware.

And even with a single queue, concurrency can help keep the queue full.


IMO, async/await is useful in that it's a simpler approach: developers don't need to worry about threading and all its inherent problems, which for the large majority of web and semi-web-related tasks is completely fine. It's the same reason we use languages with GC: there's no point managing your own memory for most tasks.

Someone writing a driver or something is obviously going to use proper threading and deal with all of the intricacies and gotchas.


As an observer who has used async/await in other languages (Javascript, Swift) I'm really confused why this is such a contentious topic in Rust. swift-concurrency has had its issues as well, but almost everyone I know finds it way more ergonomic and useful than how things were done before. As someone currently learning Rust, is there something particular about the language compared to how Swift did it? Outside of HN I would not know this is such a controversial topic.


I always understood async/await as an expression of concurrency and threads as an expression of parallelism. They aren't the same thing.

Concurrency breaks computation into chunks that can be interleaved even on a single CPU advancing computation concurrently overall but not necessarily physically at the same time for each chunk.

Parallelism on the other hand breaks computation into chunks that literally can run in parallel on different CPUs.

You can even combine both in some fashion.


ELI5: If you have to modify the language anyways (to add async/await) why not just have green threads that wrap NBIO with seemingly blocking calls?


What I would hope for is implicit async/await, so I don't have to write it out / wrap things all the time.


In Kotlin, only the outermost 'suspend' call needs to be wrapped; and when you're writing web servers and such, that's handled in the server middleware, so all your code can be `suspend fun blahblah()`.


That's eerily similar to what (green) threads are.


I’ve always thought async await is just a less clunky implementation of green threads. Not as a feature parity replacement, but trying to fill the same niche.


Yeah, fully agree with that. To me this is akin to writing line numbers in old BASIC dialects. The language makes the user do something that ought to be automated.


> implicit async/await

I keep suspecting C# will be the place where we see this, but probably not for another couple years.


We might not be that far away. There is this issue[1] on GitHub, where Microsoft and the community discuss some significant changes.

There are still a lot of questions unanswered, but initial tests look promising.

Ref: https://github.com/dotnet/runtime/issues/94620


We have it in Zig, Lua, Python, C, etc.


> A common refrain is that threads can do everything that async/await can, but simpler

This is the first time I've heard that, and given how many languages have added async/await after threading support, I don't think it is really a broad consensus.


The article talks a lot about async/await but fails to clearly state the main advantage of async code over threads. Async code in general (not only in Rust) allows a server to process thousands of client connections concurrently, with minimal latency, in a single thread, even if each client request needs several seconds to process (assuming the processing is IO-bound). One thread (or more generally, a small number of threads) is much cheaper resource-wise than thousands of threads (in a thread-per-client scenario).
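As a rough sketch of that shape, here is a tiny echo-style server on tokio's current-thread runtime, where every connection becomes a cheap task on one OS thread (nothing here is from the article; the port and buffer size are arbitrary):

    use tokio::io::{AsyncReadExt, AsyncWriteExt};
    use tokio::net::TcpListener;

    #[tokio::main(flavor = "current_thread")] // one OS thread drives all connections
    async fn main() -> std::io::Result<()> {
        let listener = TcpListener::bind("127.0.0.1:8080").await?;
        loop {
            let (mut socket, _) = listener.accept().await?;
            // Each client gets a task, not an OS thread; thousands of these are cheap.
            tokio::spawn(async move {
                let mut buf = [0u8; 1024];
                loop {
                    match socket.read(&mut buf).await {
                        Ok(0) | Err(_) => break, // connection closed or errored
                        Ok(n) => {
                            if socket.write_all(&buf[..n]).await.is_err() {
                                break;
                            }
                        }
                    }
                }
            });
        }
    }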


Async and Thread are of different abstraction levels[1]. The easier to use, the higher the abstraction level.

Rule of thumb: use the highest level of abstraction for your case.

Of course, the Async implementation can be atrocious or in conflict with something already used by your code, forcing you to use a lower level of abstraction.

[1]https://blog.codinghorror.com/the-wrong-level-of-abstraction...


>> A common refrain is that threads can do everything that async/await can, but simpler.

Who says that? Threads and async/await are different things, and it doesn't make sense to say one can do what the other does. And threads definitely are not simpler than async/await.


Threads are easier to spawn, and that's where all the fuss comes from, I'd argue.

Especially in Rust, async isn't as easy as in other languages with a runtime, and it does indeed have some caveats (e.g. cancellation), but all the real fuss comes from not understanding that they are different strategies for similar but not identical problems.

It makes little to no sense to use async/await for number-crunching/CPU-intensive tasks, for example.

One can use threads for some IO waiting, but it's definitely not the best solution for that particular problem.

To me this whole discussion has two facets:

1) How can async/await be more ergonomic in Rust?
2) How can we teach people that async/await is a different solution, with different tradeoffs, than threads? There is a reason async/await was created AFTER WE ALREADY HAD INVENTED THREADS!


My problem with Rust's async/await is that it's _not_ a different strategy, as the continuation tasks _are_ run multithreaded, so it's technically both strategies. IMO one of the biggest selling points of single-threaded async/await was how much complexity falls away compared to managing preemptive synchronization in the multi-threaded case.

I can see why there's so much controversy over async/await in rust. If I had to take both the syntax and cognitive hit of using async/await _and_ multi-threading, I would also angrily call for its removal.


Async wasn't invented after threads. It was primarily popularized in a system that was designed such that it couldn't use threads (the web browser). Everything else is post-justification for why it's better. It isn't better.


Async/Await as a syntax thing wasn't, but Async/Await as "don't just blindly use Threads for scaling to a myriad of incoming HTTP requests" was.

I remember the Apache webserver story, and that one has little to nothing to do with web browsers or JavaScript ;)


Async/await vs threads is yet another entry in the ~800 volume ongoing series "Where the Indirection Go?" In this case, you put concurrency inside the process (async) or outside the process (threads). Your CPU is, in both cases, constantly rotating between workloads, and that indirection can be inside or outside the process. Once you start caring about either human usability or fractional performance, the distinction matters. Otherwise, it doesn't.


Because asynchronous programming reduces to its equivalent in digital circuit design being contingent on analog circuit engineering decisions.


Sorry, not a native speaker here and I can't parse that sentence. Would you be so kind as to dumb it down a little?


Is the difference between threads and async/await more than syntax? Or is it language-specific?


Implementation difference. Threads are usually handled by the kernel and each one has its own stack where you can put anything. And the kernel can switch threads at any time. Async/await has the compiler work out precisely what has to be saved across the marked locations which are the only places context switches can happen. Also it doesn't tell the kernel when it switches context.
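As a conceptual sketch (the helper functions are made up, and the commented-out enum is a simplification rather than the compiler's real output), the compiler only has to persist the locals that are alive across each `.await`:

    // Hypothetical helpers so the sketch is self-contained.
    async fn download(url: &str) -> String { url.to_string() }
    async fn parse(body: &str) -> usize { body.len() }

    async fn fetch_and_parse(url: String) -> usize {
        let body = download(&url).await; // `url` must survive this await point
        parse(&body).await               // `body` must survive this one
    }

    // Conceptually, the compiler generates something like:
    //
    //   enum FetchAndParseState {
    //       Start { url: String },
    //       AwaitingDownload { /* url plus download's future */ },
    //       AwaitingParse { /* body plus parse's future */ },
    //       Done,
    //   }
    //
    // poll() advances the state machine; only these saved fields survive between
    // suspensions, and the kernel is never told that a "context switch" happened.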


Threads can be scheduled in userspace and can also be non-preemptive or have deterministic scheduling.


Async/await is implemented with a runtime that uses a thread pool or a single thread, allocates work to a thread when needed, and waits for IO to yield a result.

With threads you just fully control what blocking code is running on a single thread.

If you are just running computations (or reading files, as filesystem APIs are not async) it's simpler to just use threads.


> or reading files, as filesystem api are not async

This is interesting - all nodejs file system APIs are async by default.


That's because JavaScript as a whole is async by default. Which really makes me wonder why JavaScript didn't just go the route of Go and eliminate the distinction, instead of falsely creating it with syntax.


Yes, they are different in most cases.

In general, await's job is to hand work off to a third party (e.g. a database or HTTP service) and wait for the callback, whereas a thread's job is to launch multiple CPU operations in parallel.


How is async/await implemented under the hood?


Generally via continuations. An async function is transformed into continuation-passing style; await suspends with the current continuation; and then you have a runtime that, at its simplest, is just a queue of tasks plus special-cased primitives for things like async I/O where you suspend. It pulls tasks off the queue and runs them, and when a task suspends it stores the continuation and runs the next one.


Async/await in Rust is famously not based on continuations, at least not in any traditional sense, where a block of code is passed to a reactor system to be invoked whenever an operation completes.

Instead it is based on "wakers", which are objects associated with a task that can be used to notify the task's executor that the task is ready to make progress. It is then the job of the executor to resume the task. So there is an extra layer of indirection (conceptually).

There are pros and cons, but in essence the system trades a check on resume (often redundant) for the need to make a heap allocation and/or type erasure at each await point.

(It's possible to avoid the latter in continuation-based implementations, like C++ coroutines, but it's pretty hard.)
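A minimal hand-rolled future that shows the waker handshake (a sketch in the spirit of the async book's timer example; the helper thread stands in for "the OS says the operation finished"):

    use std::future::Future;
    use std::pin::Pin;
    use std::sync::{Arc, Mutex};
    use std::task::{Context, Poll, Waker};
    use std::thread;
    use std::time::Duration;

    struct Delay {
        shared: Arc<Mutex<(bool, Option<Waker>)>>, // (completed, stored waker)
    }

    impl Delay {
        fn new(dur: Duration) -> Self {
            let shared = Arc::new(Mutex::new((false, None::<Waker>)));
            let s2 = Arc::clone(&shared);
            thread::spawn(move || {
                thread::sleep(dur);
                let mut guard = s2.lock().unwrap();
                guard.0 = true;
                if let Some(waker) = guard.1.take() {
                    waker.wake(); // notify the executor that this task can progress
                }
            });
            Delay { shared }
        }
    }

    impl Future for Delay {
        type Output = ();
        fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
            let mut guard = self.shared.lock().unwrap();
            if guard.0 {
                Poll::Ready(())
            } else {
                // Not ready: store the waker so the helper thread can wake us later.
                guard.1 = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }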


Does that runtime run the tasks across multiple cores?


In Rust the answer is "it depends". Since the runtime is not provided by the language, you can have implementations that are a single thread, thread-per-task, a thread pool, or whatever other setup you can think of. Tokio at least offers a single-threaded version and a thread-pool version.
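For example, tokio exposes both through its runtime builder (a sketch; both constructors are part of tokio's public API, but the worker count is arbitrary):

    use tokio::runtime::Builder;

    fn main() {
        // Single-threaded: every task runs on the thread that calls block_on.
        let single = Builder::new_current_thread().enable_all().build().unwrap();
        single.block_on(async { println!("running on one thread") });

        // Multi-threaded: tasks are scheduled across a pool of worker threads.
        let pool = Builder::new_multi_thread()
            .worker_threads(4)
            .enable_all()
            .build()
            .unwrap();
        pool.block_on(async { println!("running on a 4-thread pool") });
    }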


poll()


No offense but you don't serve 1 million clients with 1 thread per client.

So what this ends up describing is the mechanism Rust uses to hide the state machine you'd write by hand in an older language.


Please. Stop posting about async/await here. It just gives space for everyone and their mother to post their superstitions and misconceptions about what is otherwise an elegant and extremely powerful model.

The Rust implementation is great, and so is the C# one. They have their own tradeoffs, but I would never choose anything else, and 8 out of 10 developers who disagree never read past "me sees await means no thread blocky" instead of focusing on the structured concurrency patterns it enables. Hell, C# does not even need any fancy terms for this because it is that low-ceremony. Worse languages, however, do require more effort and so have to justify it by inventing words.


> Please. Stop posting about async/await here.

I agree with everything else you said, but that's all the more reason to post it imo. The misconceptions aren't going away just because you don't hear about them, and every time I hear the different arguments, my own understanding grows a little bit. It's annoying, but quite healthy.


Iterators blew the writer's mind... Most languages have iterators, come on... Java, C#, Python, C++, to name only a few.



