Erlang/Elixir/BEAM emphasizes share-nothing concurrency, allows (encourages) a bazillion user-space processes, and then executes them thread-per-core (by default).
The actual number of schedulers (real threads) is configurable as a command line option, but it's rare, approaching unheard-of, to override the default.
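For the curious, a quick sketch (the +S flag and erlang:system_info/1 are the standard knobs; the numbers shown are just one machine's output):

    # start the VM with 8 scheduler threads instead of the default
    $ erl +S 8
    ...
    1> erlang:system_info(schedulers_online).
    8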
If Go with its green threads is a step down from Rust in performance, Erlang is two or three steps down from Go. If you step down your performance needs, a lot of these problems melt away.
Most programmers should indeed do that. There's no need to bring these problems on yourself if you don't actually need them. Personally, I harbor a deep suspicion that a non-trivial amount of the stress in the Rust ecosystem over async and its details comes from people who don't actually need the performance they are sacrificing for. (Obviously, there are absolutely people who do need that performance, and I am 100% not talking about them.) But it's hard to tell, because they don't exactly admit that's what they're doing if you ask - or at least not until many years and grey hairs later.
But in the meantime, some language needs to actually solve these problems (better than C++ does), and since Rust has volunteered for that role, the fact that other languages which chose to just take the performance hit don't seem to have these problems offers few applicable lessons for Rust, at least when it is being pushed to those maximum performance levels.
Agree, Erlang will never win any performance benchmarks, but that is mostly due to other aspects of the language: big integers, string handling, safer-rather-than-faster floating point, etc.
[Elixir is a little better here, using binaries for strings rather than charlists - and Erlang is very good at pattern-matching binaries.]
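To illustrate that binary pattern matching, here's a hypothetical parser for a length-prefixed frame (parse/1 is invented for the example):

    %% Match a 16-bit big-endian length, exactly that many payload
    %% bytes, and whatever remains in the buffer.
    parse(<<Len:16, Payload:Len/binary, Rest/binary>>) ->
        {Payload, Rest}.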
Share-nothing and thread-per-core are good for many reasons, including performance, but they also feed into the main philosophies for Erlang development: resilience, horizontal scalability and comprehensibility.
As Joe Armstrong said:
“Make it work, then make it beautiful, then if you really, really have to, make it fast.
90% of the time, if you make it beautiful, it will already be fast.”
There's nothing inherently slow about the way you structure a program in Erlang. Most of the problems come from copying values around when sending them across processes.
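A minimal sketch of that copy (the worker loop and handle/1 are made up; the notable exception is large binaries, over 64 bytes, which live on a shared refcounted heap, so only a reference crosses the process boundary):

    %% BigTerm is deep-copied into the worker's private heap on send.
    Worker = spawn(fun Loop() ->
        receive {data, D} -> handle(D), Loop() end
    end),
    Worker ! {data, BigTerm}.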
Erlang/BEAM is significantly slower than either Go or Rust. Its speed reputation was often misunderstood; it was very good at juggling green threads, but it was never a fast programming language. Now that its skill at juggling green threads is commoditized, what's left is the "not very fast programming language".
It's not the slowest language either; it has a decent performance advantage over most of the dynamic scripting languages. But it is quite distinctly slower than Go, let alone Rust.
Erlang (BEAM) has schedulers that execute the outstanding work (metered in reductions) of the bazillion user-space (green-thread) processes.
For most of Erlang's history there was a single scheduler per node, i.e. one thread on a physical machine running all the processes. Each (green-thread) process gets a fixed budget of reductions, then the scheduler context-switches to a different process. Repeat.
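You can watch that accounting from the shell (process_info/2 is standard; the reduction counts below are illustrative, and the per-slice budget is a VM detail - a few thousand reductions on modern OTP):

    1> {reductions, R1} = process_info(self(), reductions).
    {reductions,4522}
    2> lists:sum(lists:seq(1, 100000)).
    5000050000
    3> {reductions, R2} = process_info(self(), reductions).
    {reductions,109241}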
A few years ago (2008), the schedulers were parallelized, so that multiple schedulers could cooperate on multi-core machines. The number of schedulers and hardware threads are independent - you can run the schedulers on any number of real threads on any physical machine. But by default, and in practice, the number of schedulers is one thread per core, where "core" means a hardware thread (e.g. Intel chips often expose 2 hardware threads per physical core).
So yes, almost always and almost everywhere, there really is one OS thread per hardware thread (usually 1x or 2x the physical CPU core count) running the schedulers.
As the original article noted, one of the biggest problems with "thread per core" is the name itself, because it confuses people. It does not mean "one thread per one core" in the literal sense; it means a specific kind of architecture in which message passing between threads (which is very common in Erlang) is avoided or kept to the bare minimum. Instead, the processing for a single request happens, from beginning to end, on one single core.
This is done to avoid shuttling data between per-core L1 caches, and to keep each core's cache hot with one request's working set and not much else (at least, to the extent possible).
In the context of Rust async runtimes, this is very similar to what Tokio would be if work stealing did not exist and futures spawned tasks only on their local thread - making the code easier to write (no Send + Sync + 'static bounds) while also, the claim goes, making it more performant (which the article argues it does not).
For examples of thread-per-core runtimes, see glommio and monoio.
I am extremely familiar with Erlang and its history. You are misunderstanding what "Thread Per Core" means.
Again, the fact that data moves across threads in Erlang means it is not TPC - period. Erlang is practically the opposite of a TPC system: it is all about moving data between actors, which can be on any thread.
> The actual number of schedulers (real threads) is configurable as a command line option, but it's rare, approaching unheard-of, to override the default.
Virding's First Rule of Programming ...
https://rvirding.blogspot.com/2008/01/virdings-first-rule-of...