Erlang/Elixir/BEAM emphasizes share-nothing concurrency, allows (encourages) a bazillion user-space processes, and then executes them thread-per-core (by default).
The actual number of schedulers (real threads) is configurable as a command line option, but it's rare, approaching unheard-of, to override the default.
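For the curious, a quick sketch (the +S flag and erlang:system_info/1 are the standard knobs; the numbers shown are just one machine's output):

    # start the VM with 8 scheduler threads instead of the default
    $ erl +S 8
    ...
    1> erlang:system_info(schedulers_online).
    8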
If Go with its green threads is a step down from Rust in performance, Erlang is two or three steps down from Go. If you step down your performance needs, a lot of these problems melt away.
Most programmers should indeed do that. There's no need to bring these problems on yourself if you don't actually need them. Personally, I harbor a deep suspicion that a non-trivial amount of the stress in the Rust ecosystem over async and its details comes from people who don't actually need the performance they are sacrificing for. (Obviously, there are absolutely people who do need that performance, and I am 100% not talking about them.) But it's hard to tell, because they don't exactly admit that's what they're doing if you ask - or at least not until many years and grey hairs later.
But in the meantime, some language needs to actually solve these problems (better than C++ does), and since Rust has volunteered for that role, the fact that other languages which chose to just take the performance hit don't seem to have these problems offers few applicable lessons for Rust, at least when it is being pushed to those maximum performance levels.
Agree, Erlang will never win any performance benchmarks, but that is mostly due to other aspects of the language: big integers, string handling, safer-rather-than-faster floating point, etc.
[Elixir is a little better here, using binaries for strings rather than charlists - and Erlang is very good at pattern-matching binaries.]
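To illustrate that binary pattern matching, here's a hypothetical parser for a length-prefixed frame (parse/1 is invented for the example):

    %% Match a 16-bit big-endian length, exactly that many payload
    %% bytes, and whatever remains in the buffer.
    parse(<<Len:16, Payload:Len/binary, Rest/binary>>) ->
        {Payload, Rest}.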
Share-nothing and thread-per-core are good for many reasons, including performance, but they also feed into the main philosophies for Erlang development: resilience, horizontal scalability and comprehensibility.
As Joe Armstrong said:
“Make it work, then make it beautiful, then if you really, really have to, make it fast.
90% of the time, if you make it beautiful, it will already be fast.”
There's nothing inherently slow about the way you structure a program in Erlang. Most of the problems come from copying values around when sending them across processes.
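A minimal sketch of that copy (the worker loop and handle/1 are made up; the notable exception is large binaries, over 64 bytes, which live on a shared refcounted heap, so only a reference crosses the process boundary):

    %% BigTerm is deep-copied into the worker's private heap on send.
    Worker = spawn(fun Loop() ->
        receive {data, D} -> handle(D), Loop() end
    end),
    Worker ! {data, BigTerm}.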
Erlang/BEAM is significantly slower than either Go or Rust. Its speed reputation was often misunderstood; it was very good at juggling green threads, but it was never a fast programming language. Now that its skill at juggling green threads is commoditized, what's left is the "not very fast programming language".
It's not the slowest language either; it has a decent performance advantage over most of the dynamic scripting languages. But it is quite distinctly slower than Go, let alone Rust.
Erlang (BEAM) has schedulers that execute the outstanding work (metered in reductions) of the bazillion user-space (green-thread) processes.
For most of Erlang's history there was a single scheduler per node, i.e. one thread on a physical machine running all the processes. Each (green-thread) process gets a fixed budget of reductions, then the scheduler context-switches to a different process. Repeat.
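You can watch that accounting from the shell (process_info/2 is standard; the reduction counts below are illustrative, and the per-slice budget is a VM detail - a few thousand reductions on modern OTP):

    1> {reductions, R1} = process_info(self(), reductions).
    {reductions,4522}
    2> lists:sum(lists:seq(1, 100000)).
    5000050000
    3> {reductions, R2} = process_info(self(), reductions).
    {reductions,109241}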
A few years ago (2008), the schedulers were parallelized, so that multiple schedulers could cooperate on multi-core machines. The number of schedulers and hardware threads are independent - you can run the schedulers on any number of real threads on any physical machine. But by default, and in practice, the number of schedulers is one thread per core, where "core" means a hardware thread (e.g. Intel chips often expose 2 hardware threads per physical core).
So yes, almost always and almost everywhere, there really is one OS thread per hardware thread (usually 1x or 2x the physical CPU core count) running the schedulers.
As the original article noted, one of the biggest problems with "thread per core" is the name itself, because it confuses people. It does not mean "one thread per one core" in the literal sense; it means a specific kind of architecture in which message passing between threads (which is very common in Erlang) is avoided or kept to the bare minimum. Instead, the processing for a single request happens, from beginning to end, on one single core.
This is done to avoid shuttling data between per-core L1 caches, and to keep each core's cache hot with one request's working set and not much else (at least, to the extent possible).
In the context of Rust async runtimes, this is very similar to what Tokio would be if work stealing did not exist and futures spawned tasks only on their local thread - making the code easier to write (no Send + Sync + 'static bounds) while also, the claim goes, making it more performant (which the article argues it does not).
For examples of thread-per-core runtimes, see glommio and monoio.
I am extremely familiar with Erlang and its history. You are misunderstanding what "Thread Per Core" means.
Again, the fact that data moves across threads in Erlang means it is not TPC - period. Erlang is practically the opposite of a TPC system: it is all about moving data between actors, which can be on any thread.
> The actual number of schedulers (real threads) is configurable as a command line option, but it's rare, approaching unheard-of, to override the default.
Virding's First Rule of Programming ...
https://rvirding.blogspot.com/2008/01/virdings-first-rule-of...