Wow, congrats to the OTP contributors, this looks like a lot of work!
I’ve heard that the BEAM VM byte code is not well documented (maybe in comparison to JVM bytecode), which is one reason that Elixir transpiled to Erlang first; is this true?
That is not the reason. It makes sense to compile to the Erlang AST representation and leverage all the power (codegen/optimisations/…) of the Erlang compiler for free.
To be technically correct, Elixir is compiled into "Erlang Abstract Format, which is a standard representation of an Erlang AST using Erlang terms." [1] From there it compiles down to bytecode using Erlang tool chain.
So saying that Elixir is transpiled to Erlang is incorrect: it compiles to Erlang AST.
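For a concrete picture, here's a quick shell sketch of what that abstract format looks like (output may vary slightly by OTP version):

    %% The "abstract format" is just an Erlang term describing the parse
    %% tree; the compiler backend consumes terms shaped like this.
    1> {ok, Tokens, _} = erl_scan:string("fact(0) -> 1.").
    2> erl_parse:parse_form(Tokens).
    {ok,{function,1,fact,1,[{clause,1,[{integer,1,0}],[],[{integer,1,1}]}]}}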
Hah, sorry, I also posted a similar response in parallel, but on the parent! I had to google it though; I couldn't remember whether Elixir used Core Erlang. I'm still a little fuzzy on where that's used and whether it has any important relation to the Erlang Abstract Format.
> In OTP 22 we have completely re-implemented the lower levels of the Erlang compiler.
Meaning some performance improvements for bit syntax, which in Erlang is used [EDIT: in some of the newer, external code; there are still many places where charlists are used] for strings. Some string BIFs got faster. This should translate directly to the performance of Elixir and its string-handling functions.
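For instance, here's a minimal sketch of the kind of bit-syntax matching over a UTF-8 binary that the rewritten compiler lowers to faster instructions:

    %% Counting codepoints in a UTF-8 binary via bit-syntax matching.
    count_chars(<<_/utf8, Rest/binary>>, N) -> count_chars(Rest, N + 1);
    count_chars(<<>>, N) -> N.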
> OTP 22 comes with a new experimental socket API. The idea behind this API is to have a stable intermediary API that users can use to create features that are not part of the higher-level gen APIs.
Other than the ease of extending the API there are some performance improvements, too.
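As a rough sketch (the API was explicitly experimental in OTP 22, so details may have shifted since), the new socket module mirrors the BSD socket calls directly:

    %% Experimental socket API sketch; calls map one-to-one onto the
    %% underlying BSD socket operations.
    {ok, S} = socket:open(inet, stream, tcp),
    ok = socket:connect(S, #{family => inet,
                             addr => {127, 0, 0, 1},
                             port => 8080}),
    ok = socket:send(S, <<"ping">>),
    {ok, Reply} = socket:recv(S),
    ok = socket:close(S).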
> PR1952 contributed by Kjell Winblad from Uppsala University makes it possible to do updates in parallel on ets tables of the type ordered_set. This has greatly increased the scalability of such ets tables that are the base for many applications, for instance, pg2 and the default ssl session cache.
More concurrency, better performance.
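A minimal sketch of the kind of table that benefits -- write_concurrency only became effective for ordered_set with this change:

    %% Many writers hitting one ordered_set table; with OTP 22 these
    %% inserts can proceed in parallel instead of serializing on a lock.
    T = ets:new(scores, [ordered_set, public, {write_concurrency, true}]),
    [spawn(fun() -> ets:insert(T, {N, N * N}) end) || N <- lists:seq(1, 100)].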
> In OTP 21.3 the culmination of many optimizations in the ssl application was released. For certain use-cases, the overhead of using TLS has been significantly reduced. [...] The bytes per second that the Erlang distribution over TLS is able to send has been increased from 17K to about 80K, so more than 4 times as much data as before.
Everything relying on ssl got faster.
> On OTP 22 the logging facility for ssl has been greatly improved and there is now basic server support for TLSv1.3.
Not sure about that, don't know the details of TLS. More/better logging is always welcome.
> In order to deal with the head of line blocking caused by sending very large messages over Erlang Distribution, we have added fragmentation of distribution messages in OTP 22. This means that large messages will now be split up into smaller fragments allowing smaller messages to be sent without being blocked for a long time.
You can send lots of data over to other Erlang nodes without chunking it yourself; other messages will still go through anyway. Great if you have such messages in your app.
> Three new modules, counters, atomics, and persistent_term, were added in OTP 21.2. These modules make it possible for the user to access low-level primitives of the runtime to make some spectacular performance improvements.
> For instance, the cover tool was recently re-written to use counters and persistent_term [...] now it uses counters and the overhead of running cover has decreased by up to 80%.
> A fun (and possibly useful) use case for atomics is to create a shared mutable bit-vector. So, now we can spawn 100 processes and play flip that bit with each other
More tools for performance optimization are always welcome. The shared bit vector is an impressive example, but I'm not sure what else could be done with these modules; I haven't read their docs yet.
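For the curious, here's a minimal sketch of such a bit-vector (my own code, not the blog's): pack 64 bits per atomic word and flip with a compare-and-exchange retry loop.

    %% Shared mutable bit-vector on top of atomics. Atomics indices are
    %% 1-based; compare_exchange retries on contention so concurrent
    %% flippers never lose updates.
    -module(bitvec).
    -export([new/1, flip/2, get/2]).

    new(NBits) ->
        atomics:new((NBits + 63) div 64, [{signed, false}]).

    flip(Ref, Bit) ->
        Ix = Bit div 64 + 1,
        Mask = 1 bsl (Bit rem 64),
        Old = atomics:get(Ref, Ix),
        case atomics:compare_exchange(Ref, Ix, Old, Old bxor Mask) of
            ok -> ok;
            _Raced -> flip(Ref, Bit)    % someone else wrote first; retry
        end.

    get(Ref, Bit) ->
        (atomics:get(Ref, Bit div 64 + 1) bsr (Bit rem 64)) band 1.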
> In OTP 21.3, the version when all functions and modules were introduced was added to the documentation.
Great. Many other languages' docs have this, and it's useful. All the docs were regenerated, and the version that introduced each function was added throughout.
> In OTP 22 a new documentation top section called Internal Documentation has been added to the erts and compiler applications. The sections contain the internal documentation that previously has only been available on github, so that it is easier to access.
Better docs on internals is always a good thing. Could potentially save some work for other OTP langs implementers, including Elixir.
> Each major OTP release wouldn’t be complete without a set of memory allocator improvements and OTP 22 is no exception. The ones with the most potential to impact your applications are PR2046 and PR1854. Both of these optimizations should allow systems to better utilize memory carriers in high memory situations allowing your systems to handle more load.
In general, it looks like this release is focused on optimizing many parts of the platform. The string-handling improvements especially are welcome and will affect many Erlang-based applications out there (i.e. including other OTP languages, if I understand correctly). That's good. Erlang being "slow" was always its weakest point and came up in nearly every discussion about Erlang. Great to see this is being actively addressed on many fronts. I'll definitely update my apps to use 22.
In Elixir, maybe, but in Erlang many strings are still charlists. Especially important is the fact that the Erlang stdlib’s Leex (lexer) and Yecc (parser) modules operate in terms of charlists, and both the Erlang and Elixir compilers use those modules. So compilation won’t be getting any faster from this optimization.
Only if you use new BEAM features [EDIT: or new compiler optimizations... this could be hard to avoid, so using HiPE may indeed become hard/impossible for awhile... :(] in the module you want to HiPE-compile:
> In OTP 22, HiPE (the native code compiler) is not fully functional. The reasons for this are:
> There are new BEAM instructions for binary matching that the HiPE native code compiler does not support.
> The new optimizations in the Erlang compiler create new combination of instructions that HiPE currently does not handle correctly.
> If erlc is invoked with the +native option, and if any of the new binary matching instructions are used, the compiler will issue a warning and produce a BEAM file without native code.
The relationship with HiPE is weird. It's probably better to consider HiPE as an alternative VM to BEAM which used to be faster in some kinds of computations -- especially numerical -- but had trade-offs in terms of memory consumption and compatibility. Different shops approach HiPE uniquely, but it was not uncommon to see HiPE used for tooling -- dialyzer especially -- but not end up in production. The original developers of HiPE have now moved on entirely or are focused more on introducing JIT capability to BEAM.
The BEAM team are not the maintainers of HiPE and don't have the expertise to update it, and meanwhile HiPE does not have a full-time, dedicated team behind it. I won't be surprised if HiPE gets split out of the default install in a few years if it keeps falling behind.
> HiPE and execution of HiPE compiled code only have limited support by the OTP team at Ericsson. The OTP team only does limited maintenance of HiPE and does not actively develop HiPE. HiPE is mainly supported by the HiPE team at Uppsala University.
What's preventing optimisations from being built for maths to be at least as fast as something like Numpy? Is it just that the realtime nature of the BEAM makes it difficult?
If you really want to, you can compile things using HiPE, which in my tests can make mathematical stuff about 10X faster... it sounds pretty unsupported these days though. Maybe just put messages on a queue and process them in Go or Rust if you must have the performance; there's no need to make the BEAM handle such things directly, and it's always good to measure where your bottlenecks are first.
> What's preventing optimisations from being built for maths to be at least as fast as something like Numpy? Is it just that the realtime nature of the BEAM makes it difficult?
You have to take immutability and multi-processing into account. Each process has a separate heap as a design assumption, so if you want to pass data between them, you must copy. Data is immutable as another design assumption, and this means that if you want to change something in a bigger data structure, you also must copy.
Theoretically, the compiler is free to optimize the latter: if it notices that some code can be safely rewritten in a mutating way, it may do so. But someone needs to write that optimization, and the fact that this is not a priority for BEAM means it is unlikely anyone will spend time implementing it while there are more pressing matters. And not only must it be written; it must also be proven that the optimization does not break anything.
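A tiny shell illustration of the immutability point:

    %% "Updating" a tuple returns a new tuple; the original is untouched,
    %% so the runtime has to copy rather than mutate in place.
    1> T0 = {0, 0, 0, 0, 0}.
    {0,0,0,0,0}
    2> T1 = setelement(3, T0, 42).
    {0,0,42,0,0}
    3> T0.
    {0,0,0,0,0}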
> so if you want to pass data between them, you must copy
Big binaries aren’t copied; they’re passed by reference to a shared ref-counted heap (each process holds one reference to the binary on the shared heap, and the process’s heap can hold N references to that far pointer).
In theory, more types could be made to do that.
In fact, I believe that handles passed back from NIFs actually reuse this infrastructure, presenting themselves as binaries such that sending the handle to another process is really just creating a new smart pointer in the new process-heap, holding the same raw pointer that the original NIF handle held.
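A sketch of what that sharing looks like in practice (binaries over 64 bytes live off-heap as ref-counted "refc" binaries):

    %% Sending a 1 MB binary to another process bumps a refcount on the
    %% shared binary heap instead of copying the payload.
    Big = crypto:strong_rand_bytes(1024 * 1024),
    Pid = spawn(fun() -> receive B -> byte_size(B) end end),
    Pid ! Big.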
And you can do a lot with that. Erlang’s new-ish `counters` module, where you get a handle to a mutable uint64 array, is essentially just a built in NIF that passes around a mutable NIF handle pointing to a uint64[] buffer, exactly as above.
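A quick sketch of that sharing behavior:

    %% The counters ref is a handle to shared, mutable memory: sending it
    %% to another process shares the underlying array, so the add below
    %% is visible back in the parent.
    C = counters:new(1, [atomics]),
    Pid = spawn(fun() -> receive Ref -> counters:add(Ref, 1, 10) end end),
    Pid ! C,
    timer:sleep(100),          % crude: wait for the add to happen
    counters:get(C, 1).        % => 10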
There’s nothing stopping the Erlang runtime (or your own NIF code) from adding all sorts of other operations against native, mutable types, and exposing them as these sorts of abstract data structure handles. You could, for example, have a `matrix` module for doing matrix math; or a `data_frame` module that can participate in every operation a Pandas.DataFrame can (with all such operations implemented as native code); or a module exposing SHM buffers; or even a module exposing GPU driver primitives (e.g. vertex/shader/texture buffers).
Everything that the bytecode would do to such ADTs would be slow-ish, but the point would be to load stuff into them in your glue code, and then in your hot loop, just bump already-loaded ADTs together in fast, native ways.
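To make that concrete, the calling pattern might look like this -- the matrix module here is hypothetical, not part of OTP, and just stands in for any native ADT behind a handle:

    %% Hypothetical NIF-backed matrix module: from_list/1, mul/2 and
    %% sum/1 do not exist in OTP; they stand in for any native ADT
    %% behind an opaque handle.
    A = matrix:from_list([[1.0, 2.0], [3.0, 4.0]]),
    B = matrix:from_list([[5.0, 6.0], [7.0, 8.0]]),
    C = matrix:mul(A, B),                       % native multiply
    Worker = spawn(fun() -> receive M -> matrix:sum(M) end end),
    Worker ! C.                                 % shares the handle, no copy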
See the sibling comments; passing binary refs that refer to Matrex blobs is pretty much what the Matrex library is doing. The refs returned are in a binary format (not sure exactly whether it's just the C memory representation or based on MATLAB's `.mat` format). It'd be possible to, say, use Elixir macros to build up an AST for common math formulas that could be passed to the native math library, which could be pretty fast.
Given that NumPy is implemented in C with only bindings to Python, you can do exactly the same on the BEAM, and indeed it exists (though it's not widely used): https://github.com/versilov/matrex
I'm using Matrex on some embedded devices, and it performs plenty well enough for the task. The (micro) benchmarks for Matrex are intriguing: they appear to show Matrex being faster than NumPy for common matrix tasks. That's not improbable, considering BEAM in general appears faster than CPython, and the Erlang NIF FFI is simpler than CPython's, which may translate to quicker FFI calls. Of course that's mostly speculative, but at least Elixir/Erlang can be on par with Python in matrix maths.
> Theoretically, the compiler is free to optimize the latter - if it notices that some code can be safely optimized in a mutating way, it is free to do that, but then, someone needs to write that optimization - and the fact that this is not a priority for BEAM means that it is unlikely that someone will spend time to implement it if there are more pressing matters for them to do so.
I have no idea how they prioritize the new features to be built, but IMHO safely optimizing code in a mutating way is the reason why Haskell is so fast despite the heavy abstraction primitives it offers.
Also, what is the fun in writing perf code in another (native) language if the host language is (or can be) capable by itself?
What's the uptime, reliability, and distribution story for Haskell? If I want to have three nodes sharing common state (either eventually consistent or using a consistency model like raft), how do I implement this with (X) nines of uptime in Haskell? What's the process restart story for Haskell?
The point is that the data structures in Erlang are chosen with the abstractions necessary to support its features. "Safe" for Haskell is not the same as "safe" for Erlang, and that's why it's not easy to implement in Haskell some of the things that are trivial in Erlang. Conversely, there are some things that are really safe in Haskell (type safety and preventing process crashes) that you kind of have to add boilerplate for to get confidence in Erlang (or you just 'let it crash', which is a perfectly fine philosophy too). This is not to say that Erlang is necessarily better; most of those problems have been "solved by kubernetes" anyway.
To put it simply, arrays are not an Erlang data structure. Everything is a linked list, so you can have a high-performance copy-on-write data structure that can be safely passed between Erlang processes (I think).
We use the BEAM with native-language NIFs to do numeric computation on Nvidia GPUs with CUDA/C++. Works great. The JVM wouldn't help us much (and would be much more unstable due to the JVM's inherent problems with memory leaks and performance).
They have a major release every year or so; the last one was June of last year. They will release features in point releases though: for example, counters and persistent_term were released with 21.2.
Is there a "definitive" source explaining the differences -- philosophical, architectural, performance -- between BEAM, JVM, V8, and other virtual machines targeted by programming languages? When I attempt to explain what makes BEAM interesting, I default to comparing it to the JVM with "similar but different" analogies, but I don't recall all of the sources from which I learned those analogies.
One of the other analogies is to containers: processes in BEAM are as isolated from each other as applications running on different machines. They can pass messages to each other, but they absolutely cannot access each other's memory... This was explained to me when I asked about running Elixir apps in Docker containers, and the "Erlang old-timer" laughed and explained why it wasn't necessary.
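A minimal sketch of that isolation -- the only way to affect another process is to send it a message:

    %% Processes share no memory; the Echo process's state is reachable
    %% only through messages.
    Echo = spawn(fun Loop() ->
                     receive {From, Msg} -> From ! {echo, Msg}, Loop() end
                 end),
    Echo ! {self(), hello},
    receive {echo, hello} -> ok end.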