What's preventing optimisations from being built for maths to be at least as fast as something like NumPy? Is it just that the realtime nature of the BEAM makes it difficult?
If you really want to, you can compile things with HiPE, which in my tests made mathematical code about 10x faster... it sounds pretty unsupported these days, though (it was removed from OTP entirely in version 24). Maybe just put messages on a queue and process them in Go or Rust if you must have the performance; no need to make the BEAM handle such things directly, and it's always good to measure where your bottlenecks are first.
> What's preventing optimisations from being built for maths to be at least as fast as something like NumPy? Is it just that the realtime nature of the BEAM makes it difficult?
You have to take immutability and multi-processing into account. Each process has a separate heap as a design assumption, so if you want to pass data between them, you must copy. Data is immutable as another design assumption, which means that if you want to change something in a bigger data structure, you must copy the changed parts (structural sharing lets the untouched parts be reused within a process).
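A small runnable illustration of both points (plain Elixir, nothing project-specific): updating an immutable map yields a new structure rather than mutating in place, and sending a term to another process copies it onto the receiver's heap.

```elixir
# Updating an immutable map yields a new map; the original is untouched.
m1 = %{row: [1, 2, 3]}
m2 = Map.put(m1, :row, [9, 2, 3])
IO.inspect(m1.row)  # [1, 2, 3] -- m1 is unchanged
IO.inspect(m2.row)  # [9, 2, 3]

# Sending to another process copies the term onto the receiver's heap;
# the receiver works on its own private copy.
parent = self()

pid =
  spawn(fn ->
    receive do
      {:data, map} -> send(parent, {:got, map_size(map)})
    end
  end)

send(pid, {:data, m2})

receive do
  {:got, n} -> IO.puts("receiver saw a map with #{n} key(s)")
end
```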
Theoretically, the compiler is free to optimize the latter: if it notices that some code can safely be rewritten to mutate in place, it may do so. But someone has to write that optimization, and since this is not a priority for the BEAM team, it is unlikely anyone will spend the time while there are more pressing matters. And once written, the optimization must also be proven not to break anything.
> so if you want to pass data between them, you must copy
Big binaries (over 64 bytes) aren’t copied; they’re reference-counted and stored on a shared binary heap. Sending one to another process just sends a reference: each process’s own heap holds small handle objects that all point at the same shared data.
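A quick runnable illustration (plain Elixir; the 64-byte "refc" threshold is a VM implementation detail, not something observable from this code): a large binary lives on the shared binary heap, sending it is cheap, and `binary_part/3` produces a sub-binary that references the same underlying bytes rather than copying them.

```elixir
# Build a binary well over the 64-byte refc threshold; its bytes live
# on the shared binary heap, not on this process's private heap.
big = :binary.copy(<<0, 1, 2, 3, 4, 5, 6, 7>>, 1_000)  # 8_000 bytes

# Sending it to another process sends a reference, not 8_000 copied bytes.
parent = self()

pid =
  spawn(fn ->
    receive do
      bin -> send(parent, {:size_seen, byte_size(bin)})
    end
  end)

send(pid, big)

receive do
  {:size_seen, n} -> IO.puts("receiver saw #{n} bytes")
end

# binary_part/3 returns a sub-binary: a small handle pointing into `big`.
chunk = binary_part(big, 0, 8)
IO.inspect(chunk)  # <<0, 1, 2, 3, 4, 5, 6, 7>>
```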
In theory, more types could be made to do that.
In fact, I believe that handles passed back from NIFs actually reuse this infrastructure, presenting themselves as binaries, such that sending the handle to another process is really just creating a new smart pointer in the receiving process's heap, holding the same raw pointer that the original NIF handle held.
And you can do a lot with that. Erlang’s new-ish `counters` module, where you get a handle to a mutable array of 64-bit integers, is essentially just a built-in NIF that passes around a mutable NIF handle pointing to a native integer buffer, exactly as above.
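For instance, using the real `:counters` API (Erlang/OTP 21.2+, here called from Elixir): the handle can be sent to other processes, and all of them mutate the *same* underlying array, with no copying on update.

```elixir
# :counters gives a mutable, fixed-size array of signed 64-bit integers
# living outside any single process heap.
ref = :counters.new(3, [:atomics])  # 3 slots, atomic read-modify-write

:counters.add(ref, 1, 5)            # slot indices are 1-based
:counters.add(ref, 1, 2)
:counters.put(ref, 2, 100)

IO.inspect(:counters.get(ref, 1))   # 7
IO.inspect(:counters.get(ref, 2))   # 100
```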
There’s nothing stopping the Erlang runtime (or your own NIF code) from adding all sorts of other operations against native, mutable types, and exposing them as these sorts of abstract data structure handles. You could, for example, have a `matrix` module for doing matrix math; or a `data_frame` module that can participate in every operation a Pandas.DataFrame can (with all such operations implemented as native code); or a module exposing SHM buffers; or even a module exposing GPU driver primitives (e.g. vertex/shader/texture buffers).
Everything that the bytecode would do to such ADTs would be slow-ish, but the point would be to load stuff into them in your glue code, and then in your hot loop, just bump already-loaded ADTs together in fast, native ways.
See the sibling comments: passing binary refs that point at native blobs is pretty much what the Matrex library does. The refs returned are in a binary format (I'm not sure whether it's just the raw C memory representation or based on MATLAB's `.mat` format). It would also be possible to use Elixir macros to build up an AST for common math formulas that could be handed to the native maths library, which could be pretty fast.
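A hedged sketch of that macro idea. Everything here is hypothetical illustration, not any real library's API: the macro simply captures the formula's AST at compile time and returns it as data; a real implementation would translate that AST for a native maths library and emit a NIF call instead.

```elixir
defmodule MathDSL do
  # Hypothetical macro: instead of compiling the formula as Elixir code,
  # return its AST as a literal term. A real implementation might lower
  # `expr` into instructions for a native maths NIF at this point.
  defmacro formula(do: expr) do
    Macro.escape(expr)
  end
end

defmodule Demo do
  require MathDSL

  def run do
    # Captured at compile time; `a`, `x`, `b` are never evaluated.
    MathDSL.formula do
      a * x + b
    end
  end
end

# Returns the Elixir AST of `a * x + b`, shaped roughly like
# {:+, _, [{:*, _, [a_ast, x_ast]}, b_ast]}
IO.inspect(Demo.run())
```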
Given that NumPy is implemented in C with Python bindings on top, you can do exactly the same on the BEAM, and indeed it exists (though it's not widely used): https://github.com/versilov/matrex
I'm using Matrex on some embedded devices, and it performs plenty well for the task. The (micro) benchmarks for Matrex are intriguing: they appear to show Matrex being faster than NumPy for common matrix tasks. That's not improbable, considering that the BEAM in general appears faster than CPython, and that the Erlang NIF FFI is simpler than CPython's, which may translate to quicker FFI calls. Of course that's mostly speculative, but at the least Elixir/Erlang can be on par with Python for matrix maths.
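To make the "matrix as a flat binary blob" idea from upthread concrete, here is a runnable sketch in plain Elixir. This is a made-up layout for illustration, not Matrex's actual format: two unsigned 32-bit dimensions followed by row-major 32-bit floats. The point is that a blob like this could be handed to a NIF (and on to BLAS) with no conversion.

```elixir
defmodule FlatMatrix do
  # Hypothetical flat-binary matrix layout (NOT Matrex's real format):
  # <<rows::u32, cols::u32, row-major float32 data>>.

  def new(rows) do
    r = length(rows)
    c = length(hd(rows))
    data = for row <- rows, x <- row, into: <<>>, do: <<x::float-32-little>>
    <<r::unsigned-32-little, c::unsigned-32-little, data::binary>>
  end

  def at(<<_r::unsigned-32-little, c::unsigned-32-little, data::binary>>, i, j) do
    # 0-based indices; each float32 occupies 4 bytes.
    offset = (i * c + j) * 4
    <<_::binary-size(offset), x::float-32-little, _::binary>> = data
    x
  end
end

m = FlatMatrix.new([[1.0, 2.0], [3.0, 4.0]])
IO.inspect(FlatMatrix.at(m, 1, 0))  # 3.0
```

Because the whole matrix is one large binary, it's refc-shared between processes automatically, as described earlier in the thread.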
> Theoretically, the compiler is free to optimize the latter: if it notices that some code can safely be rewritten to mutate in place, it may do so. But someone has to write that optimization, and since this is not a priority for the BEAM team, it is unlikely anyone will spend the time while there are more pressing matters.
I have no idea how they prioritize new features, but IMHO safely optimizing code in a mutating way is a big part of why Haskell is so fast despite the heavy abstraction primitives it offers.
Also, where's the fun in writing perf-critical code in another (native) language if the host language is (or could be made) capable by itself?
What's the uptime, reliability, and distribution story for Haskell? If I want three nodes sharing common state (either eventually consistent or via a consensus protocol like Raft), how do I implement that with N nines of uptime in Haskell? What's the process-restart story for Haskell?
The point is that the data structures in Erlang were chosen with the abstractions needed to support its features. "Safe" for Haskell is not the same as "safe" for Erlang, and that's why some things that are trivial in Erlang are not easy to implement in Haskell. Conversely, there are things that are really safe in Haskell (type safety, preventing a class of process crashes) that you have to add boilerplate to get confidence in with Erlang (or you just 'let it crash', which is a perfectly fine philosophy too). This is not to say that Erlang is necessarily better; most of those problems have been "solved by Kubernetes" anyway.
To put it simply, arrays are not a native Erlang data structure. Lists are linked lists, and the collections are persistent, so you get high-performance copy-on-write-style data structures that can be safely passed between Erlang processes (I think).
We use the BEAM with native NIFs to do numeric computation on Nvidia GPUs with CUDA/C++. Works great. The JVM wouldn't help us much (and, for us, would be much less stable due to the JVM's inherent problems with memory leaks and performance).
I wish the BEAM could match, or at least come close to, the JVM's performance for numerical computation.