When we started the Numerical Elixir effort, we were excited about the possibilities of mixing projects like Google's XLA (from TensorFlow) and LibTorch (from PyTorch) with the Erlang VM's ability to run concurrent, distributed, and fault-tolerant software.

I am very glad we are at a point where those ideas are coming to life, and I explore part of it in the video. My favorite bit: making the tensor serving implementation cluster-distributed took only 400 LOC (including docs and tests!): https://github.com/elixir-nx/nx/pull/1090
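For the curious, here is roughly what the single-node API looks like, adapted from the Nx.Serving docs (exact options may differ by version):

    serving =
      Nx.Serving.new(fn opts -> Nx.Defn.jit(&Nx.multiply(&1, 2), opts) end)

    # Run it inline:
    Nx.Serving.run(serving, Nx.Batch.stack([Nx.tensor([1, 2, 3])]))

    # Or start it under a supervisor; batched_run then batches concurrent
    # requests and, once nodes are clustered, routes them to whichever
    # node runs the serving:
    children = [
      {Nx.Serving, serving: serving, name: MyServing, batch_size: 10, batch_timeout: 100}
    ]

    Supervisor.start_link(children, strategy: :one_for_one)
    Nx.Serving.batched_run(MyServing, Nx.Batch.stack([Nx.tensor([4, 5, 6])]))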
I'll be glad to answer questions about Nx or anything from Livebook's launch week!
Nothing specific about this project, and don't feel obligated to respond, but I just wanted to thank you for all the work you've done with Elixir and the related ecosystem. Great language, great tools, and a helpful, welcoming community. It was a perfect introduction to practical functional programming.
Haven't found a big project for it yet, but I've done a bunch of little side projects since a friend who worked at Appcues gave me the hard sell on it around 2018.
The distributed ML currently seems focused on model execution. I see another commenter's excitement about "Looking forward to NX transformations that take distributed training to the next level", which, I agree, will be quite interesting.
Where/how do you see Nx being used effectively in distributed training? Is distributed training a realistic path for open-source models to compete against big tech models?
For distributed training, one important feature is to be able to do GPU-to-GPU communication, such as allreduce, allgather, and all-to-all. Those are not supported at the moment, but they are on our roadmap. At this level, however, the language runtime itself plays a reduced role, so I don't expect the experience to be much different from, say, Python/JAX.
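To give a sense of the semantics: allreduce combines a value (say, gradient shards) across all devices and hands every device the same result. Simulating it with plain tensors, no actual GPUs involved:

    # Four "devices", each holding its own gradient shard:
    grads = [
      Nx.tensor([1.0, 2.0]),
      Nx.tensor([3.0, 4.0]),
      Nx.tensor([5.0, 6.0]),
      Nx.tensor([7.0, 8.0])
    ]

    # allreduce(sum): every device ends up with the elementwise sum.
    reduced = Enum.reduce(grads, &Nx.add/2)
    #=> Nx.tensor([16.0, 20.0])

The real thing runs as collective operations on the accelerators themselves (e.g. over NCCL), which is why the host language matters less at that layer.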
For the second question, my understanding is that all big tech models rely on distributed training, so distributed training is a prerequisite for competing, really.
Do you ever think about why you’re probably a 100x programmer? Is it just working memory and pure intelligence, or some strategy or tactics that make you so good at this? Asking for a friend :-)
Is anyone working on audio libraries that will enable streaming audio chunks for Whisper processing? Saving audio files to a local file system, running ffmpeg to chunk them, and then sending them off to Whisper is very tactical...
The current pipeline expects PCM audio blobs and, if the data is coming from a microphone in the browser, you can do the initial processing and conversion in the browser (see the JS in this single-file Phoenix app speech-to-text example [0]).
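For reference, a sketch of what such a pipeline looks like with Bumblebee's Whisper serving (function names and options have shifted a bit between releases, so treat this as approximate):

    {:ok, whisper} = Bumblebee.load_model({:hf, "openai/whisper-tiny"})
    {:ok, featurizer} = Bumblebee.load_featurizer({:hf, "openai/whisper-tiny"})
    {:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "openai/whisper-tiny"})
    {:ok, generation_config} = Bumblebee.load_generation_config({:hf, "openai/whisper-tiny"})

    serving =
      Bumblebee.Audio.speech_to_text(whisper, featurizer, tokenizer, generation_config,
        defn_options: [compiler: EXLA]
      )

    # The input is the PCM blob as a mono 16 kHz f32 tensor of samples
    # (a second of silence here as a stand-in):
    Nx.Serving.run(serving, Nx.broadcast(0.0, {16_000}))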
On the other hand, if you expect a variety of formats (mp3, wav, etc.), then shelling out to or embedding ffmpeg is probably the quickest path to achieve something. The Membrane Framework [1] is an option here too, and it includes streaming. I believe Lars is going to do a cool demo with Membrane and ML at ElixirConf EU next week.
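For the shelling-out route, something like this is enough to get Whisper-ready PCM out of any format ffmpeg understands (sketch; assumes ffmpeg is on $PATH, no error handling):

    {pcm, 0} =
      System.cmd("ffmpeg", [
        "-i", "input.mp3",  # any ffmpeg-readable input
        "-ac", "1",         # downmix to mono
        "-ar", "16000",     # resample to 16 kHz
        "-f", "f32le",      # raw little-endian 32-bit floats
        "-v", "quiet",
        "pipe:1"            # write to stdout
      ])

    audio = Nx.from_binary(pcm, :f32)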
I am using it for this talk I am putting together for ElixirConf EU, so if you want to see it used in context, that might be helpful: https://github.com/lawik/lively
Neither is polished to release-worthy levels, but if the interest is there I should make a proper library out of it.
That is to say, streaming chunks works great already. I would love two things: stitching the edges of the chunks, which would probably require overlapping them; and building chunks based on silence. That's more DSP than I know, though.
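For the first one, something like this naive overlap chunking is what I have in mind (untested sketch, sizes made up):

    chunk_len = 16_000 * 25  # 25s of 16 kHz audio per chunk
    overlap   = 16_000 * 5   # 5s shared between neighbours

    # `audio` is a mono f32 tensor of samples; stand-in data here:
    audio = Nx.iota({16_000 * 60}, type: :f32)

    chunks =
      audio
      |> Nx.to_flat_list()
      |> Enum.chunk_every(chunk_len, chunk_len - overlap, :discard)
      |> Enum.map(&Nx.tensor(&1, type: :f32))

Transcribe each chunk, then deduplicate the text that falls in the shared 5s windows.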
Hey Lars! Building chunks on silence is a worthy cause! Why stitch the edges of the chunks? Is that because there isn't a clean chunk on silence?
I think this work is very important. I don't understand whether I actually needed to install the native dependencies (mad, ffmpeg, portaudio) for Membrane's sake or specifically for this use case. Doesn't feel right...
You may be able to incorporate the Membrane Framework (https://membrane.stream/) to do that. Built in Elixir, it deals in exactly those types of multimedia problems.
I'm not an expert here, but I'd expect that capturing a sample using Membrane and piping it into Whisper should be doable.
Even after reading the blog, once the Windows app is installed it's not obvious how to get to the machine learning demos.

Also, after I found the +Smart button from another page, on Windows it fails due to the lack of make (and presumably a set of compiler tools). This was frustrating when trying to demo for someone on their computer.
It is bonkers how little code and need-to-know is necessary to deploy cutting-edge models in an Elixir app these days.
I didn't realize just how much progress had been made in Nx until, on a recent side project, I started implementing parts of Nx.Serving myself, only to find the Nx libraries already have distributed batched serving, faiss and pgvector support, and more.
Makes me want to quit all work obligations to hit the books and build product with Nx.
It's pretty clear that Joe Armstrong's respect for the fact that the speed of light is a thing, and that data locality/data gravity are real, is starting to pay off in big ways.
I do wonder if maybe streaming large data chunks over Erlang distribution might be a problem and a secondary data channel (e.g. over udp or sctp) might be worth playing with.
Looking forward to NX transformations that take distributed training to the next level.
> I do wonder if maybe streaming large data chunks over Erlang distribution might be a problem and a secondary data channel (e.g. over udp or sctp) might be worth playing with.
You may want to take a look at the partisan [0] library, written in Erlang. It is basically that: a reimagining of distributed Erlang, except that it can be multiplexed over multiple connections.
Yeah, but partisan gives you a "ton of stuff you might not need"; plus, the point is to treat distribution as a control plane and keep its concerns separate from the data plane. There used to be things to worry about when using Erlang distribution in general, irrespective of backend, IIRC, like head-of-line blocking (I think those are resolved now).
> It's pretty clear that Joe Armstrong's respect for the fact that the speed of light is a thing, and that data locality/data gravity are real, is starting to pay off in big ways.
I'm familiar with Joe Armstrong and Erlang/Elixir, but do you have a particular reference in mind where he was specifically discussing this? Is it one of his papers or talks? Just looking for another interesting thing Joe Armstrong said or thought. :)
I don't have a reference offhand, but I have seen it. It's mostly a vibe. Remember that Joe was a physicist before he was a programmer: the synchronicity problem is pervasive in the design of the platform. Local, immediate access to data is generally a special-cased situation via an escape hatch with tons of big red warning signs.
I have used Elixir for a few personal projects. I would like to know: is there a point where network latency becomes too high relative to the overall ML execution time?