ML in Go with a Python Sidecar

mountainriver · 2024-11-17T22:42:42 1731883362

I wrote https://github.com/aunum/gold as an attempt to try ML with Go.

I came to the opinion that it’s just not worth it. This isn’t what Go was designed for and it will likely never be a good language to interface with ML.

The Go to C FFI is too slow for native Cuda ops and Go doesn’t have a good bridge to Python.

Rust is a much much better option in these scenarios

coffee_am · 2024-11-18T12:43:36 1731933816

I think there is a misconception there.

Go is as good a language as any for an ML framework. Better if you buy into it being better for its simplicity. It's clearly a worse if you factor in the ecosystem that already exists in Python, and one is leveraging that.

The C FFI ("foreign function interface") doesn't play a role at the ML Framework level. One does not need to call a C function at a fine-grained level. One step of inference can be in the milliseconds time scale for LLMs, the C FFI costs are almost 5 orders of magnitude smaller (10s or 100s of nanoseconds?). Look, Python C bindings are also costly, and things just work.

Plus Go doesn't need a bridge to Python necessarily. It only needs a bridge to whatever Python is also using, if it wants to be performant.

Having said that, one cannot expect to implement the lower level operations in Go efficiently (well, competitively with what is best out there), or JIT-compile a computation graph to Go (doable but not as performant anyway). But that doesn't matter, fast execution of computation graphs can be implemented in C++/Rust/Cuda/etc. The ML Framework level of operations (building models, layers, etc.) can all be done in Go (or Rust, Elixir, Julia, Occam, Lisp, Basic, etc.)

But even that, like most things in life, folks think every problem needs the fastest of the fastest ... one can do matrix multiplication fast enough in Go for the majority of the small ML models out there -- for many tasks the time spent in ML inference is just a small fraction of a cost of the total system. It really doesn't matter if it is an order of magnitude slower.

mountainriver · 2024-11-21T23:01:13 1732230073

Sorry but this is just plain wrong. The real issue is Cuda not Python. Go's memory model makes it to where it has to trampoline to C, this is as much as 300ns in Go which is about 10x what Python is. These things add up significantly when doing ML training.

pjmlp · 2024-11-18T06:06:11 1731909971

Additionally Google seems to be fine with only providing a wrapper around Gemini endpoints and such.

Not like other ecosystems which are developing the infrastructure to develop ML stuff on the respective language, including targeting the GPU if one so wishes.

Go is the main language for the CNCF project landscape ecosystem and that is about it. And even then, newer projects are more likely to pick Rust.

RandomThoughts3 · 2024-11-18T15:44:21 1731944661

> Go is the main language for the CNCF project landscape ecosystem and that is about it. And even then, newer projects are more likely to pick Rust.

This part of your comment lost me. It seems to me that the overlap of projects where it would make sense to either use Rust or Go is pretty much empty.

Why would you use ever choose Rust and all the complexity it adds if Go is a sensible option?

pjmlp · 2024-11-19T14:39:54 1732027194

Go is only used in many CNCF projects, because of Docker and Kubernetes success.

Had Rust 1.0 been available when Docker decided to pivot from Python into Go, or Kubernetes pivoted from Java into Go, most likely those projects would have pivoted into Rust instead.

Newer CNCF candidates tend to be written in Rust nowadays.

neomantra · 2024-11-18T03:25:10 1731900310

Last week I started this OllamaTea [1] BubbleTea component library which tries to leverage this idea of an Ollama sidecar.

I've really appreciated Ollama as a simple local inferencing service. I'll now sometimes ask Ollama something using its CLI, rather than going to a URL bar. Once Llama 3.2 Vision became available on Ollama this month [2], I got excited to try them with visual UI elements. I've also wanted to explore some kid-friendly TUIs, which need to be cheap and private.

OllamaTea is not intended to be a full-feature Chat program, but rather as scaffolding for adding Ollama to custom TUI apps.

Once I had the base OllamaTea library, I was able write the POC that inspired the whole thing.; I made a TUI that plots a terminal chart of some market data, converts that to an image, then prompts Ollama about it. Easy to achieve once all the pieces were in place. [3]

The direct Ollama Golang API [4] is very easy to use. OllamaTea just makes it easier to make TUIs. While most of the research action is in Python, I think there's room for ML applications/services/orchestrators written in Golang and working with the Ollama API surface [5].

[1] https://github.com/nimblemarkets/ollamatea

[2] https://ollama.com/blog/llama3.2-vision

[3] https://github.com/NimbleMarkets/ollamatea/blob/main/cmd/ot-...

[4] https://pkg.go.dev/github.com/ollama/ollama/api

[5] https://github.com/ollama/ollama/blob/main/docs/api.md

daniel-thompson · 2024-11-17T20:10:30 1731874230

> Completely bespoke models are typically trained in Python using tools like TensorFlow, JAX or PyTorch that don't have real non-Python alternatives

The article outlines some interesting ways to evade this problem. What's the latest thinking on robustly addressing it, e.g. are there any approaches for executing inference on a tf or pytorch model from within a golang process, no sidecar required?

kevmo314 · 2024-11-17T20:22:19 1731874939

For Go specifically, there are some libraries like Gorgonia (https://github.com/gorgonia/gorgonia) that can do inference.

Practically speaking though, the rate at which models change is so fast that if you opt to go this route, you'll perpetually be lagging behind the state of the art by just a bit. Either you'll be the one implementing the latest improvements or be waiting for the framework to catch up. This is the real value of the sidecar approach: when a new technique comes out (like speculative decoding, for example) you don't need to reimplement it in Go but instead can use the implementation that most other python users will use.

neomantra · 2024-11-17T22:20:15 1731882015

Perhaps check out GoMLX ("an Accelerated ML and Math Framework", there's a lot of scaffolding and it JITs to various backends. Related to that project, I sometimes use GoNB in VSCode, which is Golang notebooks [2].

[1] https://github.com/gomlx/gomlx

[2] https://github.com/janpfeifer/gonb

eliben · 2024-11-17T23:34:44 1731886484

Indeed, I have a plan to publish a follow-up using GoMLX - I already have the code working and just need to clean it all up and write a post

richardjennings · 2024-11-17T20:45:35 1731876335

It is possible to include CPython in a CGO program - allowing Python to be executed from within the CGO process directly. This comes with some complexities - GIL and thread safety in Go routines, complexity of cross-compiling between architectures, overhead in copying values across the FFI, limitations of integrating as a Go module. I am hoping to see a CGO GIL'less Python integration show up here at some point that has all the answers.

uriah · 2024-11-17T21:14:54 1731878094

These frameworks are C++ under the hood. A far as I know (not too experienced with go) you can use cgo to call any C++ code. So you should be able to serialize the model (torchscript) then run it with libtorch. Tensorflow also similarly has a C++ api

pjmlp · 2024-11-18T06:07:14 1731910034

Pytorch and Tensorflow have first class support for C++ (naturally), and Java.

calderwoodra · 2024-11-17T19:30:02 1731871802

I was surprised you chose http for your IPC - I was expecting there to be a more handy tool that Python could expose and Go could leverage without needing to keep a second process constantly running.

herval · 2024-11-17T19:45:24 1731872724

Keeping models in memory is more efficient than constantly loading/unloading them, so a process that keeps running is the way to go