I came to the opinion that it’s just not worth it. This isn’t what Go was designed for and it will likely never be a good language to interface with ML.
The Go-to-C FFI is too slow for native CUDA ops, and Go doesn't have a good bridge to Python.
Rust is a much, much better option in these scenarios.
Go is as good a language as any for an ML framework. Better, if you buy into it being simpler. It's clearly worse if you factor in the ecosystem that already exists in Python, and you're leveraging that.
The C FFI ("foreign function interface") doesn't play a role at the ML framework level. One does not need to call C functions at a fine-grained level. One step of LLM inference is on the milliseconds time scale, while the C FFI cost is almost 5 orders of magnitude smaller (tens or hundreds of nanoseconds). Look, Python's C bindings are also costly, and things just work.
Plus, Go doesn't necessarily need a bridge to Python. It only needs a bridge to whatever Python is also using, if it wants to be performant.
Having said that, one cannot expect to implement the lower-level operations efficiently in Go (that is, competitively with the best out there), or JIT-compile a computation graph to Go (doable, but not as performant). But that doesn't matter: fast execution of computation graphs can be implemented in C++/Rust/CUDA/etc. The ML framework level of operations (building models, layers, etc.) can all be done in Go (or Rust, Elixir, Julia, Occam, Lisp, Basic, etc.)
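For what "framework level" means here, a minimal sketch: layers compose in plain Go, and the `Forward` bodies are where a real framework would dispatch to a C++/CUDA backend. The `Layer`/`Scale`/`Sequential` names are made up for illustration, not from any real Go framework.

```go
package main

import "fmt"

// Layer is the framework-level abstraction; backends do the heavy math.
type Layer interface {
	Forward(x []float64) []float64
}

// Scale is a toy layer that multiplies every element by a constant.
type Scale struct{ Factor float64 }

func (s Scale) Forward(x []float64) []float64 {
	out := make([]float64, len(x))
	for i, v := range x {
		out[i] = v * s.Factor
	}
	return out
}

// Sequential chains layers, like model containers in other frameworks.
type Sequential []Layer

func (seq Sequential) Forward(x []float64) []float64 {
	for _, l := range seq {
		x = l.Forward(x)
	}
	return x
}

func main() {
	model := Sequential{Scale{2}, Scale{3}}
	fmt.Println(model.Forward([]float64{1, 2})) // [6 12]
}
```

None of this composition code is performance-critical; only what happens inside `Forward` is.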
But even then, like most things in life, folks think every problem needs the fastest of the fast... One can do matrix multiplication fast enough in Go for the majority of the small ML models out there -- for many tasks the time spent in ML inference is just a small fraction of the cost of the total system. It really doesn't matter if it is an order of magnitude slower.
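Concretely, "fast enough in Go" can be as simple as a naive triple loop over row-major slices -- nowhere near tuned BLAS, but fine when inference is a small slice of total cost:

```go
package main

import "fmt"

// matMul multiplies dense row-major matrices: a is m×k, b is k×n.
// Naive loop order (i, p, j) keeps the inner loop streaming over b and out.
func matMul(a, b []float64, m, k, n int) []float64 {
	out := make([]float64, m*n)
	for i := 0; i < m; i++ {
		for p := 0; p < k; p++ {
			aip := a[i*k+p]
			for j := 0; j < n; j++ {
				out[i*n+j] += aip * b[p*n+j]
			}
		}
	}
	return out
}

func main() {
	a := []float64{1, 2, 3, 4} // 2×2
	b := []float64{5, 6, 7, 8} // 2×2
	fmt.Println(matMul(a, b, 2, 2, 2)) // [19 22 43 50]
}
```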
Sorry, but this is just plain wrong. The real issue is CUDA, not Python. Go's memory model forces every C call to trampoline through the runtime; that costs as much as 300ns in Go, which is about 10x what it is in Python. These costs add up significantly when doing ML training.
Additionally, Google seems to be fine with only providing a wrapper around Gemini endpoints and such.
Not like other ecosystems which are developing the infrastructure to develop ML stuff on the respective language, including targeting the GPU if one so wishes.
Go is the main language for the CNCF project landscape ecosystem and that is about it. And even then, newer projects are more likely to pick Rust.
> Go is the main language for the CNCF project landscape ecosystem and that is about it. And even then, newer projects are more likely to pick Rust.
This part of your comment lost me. It seems to me that the overlap of projects where it would make sense to either use Rust or Go is pretty much empty.
Why would you ever choose Rust, and all the complexity it adds, if Go is a sensible option?
Go is only used in many CNCF projects because of Docker's and Kubernetes's success.
Had Rust 1.0 been available when Docker decided to pivot from Python into Go, or Kubernetes pivoted from Java into Go, most likely those projects would have pivoted into Rust instead.
Newer CNCF candidates tend to be written in Rust nowadays.