Compiling ML models to C for fun (bernsteinbear.com)
133 points by signa11 11 months ago | 30 comments



I think this is brilliant! And, what's more important, practical.

I did a similar thing a few years back for small linear systems: https://wordsandbuttons.online/outperforming_everything_with... The idea is essentially the same - we both turn the computation process itself into code. I chose the LLVM intermediate representation instead of C because the whole point of my exercise was to show that you don't need C (or any other compiler) to generate efficient code. In fact, if you already have your computation figured out, C only gets in the way. For this very reason, I avoided calling the code generation process a "compiler".

My thing, however, doesn't have any practical usage. This one does and I think it's beautiful!
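
To give the flavor of the idea, here is a minimal sketch in Python (my own illustration, not code from the linked article; the function name and the dot-product example are made up): for a fixed problem size you can emit the fully unrolled IR as plain text and hand it straight to LLVM, with no C compiler in the loop.

    # Emit unrolled LLVM IR for a fixed-size dot product (illustrative only).
    def emit_dot_ir(n):
        args = ", ".join(f"double %a{i}" for i in range(n))
        args += ", " + ", ".join(f"double %b{i}" for i in range(n))
        lines = [f"define double @dot{n}({args}) {{", "entry:"]
        acc = None
        for i in range(n):
            lines.append(f"  %m{i} = fmul double %a{i}, %b{i}")
            if acc is None:
                acc = f"%m{i}"
            else:
                lines.append(f"  %s{i} = fadd double {acc}, %m{i}")
                acc = f"%s{i}"
        lines += [f"  ret double {acc}", "}"]
        return "\n".join(lines)

    # Pipe the output through llc/clang to get machine code, no C involved.
    print(emit_dot_ir(3))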


This is super cool! Thank you for sharing. I should probably try LLVM or some other IR like MLIR. I wanted to do MLIR originally but the Python bindings are a little unstable right now. Any other IR (not C) might allow me to express the intended semantics better than C does.


> I wanted to do MLIR originally but the Python bindings are a little unstable right now.

Hmmm why do you think so? There's been one (small) breaking change in the last 6 months (and it came with a simple upgrade recommendation).

(If it's not obvious, I contributed to the Python bindings but I would prefer to remain anon).


The docs say not to use them :) I didn't try them out


really? where does it say that?

https://mlir.llvm.org/docs/Bindings/Python/

is it the "not enabled by default" that scares you off? i honestly don't even know what that's supposed to mean since there's no default anything since there's no official binary build of mlir currently being released.

anyway i'm not trying to be confrontational or adversarial - i was more asking in a "how can i help you accomplish what you want" kind of tone.


I am not reading it as confrontational, no worries. I just wanted to use something that was either already available or easily installable. If it says "off by default" I just wonder what that means or what hoops I will have to jump through. But if you say it is as simple as installing a PyPI package or something, great! I will give it a go.


If you could translate your computations to MLIR, then you'd have the ability to lift your instructions to various dialects.

For example, you can lift to the 'metal' dialect or the 'nvgpu' dialect. There's also an 'affine' dialect, which implements a small set of polyhedral transformations and works on memrefs like most compilers do. However, it seems that the 'affine' dialect is not as actively used these days. Most optimizations are now performed on tensors, but depending on your use case, it may still make sense.


I will probably look into it again in the future. It seems like a very useful IR


Thanks for posting! Happy to answer questions and curious what you all think


I think this is nothing short of brilliant, and I immediately bookmarked it for further follow-up, because it's late. Missing piece.

I don't do deep learning, but have been recently thinking about paths to optimize DAG things, user specified functions in a completely different context, and have been wondering if there's enough off the shelf code for me to glue together...

I look forward to possibly standing on your shoulders to build something cool, and giving appropriate credit, should I post a "Show HN" in 3 months-to-years. :)


Oh this is far too kind, but I hope it does inspire you to build little compilers everywhere! I am curious about your idea(s); please feel free to reach out via email.

EDIT: Also see the top-level comment by okaleniuk: https://news.ycombinator.com/item?id=37608605


I'm a systems programmer, with experience writing high performance code for mathy stuff (signal processing), network stuff, filesystem stuff, and most recently, I'm learning-by-doing "database stuff". I was joking at work the other day "if I can get compilers under my belt I'll have caught them all".

I see this applied to work (database) in a way I'll not elaborate until I can do it. :)

But in other areas, I've done a lot of signal processing and numerical algorithm development and optimization: lots of MATLAB (for improving the underlying math, but also for using approximation theory for performance. Think: "replace exp(x) with a Taylor series", except... more. :)) and C/C++ (for turning those graphs of lists of equations into machine code).

I've written dataflow style code starting from "paper of equations", to "runs on N nodes" many times.

Perhaps then you see how this looks (to me) like a "missing piece". :) I can now mix mathematical optimizations (like replacing an expensive function with a Padé approximant, using a lattice of symbolic identities, etc.) and _compiler_ optimizations. \o/
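
To make the exp(x) example concrete, here's a toy sketch (my own illustration here; the degree and the input range are arbitrary assumptions, not from any real project):

    # Degree-4 Taylor polynomial for exp(x), usable when |x| <= 0.5 is known.
    import math

    def exp_taylor4(x):
        # 1 + x + x^2/2 + x^3/6 + x^4/24, evaluated Horner-style
        return 1.0 + x * (1.0 + x * (0.5 + x * (1.0 / 6.0 + x * (1.0 / 24.0))))

    # Worst-case error on [-0.5, 0.5] stays around 3e-4.
    worst = max(abs(exp_taylor4(k / 100.0) - math.exp(k / 100.0))
                for k in range(-50, 51))
    print(worst)

The point is that a short polynomial like this can be inlined and vectorized, which a libm call usually can't be.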


Looking forward to reading your query compiler :)


I have a fairly strong suspicion you could generate ggml C code from python models directly. Not even for execution from python, just to give you a performant CPU model for any architecture rather than waiting for them to be manually ported.

Not sure that's a job for me, though - maybe someone else has already done it and I just haven't seen it yet?


Is ggml an easy-to-use library? That would make generating tensor-based networks easier for sure


Sounds like triton


Maybe you could generate Python and pass it to numba? In my very limited experiments, writing numba Python is like writing C, segfaults and all.

TCC's compile-speed-to-speedup ratio is bonkers, BTW.
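
For reference, the numba route looks roughly like this (a generic sketch; relu_dot is a made-up example, not anything from the post):

    import numpy as np
    from numba import njit

    @njit  # numba JIT-compiles the decorated function to machine code
    def relu_dot(w, x):
        acc = 0.0
        for i in range(w.shape[0]):
            acc += w[i] * x[i]
        return acc if acc > 0.0 else 0.0

    w = np.random.rand(1000)
    x = np.random.rand(1000)
    relu_dot(w, x)  # first call triggers compilation; later calls are fast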


Yeah, exactly! I mention this in the context of PyPy at the bottom of the post. Numba is another great suggestion here.

Re: TCC: I know, right? I think it would be good to see how much we get from a) just not allocating the whole graph every time and b) doing it linearly/bytecode-style, both while still in Python-land.
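
A rough sketch of what (b) might look like while still in Python-land (assuming nodes that expose their operands as an ordered tuple ._prev and an op tag ._op, roughly micrograd-shaped; schematic, not the post's actual code):

    # Flatten the expression graph once into a topologically sorted "tape",
    # then re-evaluate the tape in a loop instead of re-walking the graph.
    def linearize(root):
        tape, seen = [], set()
        def visit(v):
            if id(v) in seen:
                return
            seen.add(id(v))
            for child in v._prev:
                visit(child)
            tape.append(v)
        visit(root)
        return tape

    def run_forward(tape):
        for v in tape:
            if v._op == '+':
                a, b = v._prev
                v.data = a.data + b.data
            elif v._op == '*':
                a, b = v._prev
                v.data = a.data * b.data
            # ...remaining ops; leaf nodes keep their assigned data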


Thanks for this. My approach to speeding up an autodiff system like this was to write it in terms of nd-arrays rather than scalars, using numpy/cupy [1]. But it's still slower than deep learning frameworks that compile / fuse operations. Wondering how it compares to the approach in this post. (I might try to benchmark.)

[1] https://github.com/sradc/SmallPebble


Please let us know the results for this network!


Re: Small Pebble: what is this Lazy thing? I've seen it crop up in a couple of places.


The underlying autodiff engine in smallpebble runs computations immediately, as they are defined, for the given inputs. But when modelling, you want to define a model and then use it again and again on different inputs. The laziness is a way to do that: you can define a graph of autodiff operations (using "sp.Lazy" and "sp.Placeholder") without running it, and then pass in values and run the graph on them. We end up with something similar to how models are defined in tensorflow [1].

[1] https://www.tensorflow.org/js/guide/models_and_layers#the_fu...


Aha! Okay, thank you. And how do you pass in the inputs differently?


In smallpebble it's currently done by setting the value of the Placeholders, e.g.

    # define some graph
    x = sp.Placeholder()
    y = sp.Lazy(sp.square)(x)

    # and run the graph later on
    x.assign_value(sp.Variable(np.array([1, 2, 3])))
    y_val = y.run()  # the result, y_val is an sp.Variable
    gradients = sp.get_gradients(y_val)

    # run same graph with different values
    x.assign_value(sp.Variable(np.array([5, 6, 7])))
    y_val = y.run()
    gradients = sp.get_gradients(y_val)
(There are some more examples of this being used in training loops in the readme.)


Oh, oh, okay, these placeholders are like my `input` Values. Neat! Thank you.


Why not just use numpy, which is already written in C?


That's covered at the bottom of the post (tensor-valued vs scalar-valued). It would make some of the math kernels faster, since numpy has a bunch of specializations built-in. But it would not speed up any of the connective "glue" bits that are still going to be slow. And it would still probably involve a bunch of allocations.
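
A toy way to see the "glue" overhead (my own illustration, not from the post; timings vary by machine):

    # Even if every multiply lands in numpy's C code, doing it one scalar
    # at a time keeps all the per-element Python dispatch and allocation.
    import timeit
    import numpy as np

    xs = np.random.rand(10_000)

    per_scalar = timeit.timeit(
        lambda: sum(np.multiply(x, x) for x in xs), number=100)
    vectorized = timeit.timeit(
        lambda: np.sum(xs * xs), number=100)
    print(per_scalar, vectorized)  # the per-scalar version is far slower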


These days, even seeing "ML" and "compiling" in a title is no guarantee it's about ML the language. I hate the overloading.


One day I will write an SML compiler but today is not that day


Doing ML in ML seems like an under-explored area.



