Tinygrad will be the next Linux and LLVM

saagarjha · 2024-09-23T07:50:59 1727077859

Well, neither Linux nor LLVM loudly proclaimed that they would be the next Internet or GUI. So I am inclined to believe that this will not be the case and the person doing the proclamation might be a little full of himself.

bryanlarsen · 2024-09-23T12:40:52 1727095252

Interesting contrast to how Linux itself was first introduced:

"just a hobby, won't be big and professional like gnu"

mikewarot · 2024-09-23T07:57:53 1727078273

TinyGrad is GeoHot's system/compiler to map neural networks onto hardware. He consistently points out this one point: Because the exact number of cycles is know in advance, it can be scheduled, there's no need for branch prediction, or that type of thing in a CPU.

Essentially, he wants to be able to express programs, and even an operating system, as a directed acyclic graph of logical binary operations, so that you can have consistent and deterministic runtime behavior.

The bit about LLMs is a distraction, in my opinion.

zevv · 2024-09-23T08:14:21 1727079261

> he wants to be able to express programs, and even an operating system, as a directed acyclic graph of logical binary operations, so that you can have consistent and deterministic runtime behavior.

So how is this different from digital logic synthesis for CPLD/FPGA or chip design we have been doing over the last decades?

mikewarot · 2024-09-23T08:18:57 1727079537

FPGAs are (prematurely) optimized for the wrong things, latency and utilization. The hardware is heterogeneous, and there isn't one standard chip. Plus they tend to be expensive.

The idea is to be able to compile/run like you can now with your Von Neuman machine.

FPGA compile runs can sometimes take days! And of course, chips take months and quite a bit of money for each try through the loop.

woopsn · 2024-09-23T16:11:46 1727107906

With FPGAs I can sample a hundred high precision ADCs in parallel and feed them through DSP, process 10Gb ethernet at line rate, etc with deterministic outcomes (necessary given safety and regulatory considerations). They integrate well with CPUs and other coprocessors - heterogeny isn't wrong. Plus training a NN model also takes days! To be fair not always, but for the above applications my build time was hours to many-hours anyway.

I grant the hardware is absurdly expensive at the high end, but I really don't think application wise the comparison is apples to apples.

Hotzs saying literally everything with an io pin or actuator will be driven solely by NN (driven by tinygrad) seems to me maybe 1/3 self promotion, 1/3 mania, some much smaller amount incisive at best.

WoodenChair · 2024-09-23T07:48:43 1727077723

> While there may be a legacy Linux running in a VM to manage all your cloud phoning spyware, the core functionality of the lifelike device is boot to neural network.

No, I do not think future devices will be "boot to neural network." Traditional algorithms still have a place. Your robot vacuum cleaner (his example) may still use A* to route plan, and Quicksort to display your cleanings in terms of most energy usage.

> Without CPUs, we can be freed from the tyranny of the halting problem.

Not sure what this means but I think it still makes sense to have a CPU directing things as in current architectures. You don't just have your neural engine, you also have your GPU, Audio system, input devices, etc. and those need a controller. Something needs to coordinate.

TimSchumann · 2024-09-23T07:02:08 1727074928

> Without CPUs, we can be freed from the tyranny of the halting problem.

Can someone please explain to me what this even means in this context?

Serious question.

orra · 2024-09-23T07:40:20 1727077220

He also claims that the cardinality of the reals is the same as the integers.

https://news.ycombinator.com/item?id=36074287

You could say he had a history of using big words to talk shit.

WithinReason · 2024-09-23T07:54:42 1727078082

A neural network is perfectly deterministic, the runtime is predictable before you run it. Which I don't think is going to be true much longer.

https://news.ycombinator.com/item?id=41623474

benzible · 2024-09-23T07:05:36 1727075136

It's gibberish. For one thing... https://arxiv.org/abs/1901.03429

mikewarot · 2024-09-23T08:10:07 1727079007

Think of it as unwinding a program all the way until it's just a list of instructions. You can know exactly how long that program will take, and it will always take that same time.

krisoft · 2024-09-23T08:25:30 1727079930

But will it always solve the task? Because without that it it is trivially easy to “solve” the halting problem by just declaring that the turing machine halts after X steps.

TimSchumann · 2024-09-23T14:57:21 1727103441

Wouldn’t this also imply a lack of Turing completeness, and thus not be good for general purpose computing?

mikewarot · 2024-09-23T07:26:59 1727076419

He's got the kernel of a good idea. Deterministic data flows are a good thing. We keep almost getting there, with things like data flow architectures, FPGAs, etc. But there's always a premature optimization for the silicon, instead of the whole system. This leads to failure, over, and over.

He's wrong in the idea of using an LLM for general purpose compute. Using math instead of logic isn't a good thing for many use cases. You don't want a database, or an FFT in a Radar System to hallucinate, for example.

My personal focus is on homogeneous, clocked, bit level systolic arrays.[2] I'm starting to get the feeling the idea is really close to being a born secret[1] though, as it might enable anyone to really make high performance chips on any fab node.

[1] https://en.wikipedia.org/wiki/Born_secret

[2] https://github.com/mikewarot/Bitgrid

KeplerBoy · 2024-09-23T07:36:46 1727077006

You could still build a FFT in tinygrad and it would be as deterministic as it's matmuls (so not bitwise deterministic, due to the non-associativity of floating point math and the way GPUs don't guarantee execution order, but we are okay with that). The matmuls in the NNs don't hallucinate.

chenzhekl · 2024-09-23T07:05:47 1727075147

I don't know why I should switch from PyTorch to Tinygrad as a researcher and practitioner. In terms of kernel fusion, there is torch.compile. Not to say there is a large ecosystem behind PyTorch and almost every paper today is published with a PyTorch implementation. Probably what Tinygrad shines is bare-metal platforms?

skybrian · 2024-09-23T07:31:58 1727076718

I don’t understand the LLVM comparison. Is it somehow a compiler backend for conventional programming languages? Can you run C or Rust code?

fuhsnn · 2024-09-23T07:41:01 1727077261

Me either, it's like saying ai-dependency is the next freedom.

melodyogonna · 2024-09-23T07:41:41 1727077301

Makes me wonder if he knows what LLVM does.

If I understand him correctly, if everything becomes a neural network then he expects most neural networks to use Tinygrad

winocm · 2024-09-23T08:40:31 1727080831

Same.

krackers · 2024-09-23T07:17:04 1727075824

> tinygrad has a hardware abstraction layer, a scheduler, and memory management. It's an operating system

Doesn't every ML framework have that?

almostgotcaught · 2024-09-23T07:28:46 1727076526

nah not like he's talking about - TF and PT definitely punt all that down to tensorrt or hip or whatever. doesn't mean there's anything novel here - just that TF and PyTorch don't do it.

tmitchel2 · 2024-09-23T11:54:49 1727092489

I generally don't read anything by gh but I think he is cryptically just referring to something like XLA, whereby your NN architecture gets compiled straight to hardware, say to a custom asic, or to an FPGA bit stream, etc.

It's definitely going to happen but I don't think it will replace CPU's much like human brains can't quite replace CPU's and what they are optimised for.

Trying to make out that TinyGrad is leading the charge in this is quite self indulgent.

akoboldfrying · 2024-09-23T08:24:38 1727079878

>Without CPUs, we can be freed from the tyranny of the halting problem.

In the same way that we can be freed of the tyranny of being able to write a for loop.

WithinReason · 2024-09-23T07:52:42 1727077962

The only reason neural networks don't have control flow is because they are not very good. They are incredibly inefficient and the only way to properly solve that is to introduce control flow, for example: https://arxiv.org/abs/2311.10770

FloatArtifact · 2024-09-23T09:42:35 1727084555

Great... Does this mean my pc will hallucinate kernel panics when it doesn't even have a kernel?

almostgotcaught · 2024-09-23T07:25:57 1727076357

no it won't, because while hitting ioctls in python is cute

https://github.com/tinygrad/tinygrad/blob/master/extra/hip_g...

it is definitely not shippable

georgehotz · 2024-09-23T09:55:58 1727085358

Umm, why not?

We wrote entire NVIDIA, AMD, and QCOM drivers in that style.

https://github.com/tinygrad/tinygrad/blob/master/tinygrad/ru...

almostgotcaught · 2024-09-23T13:41:07 1727098867

Because it's slow duh

georgehotz · 2024-09-23T13:50:02 1727099402

This sounds like prejudice. Have you benchmarked it?

almostgotcaught · 2024-09-23T14:06:00 1727100360

Yes I literally duplicated your approach for my driver stack last week and surprise surprise the FFI overhead into libc is too high.

georgehotz · 2024-09-23T15:30:36 1727105436

FFI? This isn't how GPUs work...they are MMIO (mostly)

Those drivers are faster than anything else when used to run fixed command queues (what neural network runs are)

timkq · 2024-09-24T08:10:19 1727165419

I can't say anything on the performance, but inline assembly in Python is crazy

almostgotcaught · 2024-09-27T03:55:29 1727409329

It's not inline assembly it's just ioctl through ctypes via libc.

carrja99 · 2024-09-23T08:28:39 1727080119

Isn’t this the guy who joined Twitter as an intern to “fix” search?

djaouen · 2024-09-23T15:24:37 1727105077

Yeah, good luck with that, lol