I like this idea a lot. Right now we're going the long/hard way round: post-training an LLM to recognize when it needs compute, write out a compute request, and then feed the compute answer back into the tokenization loop.
It probably does make sense to add a mini CPU as a layer / tool / math primitive. I wonder how you'd train it to use such a thing? In my mind it's not really a layer per se, but a set of function calls a layer could route to when it wants, weighting the response appropriately — something like the rough sketch below.
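A minimal sketch of what I mean, assuming a PyTorch-style setup; all names here (ComputeRoutedLayer, the operand decode/encode projections) are hypothetical, not anyone's actual implementation. A learned gate softly routes between the layer's ordinary transform and a small bank of exact math primitives, and the results are blended by the gate weights:

```python
# Hypothetical sketch: a layer with a learned soft gate over exact math
# primitives. Nothing here is a real library API beyond standard PyTorch.
import torch
import torch.nn as nn

class ComputeRoutedLayer(nn.Module):
    def __init__(self, d_model: int, n_ops: int = 3):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model)      # ordinary learned path
        self.router = nn.Linear(d_model, n_ops + 1)  # gate over ops + "no op"
        self.decode = nn.Linear(d_model, 2)          # read two operands from the hidden state
        self.encode = nn.Linear(1, d_model)          # write the scalar result back

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        a, b = self.decode(h).unbind(-1)
        # Exact primitives, computed outside the learned weights.
        ops = torch.stack([a + b, a * b, a - b], dim=-1)       # (..., n_ops)
        gate = torch.softmax(self.router(h), dim=-1)           # (..., n_ops + 1)
        op_out = self.encode((gate[..., 1:] * ops).sum(-1, keepdim=True))
        return gate[..., :1] * self.proj(h) + op_out
```

The training question partly answers itself in this framing: because these primitives stay differentiable, the gate and the operand decode/encode can be trained end to end with the rest of the network, no special tool-calling loop needed. Truly discrete primitives (integer division, table lookups, an actual ALU) would break that, and you'd presumably need something like a straight-through estimator or an RL-style signal to learn the routing.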