
Something I've been thinking about is how the Minds -- the super-human AI hyper-computers that fly the ships in the Culture series of novels -- are described. The image built up in my head[1] is that they're hybrids blending neural networks and regular compute substrates. They can calculate, simulate, and reason in combination.

There have already been crude attempts at this, hooking Mathematica and Python into ChatGPT. I say crude because these add-ons are controlled via output tokens.

What I would like to see is a GPT-style AI that also has compute blocks, not just transformer blocks. I don't mean compute in the sense of "matrix multiply for weights and biases", but literally an ALU-style block of basic maths operations available for use by the neurons.
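To make that concrete, here is roughly what I picture, as a toy PyTorch sketch (the operand read-out, the op set, and the write-back are all invented for illustration, not a real design):

    import torch
    import torch.nn as nn

    class ALUBlock(nn.Module):
        """Toy "ALU" layer: decode two scalar operands from the hidden
        state, apply a small bank of arithmetic ops, and let a learned
        soft selector decide which result gets written back."""
        def __init__(self, d_model):
            super().__init__()
            self.read_operands = nn.Linear(d_model, 2)  # -> operands a, b
            self.select_op = nn.Linear(d_model, 4)      # one logit per op
            self.write_back = nn.Linear(1, d_model)     # scalar result -> hidden

        def forward(self, h):                           # h: (batch, seq, d_model)
            a, b = self.read_operands(h).unbind(dim=-1)
            ops = torch.stack([a + b, a - b, a * b, a / (b.abs() + 1e-6)], dim=-1)
            weights = torch.softmax(self.select_op(h), dim=-1)
            result = (weights * ops).sum(dim=-1, keepdim=True)
            return h + self.write_back(result)          # residual-style update

The soft selection over ops is the cheat that keeps it differentiable; it's in the same spirit as the Neural Arithmetic Logic Unit work, if I'm remembering that right.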

One thought I had was that this could be done via activations that carry both a floating-point activation value and "baggage" such as a numerical value from the input -- like a token in a traditional parser, which can represent a constant string or an integer along with its decoded value.
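In code terms, something like a lexer token (names purely illustrative):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Token:
        id: int                        # vocabulary index, as usual
        text: str                      # surface form, e.g. "365.25"
        value: Optional[float] = None  # decoded numeric "baggage", if any

    tok = Token(id=17042, text="365.25", value=365.25)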

The newer, truly multi-modal models gave me a related idea: Just like how they can have "image" tokens and "audio" tokens, I wonder if they could be given "numeric data" tokens or "math symbol" tokens. Not in the same way that they're given mixed-language text tokens, but dedicated tokens that are fed into both the transformer blocks and also into ALU blocks.
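So the embedding layer would hand each token to two places at once. A sketch, where the numeric flag and the second return value are entirely my invention:

    import torch
    import torch.nn as nn

    class DualPathEmbedding(nn.Module):
        """Sketch: token ids go through the usual embedding table on their
        way to the transformer blocks, while tokens flagged as numeric also
        pass their decoded values along, untouched, for ALU blocks downstream."""
        def __init__(self, vocab_size, d_model):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)

        def forward(self, token_ids, values, is_numeric):
            # token_ids: (batch, seq) ints; values, is_numeric: (batch, seq) floats
            h = self.embed(token_ids)   # -> transformer blocks
            raw = values * is_numeric   # -> ALU blocks; zero for non-numeric tokens
            return h, raw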

Just an idle thought...

[1] Every reader reads into a story something unique, which may or may not align with what the author intended. This is my understanding, coloured by my own knowledge, etc, etc...




The problem, if you embed an ALU like that, is how to train the model to use it properly. And then it's not clear that it actually needs to be able to do that in the middle of a pass that, at the end, is only going to produce a single token anyway.

Controlling that stuff via output tokens actually kinda makes sense by analogy, since that is how we use calculators and the like. But I do agree that specialized tokens used specifically to activate such tools might be a better idea than signalling in-band with plain text -- and producing those specialized tokens is something that can easily be trained.
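Something like this at decode time, i.e. dispatch off reserved tokens rather than grepping the text. A toy sketch -- the token names, the canned "model" output, and the eval-as-calculator are all made up:

    CALC, END = "<calc>", "</calc>"    # hypothetical reserved tool tokens

    def generate(model_tokens):
        """Walk the model's output; when the dedicated </calc> token
        appears, run the staged expression and splice the result back
        into the context as ordinary tokens."""
        out, buf, in_call = [], [], False
        for tok in model_tokens:
            out.append(tok)
            if tok == CALC:
                in_call, buf = True, []
            elif tok == END and in_call:
                result = str(eval("".join(buf), {"__builtins__": {}}))  # stand-in for a real tool
                out.append(result)
                in_call = False
            elif in_call:
                buf.append(tok)
        return out

    print(generate(["area", "=", CALC, "3.14", "*", "12", "*", "12", END]))

Emitting the <calc> tokens is then just ordinary next-token prediction, which is why training it seems tractable.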


Fellow huge Banks fan here.

I like this idea a lot. Right now we are going the long, hard way round: post-training an LLM to recognise when it needs compute, have it write out a compute request, then feed the compute answer back into the tokenization loop.

It probably does make sense to add a mini CPU as a layer / tool / math primitive. I wonder how you'd train the model to use such a thing? In my mind it's not really a layer per se; it's a set of function calls a layer could route to when it wants, weighting the response appropriately.
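In code, maybe something like a per-token router over a handful of primitives, with a "call nothing" option and a learned weight on how much of the answer gets mixed back in. A made-up sketch, nothing principled:

    import torch
    import torch.nn as nn

    class MathRouter(nn.Module):
        """A set of function calls a layer can route to: score each
        primitive (plus a no-op), run them on two decoded operands,
        and blend the weighted result back into the hidden state."""
        def __init__(self, d_model):
            super().__init__()
            self.primitives = [
                lambda a, b: a + b,
                lambda a, b: a * b,
                lambda a, b: torch.exp(a),
                lambda a, b: torch.log(a.abs() + 1e-6),
            ]
            self.operands = nn.Linear(d_model, 2)
            self.router = nn.Linear(d_model, len(self.primitives) + 1)  # +1 = no-op
            self.out = nn.Linear(1, d_model)

        def forward(self, h):                            # h: (batch, seq, d_model)
            a, b = self.operands(h).unbind(dim=-1)
            results = torch.stack([f(a, b) for f in self.primitives], dim=-1)
            gate = torch.softmax(self.router(h), dim=-1)
            mixed = (gate[..., :-1] * results).sum(dim=-1, keepdim=True)  # last weight = skip
            return h + self.out(mixed)

How you'd actually get gradients to teach it when to route there is the open question for me too.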



