if you think about it, next-token prediction is just stupid. it's so resource intensive
yet it mimics emergent thought quite beautifully. it's shockingly unintuitive how a simple process, scaled enormously, can lead to this much practical intelligence (practical in the sense that it's useful, not that it works the way we think). I'm aware there are multiple layers, filters, processes, etc.; I'm just talking about the foundation, which is next-token prediction.
when I first heard that it's not predicting words but parts of words, I immediately saw a red flag. yes, there are compound words like strawberry (straw + berry) where sub-word pieces capture meaning at a higher resolution, but most words are not compounds, and in general we're simulating meaning instead of 'understanding' it. 'understanding' simply means knowing that a man is to a woman what a king is to a queen, without needing to learn about words and letters (those should just be an interface).
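to make the king/queen bit concrete, here's a toy sketch of that analogy as plain vector arithmetic. the embeddings below are made up for illustration (a real model like word2vec learns them from data), but the point about 'understanding' without letters is exactly this kind of operation:

    import numpy as np

    # toy, hand-made embeddings -- a real model (e.g. word2vec) learns these
    emb = {
        "man":   np.array([0.9, 0.1, 0.2]),
        "woman": np.array([0.9, 0.1, 0.8]),
        "king":  np.array([0.1, 0.9, 0.2]),
        "queen": np.array([0.1, 0.9, 0.8]),
    }

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    # "king - man + woman" should land closest to "queen":
    # the relationship lives in the vectors, not in letters or tokens
    target = emb["king"] - emb["man"] + emb["woman"]
    best = max(emb, key=lambda w: cosine(emb[w], target))
    print(best)  # queen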
I feel we've yet to discover the "machine code" for ASI. it's like having no compiler and interpreting source code directly. imagine the speed-ups if we could spare the processor from having to understand our stupid, inefficient language.
I'd really like to see a completely new approach that works in Meaning Space and transcends the imperfect Language Space. This will require lots of data pre-processing, but it's a fun journey -- basically a human-to-machine and a machine-to-human parser. I'm sure I'm not the first to think about it.
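very loosely, something like this. the class names are hypothetical placeholders, not an existing library, and the encode/decode bodies are crude stand-ins for learned models; the only point is where language sits (at the edges) and where the work happens (on vectors):

    import numpy as np
    from dataclasses import dataclass

    @dataclass
    class Meaning:
        vector: np.ndarray  # a point in Meaning Space

    class HumanToMachineParser:
        def encode(self, text: str) -> Meaning:
            # placeholder: hash bytes into a fixed-size vector;
            # a real system would use a learned encoder here
            v = np.zeros(16)
            for i, b in enumerate(text.encode("utf-8")):
                v[i % 16] += b
            n = np.linalg.norm(v)
            return Meaning(v / n if n else v)

    class MachineToHumanParser:
        def decode(self, m: Meaning) -> str:
            # placeholder: mapping meaning back to language is the hard part
            return f"<meaning vector, norm={np.linalg.norm(m.vector):.2f}>"

    def reason(m: Meaning) -> Meaning:
        # all the "thinking" would happen here, as operations on vectors,
        # never touching words or sub-word tokens
        return Meaning(m.vector)

    text = "a man is to a woman what a king is to a queen"
    out = MachineToHumanParser().decode(reason(HumanToMachineParser().encode(text)))
    print(out)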
so, what have we got so far?
https://drive.google.com/file/d/1BU5bV3X5w65DwSMapKcsr0ZvrMR...