
That’s not an accurate description. Attention (and multi-head attention in particular) lets the model pick up relationships between words that are far apart in the text, as well as the context they appear in.
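For anyone unfamiliar with the mechanism, here is a minimal single-head sketch of scaled dot-product attention in plain NumPy. It's illustrative only, not any particular model's implementation; the point is that every position scores every other position directly, so distance in the sequence doesn't matter.

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def attention(X, Wq, Wk, Wv):
        # X: (seq_len, d_model); one row per token embedding.
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        # Every position scores every other position, regardless of distance,
        # so token 3 can attend directly to token 300.
        scores = Q @ K.T / np.sqrt(K.shape[-1])
        weights = softmax(scores)          # (seq_len, seq_len) attention map
        return weights @ V                 # context-mixed representations

    rng = np.random.default_rng(0)
    seq_len, d_model, d_head = 8, 16, 4
    X = rng.normal(size=(seq_len, d_model))
    Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
    print(attention(X, Wq, Wk, Wv).shape)  # (8, 4)

Multi-head attention just runs several of these in parallel with different weight matrices and concatenates the results, so different heads can specialize in different kinds of relationships.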

They still lack, as far as we know, a world model, but the results are already eerily similar to how most humans seem to think - a lot of our own behaviour can be described as “predict how another human would reply”.




When trained on nothing but logs of Othello moves, the model learns an internal representation of the board and its pieces. It also models the strength of its opponent.

https://arxiv.org/abs/2210.13382

I'd be more surprised if LLMs trained on human conversations didn't create any world models. Having a world model simply makes the LLM better at sequence prediction. No magic needed.

There was another recent paper showing that a language model models things like the age, gender, etc. of its conversation partner without ever having been explicitly trained for it.
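For context, claims like these are usually tested with probing: freeze the trained model, collect hidden states, and train a small classifier to read off a property (a board square's contents, the opponent's strength, a speaker attribute) that the model was never trained to output. A rough sketch of the shape of such an experiment, with stand-in random data instead of real transformer activations:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Stand-in data: in the actual papers, `hidden` would be activations from
    # a frozen transformer layer, and `labels` the board square contents (or a
    # speaker attribute) associated with each example.
    rng = np.random.default_rng(0)
    hidden = rng.normal(size=(2000, 512))      # (examples, d_model)
    labels = rng.integers(0, 3, size=2000)     # e.g. empty / black / white

    X_train, X_test, y_train, y_test = train_test_split(
        hidden, labels, test_size=0.2, random_state=0)

    # A simple (here linear) probe: if it beats chance on held-out data, the
    # property is decodable from the representation, i.e. the model tracks it.
    probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("probe accuracy:", probe.score(X_test, y_test))

On this random stand-in data the probe scores around chance, which is exactly the baseline the real experiments have to beat.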


Do we know for a fact that the mechanisms are actually used that way inside the model?

My understanding was that we know how the model was designed to work, but that there's been very little (no?) progress on the black-box problem, so we really don't know much at all about what actually happens internally.

Without a better understanding of what actually happens when an LLM generates an answer, I stick with the most basic explanation: it's simply predicting what a human would say. I could be wildly misinformed here; I don't work directly in the space, and it's been moving faster than I'm interested in keeping up with.
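To be concrete about what "predicting what a human would say" means mechanically: generation is just repeated next-token prediction. A minimal sketch with the Hugging Face transformers library, using gpt2 purely as a convenient small model:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    ids = tok("The capital of France is", return_tensors="pt").input_ids

    with torch.no_grad():
        for _ in range(5):
            logits = model(ids).logits           # (1, seq_len, vocab_size)
            next_id = logits[0, -1].argmax()     # greedy: most likely next token
            ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)

    print(tok.decode(ids[0]))

The open question in this subthread is whether producing that next-token distribution involves internal world-model-like structure, which is what the probing work tries to answer.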



