The context heads of an LLM are more analogous to the sort of processing that goes on in, e.g., Broca's area of your brain than to working memory. You can't have anything analogous to working memory as long as LLMs operate on a strictly feed-forward basis[1]. And the fact that LLMs can talk so fluently without anything like a human working memory (yet) is a bit terrifying.
[1] Technically LLMs do have a "forget that last token and go back so I can try again" operation, so this is only 99% true.
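To make the "strict feed-forward" point concrete, here's a toy sketch (the `model` function is a dummy stand-in, not a real LLM): each generation step is a pure function of the token sequence, so the only "memory" the model has is whatever it writes into its own context. The `backtrack` helper is the footnote's caveat in code form.

```python
def model(tokens):
    """Stand-in for a feed-forward LLM: the output depends only on the
    input sequence, with no hidden state carried across calls."""
    return sum(tokens) % 7  # dummy next-token rule, purely for illustration

def generate(prompt, n_steps):
    """Autoregressive generation: the growing token list is the ONLY
    state -- there is no separate writable working memory."""
    tokens = list(prompt)
    for _ in range(n_steps):
        tokens.append(model(tokens))
    return tokens

def backtrack(tokens):
    """The footnote's one escape hatch: drop the last token and try again."""
    return tokens[:-1]

seq = generate([1, 2, 3], 4)
```

Note that calling `model` twice on the same sequence necessarily gives the same answer; nothing analogous to a scratchpad survives between calls except the tokens themselves.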