Taking into account the limits of how much can be stored in a hidden state.
My theory is that the model will start generalising the information stored once it starts going past its limits (like real life humans)
So if we take books as an example, we probably do not remember every single word. But we might vaguely remember which section or book events occur
Combine this with the agent model, and we may have an alternative for embeddings. Where we can ask which pages the model recall is relevant to the question. Bring those pages up again. And get the answers
(Once again like how a human might answer in real life)
My theory is that the model will start generalising the information stored once it starts going past its limits (like real life humans)
So if we take books as an example, we probably do not remember every single word. But we might vaguely remember which section or book events occur
Combine this with the agent model, and we may have an alternative for embeddings. Where we can ask which pages the model recall is relevant to the question. Bring those pages up again. And get the answers
(Once again like how a human might answer in real life)