Chain-of-thought prompting demonstrates that the output tape can act as procedural memory.
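
To spell out the mechanics, here's a minimal sketch of why that's true, assuming a generic autoregressive decoder (`model` here is a hypothetical next-token function, not any real API):

    # Sketch: why the output tape works as memory. In autoregressive
    # decoding each generated token is appended to the context, so
    # intermediate reasoning written at step t is readable at every
    # later step. `model` is a hypothetical next-token function.
    def generate(model, prompt, max_tokens=256):
        context = list(prompt)        # the "tape" so far
        for _ in range(max_tokens):
            token = model(context)    # attends over prompt + its own output
            if token == "<eos>":
                break
            context.append(token)     # reasoning persists on the tape
        return "".join(context[len(prompt):])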



Well, unfortunately attention over that output scales as O(n^2) in its length, which is not great once you get to sequences of thousands of words.
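
A quick back-of-the-envelope on why the quadratic term bites:

    # Self-attention builds an n x n score matrix, so doubling the
    # sequence length quadruples the work.
    for n in (1_000, 10_000, 100_000):
        print(f"n={n:>7,}: {n * n:>14,} attention scores per head per layer")
    # n=  1,000:      1,000,000 attention scores per head per layer
    # n= 10,000:    100,000,000 attention scores per head per layer
    # n=100,000: 10,000,000,000 attention scores per head per layer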


Sparse transformers are a thing. Dictionary lookup is a thing. The transformer could probably be trained to store long chains of information in a dedicated memory system and retrieve it using keywords (sketched below).

These are the things that came to mind in ten seconds. This is not going to be the problem that meaningfully delays AGI.
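
Roughly what I mean by the dictionary-lookup idea, as a toy sketch (all names here are hypothetical, not an existing library):

    # A dedicated key-value memory the model writes to and reads from
    # by keyword, so long chains of information don't have to live in
    # the attention window.
    class KeywordMemory:
        def __init__(self):
            self.store = {}

        def write(self, keyword, fact):
            self.store.setdefault(keyword, []).append(fact)

        def read(self, keyword):
            # O(1) lookup instead of attending over the full history
            return self.store.get(keyword, [])

    mem = KeywordMemory()
    mem.write("capital:france", "Paris")
    mem.write("step:3", "carry the 1 into the tens column")
    print(mem.read("step:3"))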


As I said in another comment, I think memory will be the easiest of these challenges to solve. That said, it hasn't really been solved yet.

If we can scale up S4-style architectures to this size, maybe it will be solved.
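
For reference, the core of the S4 idea as a toy sketch: a linear state-space recurrence whose fixed-size state summarizes the whole history, giving O(n) inference instead of O(n^2). The matrices here are random placeholders; real S4 uses a structured, HiPPO-initialized A and a careful discretization.

    import numpy as np

    # Recurrence: x_k = A @ x_{k-1} + B * u_k ;  y_k = C @ x_k
    rng = np.random.default_rng(0)
    d = 16                              # state size, fixed
    A = rng.normal(size=(d, d)) * 0.1   # placeholder dynamics
    B = rng.normal(size=(d,))
    C = rng.normal(size=(d,))

    def s4_like_scan(u):
        x = np.zeros(d)
        ys = []
        for u_k in u:                   # one O(d^2) step per token
            x = A @ x + B * u_k         # state summarizes the past
            ys.append(C @ x)
        return np.array(ys)

    print(s4_like_scan(rng.normal(size=1000)).shape)  # (1000,)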



