Chain-of-thought prompting demonstrates that the output tape can act as procedural memory.
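
To spell out the mechanics, here's a minimal sketch of why that's true, assuming a generic autoregressive decoder (`model` here is a hypothetical next-token function, not any real API):

    # Sketch: why the output tape works as memory. In autoregressive
    # decoding each generated token is appended to the context, so
    # intermediate reasoning written at step t is readable at every
    # later step. `model` is a hypothetical next-token function.
    def generate(model, prompt, max_tokens=256):
        context = list(prompt)        # the "tape" so far
        for _ in range(max_tokens):
            token = model(context)    # attends over prompt + its own output
            if token == "<eos>":
                break
            context.append(token)     # reasoning persists on the tape
        return "".join(context[len(prompt):])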



Well, unfortunately attention over that output scales as O(n^2) in its length, which is not great once you get to sequences of thousands of words.
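
A quick back-of-the-envelope on why the quadratic term bites:

    # Self-attention builds an n x n score matrix, so doubling the
    # sequence length quadruples the work.
    for n in (1_000, 10_000, 100_000):
        print(f"n={n:>7,}: {n * n:>14,} attention scores per head per layer")
    # n=  1,000:      1,000,000 attention scores per head per layer
    # n= 10,000:    100,000,000 attention scores per head per layer
    # n=100,000: 10,000,000,000 attention scores per head per layer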


Sparse transformers are a thing. Dictionary lookup is a thing. The transformer could probably be trained to store long chains of information in a dedicated memory system and retrieve it using keywords (sketched below).

These are the things that came to mind in ten seconds. This is not going to be the problem that meaningfully delays AGI.
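
Roughly what I mean by the dictionary-lookup idea, as a toy sketch (all names here are hypothetical, not an existing library):

    # A dedicated key-value memory the model writes to and reads from
    # by keyword, so long chains of information don't have to live in
    # the attention window.
    class KeywordMemory:
        def __init__(self):
            self.store = {}

        def write(self, keyword, fact):
            self.store.setdefault(keyword, []).append(fact)

        def read(self, keyword):
            # O(1) lookup instead of attending over the full history
            return self.store.get(keyword, [])

    mem = KeywordMemory()
    mem.write("capital:france", "Paris")
    mem.write("step:3", "carry the 1 into the tens column")
    print(mem.read("step:3"))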


As I said in another comment, I think memory will be the easiest of these challenges to solve. That said, it hasn't really been solved yet.

If we can scale up S4-style architectures to this size, maybe it will be solved.
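
For reference, the core of the S4 idea as a toy sketch: a linear state-space recurrence whose fixed-size state summarizes the whole history, giving O(n) inference instead of O(n^2). The matrices here are random placeholders; real S4 uses a structured, HiPPO-initialized A and a careful discretization.

    import numpy as np

    # Recurrence: x_k = A @ x_{k-1} + B * u_k ;  y_k = C @ x_k
    rng = np.random.default_rng(0)
    d = 16                              # state size, fixed
    A = rng.normal(size=(d, d)) * 0.1   # placeholder dynamics
    B = rng.normal(size=(d,))
    C = rng.normal(size=(d,))

    def s4_like_scan(u):
        x = np.zeros(d)
        ys = []
        for u_k in u:                   # one O(d^2) step per token
            x = A @ x + B * u_k         # state summarizes the past
            ys.append(C @ x)
        return np.array(ys)

    print(s4_like_scan(rng.normal(size=1000)).shape)  # (1000,)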



