    |-----long timing loop / top of parse tree-----|
    |                                              |
    |-shorter / child node -|                      |-shorter / child node-|
    |                       |                      |                      |
    |highest freq|          |highest freq|         |highest freq|         |highest freq|
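
To make the nesting concrete, here is a minimal sketch (my own illustration, not something from the diagram's author) of the hierarchy drawn above: a tree of timing loops where the root has the longest period and spans the whole parse, child loops are shorter, and the leaves run at the highest frequency. The node labels and periods are made up.

    from dataclasses import dataclass, field

    @dataclass
    class TimingLoop:
        """Hypothetical node in the nested-timing-loop picture above.

        Assumption (mine): each node is a loop whose period is longer
        than its children's, so the root spans the whole parse while
        the leaves oscillate fastest.
        """
        label: str
        period_ms: float
        children: list["TimingLoop"] = field(default_factory=list)

    def depth_frequencies(node, depth=0):
        """Walk the tree and report each loop's frequency by depth."""
        yield depth, node.label, 1000.0 / node.period_ms
        for child in node.children:
            yield from depth_frequencies(child, depth + 1)

    # Rough transcription of the diagram: one long root loop, two
    # shorter child-node loops, high-frequency loops at the leaves.
    leaves = [TimingLoop(f"leaf{i}", period_ms=25) for i in range(4)]
    tree = TimingLoop("root", 800, [
        TimingLoop("childA", 200, leaves[:2]),
        TimingLoop("childB", 200, leaves[2:]),
    ])
    for depth, label, hz in depth_frequencies(tree):
        print(f"{'  ' * depth}{label}: {hz:.1f} Hz")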



Similarly, GPT-2 (the small 124M configuration) has 12 attention heads per layer; GPT-3 has 96.
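
For comparison, here is a minimal numpy sketch of what a "head of attention" is: the model dimension gets split into several independent attention computations whose outputs are concatenated back together. The weights and sizes below are random and illustrative, not any real model's parameters; GPT-2 small happens to use d_model=768 with 12 heads per layer.

    import numpy as np

    def multi_head_attention(x, n_heads=12):
        """Single layer of multi-head self-attention (illustrative only:
        random weights, no masking, no output projection, no layer norm)."""
        seq_len, d_model = x.shape
        assert d_model % n_heads == 0
        d_head = d_model // n_heads

        rng = np.random.default_rng(0)
        # One projection matrix each for queries, keys, and values.
        w_q, w_k, w_v = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                         for _ in range(3))

        # Project, then split the model dimension into per-head slices.
        def split(m):  # (seq, d_model) -> (heads, seq, d_head)
            return m.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

        q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)

        # Scaled dot-product attention, computed independently per head.
        scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, seq, seq)
        weights = np.exp(scores - scores.max(-1, keepdims=True))
        weights /= weights.sum(-1, keepdims=True)
        out = weights @ v                                      # (heads, seq, d_head)

        # Concatenate the heads back into the model dimension.
        return out.transpose(1, 0, 2).reshape(seq_len, d_model)

    # Example: 10 tokens with GPT-2-small-sized embeddings and 12 heads.
    tokens = np.random.default_rng(1).standard_normal((10, 768))
    print(multi_head_attention(tokens, n_heads=12).shape)  # (10, 768)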


I don't know how superimposed waves in finely tuned timing loops with non-linear interference translate into attention heads, and honestly I suspect that a lot of what is hard to do with attention heads (and other past approaches) comes for free in a resonance-based system.



