    |-----long timing loop / top of parse tree-----|
    |                                              |
    |-shorter / child node -|                      |-shorter / child node-|
    |                       |                      |                      |
    |highest freq|          |highest freq|         |highest freq|         |highest freq|
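
To make the nesting concrete, here is a minimal sketch (my own illustration, not something from the diagram's author) of the hierarchy drawn above: a tree of timing loops where the root has the longest period and spans the whole parse, child loops are shorter, and the leaves run at the highest frequency. The node labels and periods are made up.

    from dataclasses import dataclass, field

    @dataclass
    class TimingLoop:
        """Hypothetical node in the nested-timing-loop picture above.

        Assumption (mine): each node is a loop whose period is longer
        than its children's, so the root spans the whole parse while
        the leaves oscillate fastest.
        """
        label: str
        period_ms: float
        children: list["TimingLoop"] = field(default_factory=list)

    def depth_frequencies(node, depth=0):
        """Walk the tree and report each loop's frequency by depth."""
        yield depth, node.label, 1000.0 / node.period_ms
        for child in node.children:
            yield from depth_frequencies(child, depth + 1)

    # Rough transcription of the diagram: one long root loop, two
    # shorter child-node loops, high-frequency loops at the leaves.
    leaves = [TimingLoop(f"leaf{i}", period_ms=25) for i in range(4)]
    tree = TimingLoop("root", 800, [
        TimingLoop("childA", 200, leaves[:2]),
        TimingLoop("childB", 200, leaves[2:]),
    ])
    for depth, label, hz in depth_frequencies(tree):
        print(f"{'  ' * depth}{label}: {hz:.1f} Hz")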



Similarly, GPT-2 (the small 124M configuration) has 12 attention heads per layer; GPT-3 has 96.
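
For comparison, here is a minimal numpy sketch of what a "head of attention" is: the model dimension gets split into several independent attention computations whose outputs are concatenated back together. The weights and sizes below are random and illustrative, not any real model's parameters; GPT-2 small happens to use d_model=768 with 12 heads per layer.

    import numpy as np

    def multi_head_attention(x, n_heads=12):
        """Single layer of multi-head self-attention (illustrative only:
        random weights, no masking, no output projection, no layer norm)."""
        seq_len, d_model = x.shape
        assert d_model % n_heads == 0
        d_head = d_model // n_heads

        rng = np.random.default_rng(0)
        # One projection matrix each for queries, keys, and values.
        w_q, w_k, w_v = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                         for _ in range(3))

        # Project, then split the model dimension into per-head slices.
        def split(m):  # (seq, d_model) -> (heads, seq, d_head)
            return m.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

        q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)

        # Scaled dot-product attention, computed independently per head.
        scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, seq, seq)
        weights = np.exp(scores - scores.max(-1, keepdims=True))
        weights /= weights.sum(-1, keepdims=True)
        out = weights @ v                                      # (heads, seq, d_head)

        # Concatenate the heads back into the model dimension.
        return out.transpose(1, 0, 2).reshape(seq_len, d_model)

    # Example: 10 tokens with GPT-2-small-sized embeddings and 12 heads.
    tokens = np.random.default_rng(1).standard_normal((10, 768))
    print(multi_head_attention(tokens, n_heads=12).shape)  # (10, 768)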


I don't know how superimposed waves in finely tuned timing loops with non-linear interference translate into attention heads, and honestly I suspect that a lot of what is hard to do with attention heads (and other past approaches) comes for free in a resonance-based system.



