Wouldn't attention networks use less computation for the same number of weights? Feed-forward networks have higher connectivity, no?
If a network is the same size, less computationally demanding, and gives you 3% improvement, it seems extremely worthwhile, especially given the (mostly) diminishing returns of just adding more weights/layers.
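For a rough sense of how compute per weight compares, here's a back-of-the-envelope sketch (my own layer sizes and the usual counting conventions, not anything from the thread) that tallies parameters and FLOPs for a plain dense layer versus a standard self-attention layer, so you can plug in sizes and check the question yourself:

```python
# Illustrative FLOP/parameter comparison (assumptions: standard self-attention
# with Q/K/V/output projections, a plain dense layer, biases ignored,
# and 1 multiply-add counted as 2 FLOPs).

def dense_layer(d_in, d_out, n_tokens):
    params = d_in * d_out
    flops = 2 * n_tokens * d_in * d_out                 # one matmul per token
    return params, flops

def self_attention(d_model, n_tokens):
    params = 4 * d_model * d_model                      # Q, K, V, output projections
    proj_flops = 2 * n_tokens * 4 * d_model * d_model   # the four projections
    attn_flops = 2 * 2 * n_tokens * n_tokens * d_model  # QK^T and attn @ V
    return params, proj_flops + attn_flops

if __name__ == "__main__":
    d, n = 512, 256  # model width and sequence length (arbitrary example sizes)
    for name, (p, f) in [
        ("dense (d -> 4d)", dense_layer(d, 4 * d, n)),
        ("self-attention", self_attention(d, n)),
    ]:
        print(f"{name:>16}: {p:>10,} params, {f:>14,} FLOPs, {f / p:.1f} FLOPs/param")
```

At these example sizes the two land in the same ballpark per parameter; the term that scales with sequence length squared relative to model width is what tips the comparison one way or the other as sequences get longer.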