
Because until he tries it, who knows if it works?

There are a thousand papers out there making minor tweaks to the transformer architecture. 99% of them are also worthless and forgotten.

> Because until he tries it, who knows if it works?

That's precisely why he shared it, though: so that someone willing to train a model with this tweak can try it.

