
"Adding weights non uniformly during training (not after) is the key to smaller models that outperform present day GPT3." seems to greatly imply a certainty of result that has already been discovered. Though this commenter is familiar to me, and I know he has made silly claims in other threads throughout the years here.



I would hope so. It’s in my name!

I ended up calling an ambulance, so I'll have to postpone this until later. Feeling a little better, but a full explanation will have to wait.

The answer is that of course it isn't proven yet, since no one has implemented it (or at least not efficiently). It's fine to be skeptical.

Current techniques are blocked by the technical challenge of getting 10GB+ to fit on a pod. Very few people have those skills. If there’s even a chance that this will work, it’s worth exploring, so I will be.
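For anyone wondering what "adding weights non-uniformly during training" could even look like in practice, here's a rough, purely illustrative sketch (PyTorch, toy MLP, made-up growth schedule and init scale; this is my own reading of the phrase, not the actual method being claimed): periodically widen one layer mid-training, keep the trained weights, and initialize only the new slices near zero.

    # Illustrative only: grow a hidden layer at a few arbitrary points during
    # training while preserving what has already been learned.
    import torch
    import torch.nn as nn

    class GrowableMLP(nn.Module):
        def __init__(self, d_in=32, d_hidden=64, d_out=10):
            super().__init__()
            self.fc1 = nn.Linear(d_in, d_hidden)
            self.fc2 = nn.Linear(d_hidden, d_out)

        def forward(self, x):
            return self.fc2(torch.relu(self.fc1(x)))

        @torch.no_grad()
        def grow_hidden(self, extra: int):
            """Add `extra` hidden units, keeping all existing weights."""
            old1, old2 = self.fc1, self.fc2
            h = old1.out_features
            self.fc1 = nn.Linear(old1.in_features, h + extra)
            self.fc2 = nn.Linear(h + extra, old2.out_features)
            # Copy the trained slices into the wider layers.
            self.fc1.weight[:h] = old1.weight
            self.fc1.bias[:h] = old1.bias
            self.fc2.weight[:, :h] = old2.weight
            self.fc2.bias.copy_(old2.bias)
            # New units start near zero so they barely perturb the function.
            nn.init.normal_(self.fc1.weight[h:], std=1e-3)
            nn.init.zeros_(self.fc1.bias[h:])
            nn.init.zeros_(self.fc2.weight[:, h:])

    model = GrowableMLP()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for step in range(300):
        x, y = torch.randn(16, 32), torch.randint(0, 10, (16,))  # dummy data
        loss = nn.functional.cross_entropy(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        if step in (100, 200):      # grow at a few points, not on a uniform schedule
            model.grow_hidden(32)
            opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # optimizer must see the new params

The point of initializing the new units near zero and rebuilding the optimizer is that the widened network starts out computing roughly the same function it had already learned, so training continues rather than restarting. Whether anything like this scales to GPT-3-sized models is exactly the open question.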



