I’m unfortunately the sickest I’ve been in years, so this will have to wait. Maybe it’s part of why my comment sounded strange.
There is an idea here, and it’s a mistake to dismiss it out of hand. Adding weights non-uniformly during training (not after) is the key to smaller models that outperform present-day GPT-3.
A sketch of the algorithm: start with a 2x2 block of weights, accumulate the gradients across 10 training steps, then subdivide the quadrant with the largest accumulated delta.
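Roughly this, as a toy quadtree sketch. Everything here is an illustrative assumption on my part (the class names, the 10-step accumulation window, and the random stand-in for real gradients), not a worked-out implementation:

    import numpy as np

    ACCUM_STEPS = 10

    class Leaf:
        def __init__(self, value=0.0):
            self.value = value        # one weight
            self.grad_accum = 0.0     # summed |gradient| seen by this weight

    class Node:
        def __init__(self, value=0.0):
            # A node is a 2x2 block; start with four leaves sharing one value.
            self.children = [[Leaf(value), Leaf(value)], [Leaf(value), Leaf(value)]]

        def leaves(self):
            for row in self.children:
                for child in row:
                    if isinstance(child, Leaf):
                        yield child
                    else:
                        yield from child.leaves()

        def split_hottest(self):
            # Replace the leaf with the largest accumulated |gradient| by a
            # finer 2x2 block initialised from that leaf's value, then reset.
            hottest = max(self.leaves(), key=lambda leaf: leaf.grad_accum)
            self._replace(hottest, Node(hottest.value))
            for leaf in self.leaves():
                leaf.grad_accum = 0.0

        def _replace(self, target, replacement):
            for row in self.children:
                for j, child in enumerate(row):
                    if child is target:
                        row[j] = replacement
                        return True
                    if isinstance(child, Node) and child._replace(target, replacement):
                        return True
            return False

    tree = Node()
    for step in range(1, 31):
        for leaf in tree.leaves():
            leaf.grad_accum += abs(np.random.randn())  # stand-in for a real gradient
        if step % ACCUM_STEPS == 0:
            tree.split_hottest()
            print(f"step {step}: {sum(1 for _ in tree.leaves())} leaf weights")

The interesting engineering question is how to do the splits in big batches so the accelerator still sees dense blocks, which is exactly what the next point is about.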
Doing this recursively is prohibitively expensive to store and address naively, which is where megatexture comes in: virtual texturing only pages in the tiles you actually touch, at the mip level you need, and the same trick applies to a huge weight “texture.”
Many advantages. At runtime you don’t need weight compression because you can simply switch to a lower miplevel if running on a phone. Different accelerators during training can focus on different areas of the network. Weight dropout is an automatic feature. Etc.
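On the miplevel point, the analogy I mean is literally the mipmap box filter: average-pool the trained weight grid 2x2 to get each coarser level, and a smaller device loads whichever level fits. A toy sketch (the shapes and the random “trained” block are made up):

    import numpy as np

    def build_mip_chain(weights, levels=3):
        # Level 0 is the full-resolution weight block; each further level is a
        # 2x2 average-pool of the previous one, the same box filter mipmaps use.
        chain = [weights]
        for _ in range(levels - 1):
            w = chain[-1]
            h, wd = w.shape
            chain.append(w.reshape(h // 2, 2, wd // 2, 2).mean(axis=(1, 3)))
        return chain

    full = np.random.randn(8, 8)  # stand-in for a trained weight block
    for level, w in enumerate(build_mip_chain(full)):
        print(f"mip {level}: shape {w.shape}")  # (8, 8) -> (4, 4) -> (2, 2)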
If you’ll excuse me, it’s back to hugging the porcelain bowl.
You mention this technique has been shown to outperform GPT-3; do you have a citation for that? I’d love to read more details about this interesting concept.
"Adding weights non uniformly during training (not after) is the key to smaller models that outperform present day GPT3." seems to greatly imply a certainty of result that has already been discovered. Though this commenter is familiar to me, and I know he has made silly claims in other threads throughout the years here.
I ended up calling an ambulance, so I’ll postpone this until later. I’m feeling a little better, but a full explanation will have to wait.
The answer is that of course it isn’t proven yet, since no one has implemented it (or at least not efficiently). It’s fine to be skeptical.
Current techniques are blocked by the technical challenge of getting 10GB+ of weights to fit on a pod. Very few people have those skills. If there’s even a chance that this will work, it’s worth exploring, so I will be.
Sounds kinda like progressive growing except you're not doubling the resolution uniformly. See ProGAN and its successors. You'd still need to add a large block of weights at a time for performance reasons.
Edit: Ah I checked your profile and you already know all this. You probably should have mentioned that lol