I wasn't sure if this paper was parody on reading the abstract. It's not parody....

daxfohl · 2024-02-22T03:29:29 1708572569

Seems like it could be useful for resizing the networks, no? Start with ChatGPT 4 then release an open version of it with much fewer parameters.

Or maybe some metaparameter that mucks with the sizes during training produces better results. Start large to get a baseline, then reduce size to increase coherence and learning speed, then scale up again once that is maxed out.

SubiculumCode · 2024-02-22T03:57:44 1708574264

Perhaps doing this to generate 10 similar but different versions of a model can then be fed into mixture of experts?

vessenes · 2024-02-22T07:39:50 1708587590

Ooh that’s a good idea! Although mistral seems to have been seeded with identical copies of mistral, so maybe it doesn’t buy you much? Sounds worth trying though!

SubiculumCode · 2024-02-22T15:26:06 1708615566

The deep problem of my life: I'm interested in so many things, but only have time to pursue one hobby and one neuroscience career. If it is indeed a good idea, its only from connecting gleaned generalizations with other gleaned generalizations; but the devil is often in the details; and I will never have enough time to try myself. :)

daxfohl · 2024-02-25T08:15:22 1708848922

Or a good way to teleport out of local minima while training. Create a few clones and take the one with the steepest gradients.

namibj · 2024-02-23T18:05:03 1708711503

Hmmm, I could think of using it to update a DDPM with a conditioning input as the dataset expands from an RL/online process, without ruining the conditioning mechanism that's only trainable through the actual RL itself.

I.e., self-supervised training is done to produce semantically sensical results, and the RL-trained conditioning input steers to contextually useful results.

(Btw., if anyone has tips on how to not wreck the RL training's effort when updating the base model with the recently encountered semantically-valid training samples that can be used self-supervised, please tell. I'd hate to throw away the RL effort expended to aquire that much taking data for good self-supervised operation. It's already looking fairly expensive...)

daxfohl · 2024-02-25T01:00:27 1708822827

You could use this and try to tease out something similar to https://news.ycombinator.com/item?id=39487124, but for NNs instead of images. Maybe it's possible to have this NN diffusion model explain the pieces of the NN they generate and why parameters have those values.

If we can get that, then maybe we don't even need to train anymore; it'd be possible to start to generate NNs algorithmically.