Neural Network Diffusion (arxiv.org)
223 points by vagabund 9 months ago | 86 comments



I wasn't sure if this paper was a parody on reading the abstract. It's not a parody. Two things stand out to me: first is the idea of distilling these networks down into a smaller latent space and then mucking around with that. That's interesting, and it cuts across a bunch of interesting topics like interpretability, compression, training, and over- and under-fitting. The second is that they show the diffusion models don't just converge on the same parameters as the ones they train against/diffuse into, and that's also interesting.

I confess I'm not sure what I'd do with this in the random grab bag of Deep Learning knowledge I have, but I think it's pretty fascinating. I might like to see a trained latent encoder that works well on a bunch of different neural networks; maybe that thing would be a good tool for interpreting / inspecting.
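To make the first idea concrete, here's a toy sketch of "weights as data": flatten each trained network into a vector, fit an autoencoder over a bank of such vectors, and then do diffusion (or inspection) in the small latent space. This is my own rough construction, not the paper's architecture; names like WeightAutoencoder and weight_bank are made up.

    # Toy sketch (not the paper's code): compress flattened model weights
    # into a small latent space with a plain autoencoder.
    import torch
    import torch.nn as nn

    def flatten_params(model):
        # one long vector per trained checkpoint
        return torch.cat([p.detach().flatten() for p in model.parameters()])

    class WeightAutoencoder(nn.Module):
        def __init__(self, dim, latent=64):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(dim, 512), nn.ReLU(), nn.Linear(512, latent))
            self.dec = nn.Sequential(nn.Linear(latent, 512), nn.ReLU(), nn.Linear(512, dim))

        def forward(self, w):
            z = self.enc(w)
            return self.dec(z), z

    # weight_bank: an (N, dim) tensor of flattened checkpoints from many runs.
    # Train the autoencoder with an MSE reconstruction loss on weight_bank,
    # then fit a diffusion model on the latents z rather than on raw weights.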


Seems like it could be useful for resizing the networks, no? Start with GPT-4, then release an open version of it with far fewer parameters.

Or maybe some hyperparameter that mucks with the sizes during training produces better results. Start large to get a baseline, then reduce the size to increase coherence and learning speed, then scale up again once that is maxed out.


Perhaps generating 10 similar-but-different versions of a model this way could then be fed into a mixture of experts?


Ooh, that's a good idea! Although Mixtral seems to have been seeded with identical copies of Mistral, so maybe it doesn't buy you much? Sounds worth trying, though!


The deep problem of my life: I'm interested in so many things, but only have time to pursue one hobby and one neuroscience career. If it is indeed a good idea, it's only from connecting gleaned generalizations with other gleaned generalizations; but the devil is often in the details, and I will never have enough time to try it myself. :)


Or a good way to teleport out of local minima while training. Create a few clones and take the one with the steepest gradients.
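Roughly something like this loop, where sample_similar_weights is a placeholder for however you produce the clones (e.g. decoding nearby latent samples), so take it as a sketch rather than a recipe:

    # Hypothetical escape-from-plateau step: make a few perturbed clones of the
    # current weights and keep the one whose loss gradient has the largest norm.
    import copy
    import torch

    def pick_steepest_clone(model, loss_fn, batch, sample_similar_weights, k=4):
        best, best_norm = model, -1.0
        for _ in range(k):
            clone = copy.deepcopy(model)
            sample_similar_weights(clone)          # placeholder: mutate the clone's weights
            clone.zero_grad()
            loss_fn(clone, batch).backward()
            grads = torch.cat([p.grad.flatten() for p in clone.parameters()
                               if p.grad is not None])
            if grads.norm() > best_norm:
                best, best_norm = clone, grads.norm()
        return best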


Hmmm, I could think of using it to update a DDPM with a conditioning input as the dataset expands from an RL/online process, without ruining the conditioning mechanism that's only trainable through the actual RL itself.

I.e., self-supervised training is done to produce semantically sensible results, and the RL-trained conditioning input steers toward contextually useful results.

(Btw., if anyone has tips on how to not wreck the RL training's effort when updating the base model with the recently encountered semantically valid training samples that can be used self-supervised, please tell. I'd hate to throw away the RL effort expended to acquire that much training data for good self-supervised operation. It's already looking fairly expensive...)


You could use this and try to tease out something similar to https://news.ycombinator.com/item?id=39487124, but for NNs instead of images. Maybe it's possible to have this NN diffusion model explain the pieces of the NNs it generates and why the parameters have those values.

If we can get that, then maybe we don't even need to train anymore; it'd be possible to start to generate NNs algorithmically.


This doesn't seem all that impressive when you compare it to earlier work like 'G.pt' (Peebles et al. 2022, https://arxiv.org/abs/2209.12892). They cite it in passing, but do no comparison or discussion, and to my eyes, G.pt is a lot more interesting (for example, you can prompt it for a variety of network properties like low vs. high score, whereas this just generates unconditionally) and more thoroughly evaluated. The autoencoder here doesn't seem like it adds much.



Are there any sites for viewing Twitter threads without signing up?



Seems like we're getting very close to recursive self-improvement [0].

[0] https://www.lesswrong.com/tag/recursive-self-improvement


No, this is an example of an existing technique called hypernetworks.

It's not "recursive self improvement", which is just a belief that magic is real and you can wish an AI into existence. In particular, this one needs too much training data, and you can't define "improvement" without knowing what to improve to.


All current LLMs are based on the premise that magic is real and you can wish intelligence into existence; it's called "scaling laws" and "emergent capabilities".

Recursive self-improvement isn't "maybe magic is real", it's "maybe the magic we already know about stays magical as we cast our spells with more mana."


Doesn't this line of reasoning imply that human intelligence is magical, i.e. is not the result of scaling/emergence?


Reality is never magical, by definition. Magic just means that we are using something without understanding it.

Whatever our brains are doing internally isn't magical. But it's magic to us because we don't know how it works. So too with current LLMs.

My point is we're already doing things with LLMs that we don't understand and that we didn't think were attainable until two years ago. We don't know how to do superintelligence and recursive self-improvement... but we're off in uncharted territory already, and I think there's a lot more grounds for positive uncertainty about self-improvement than there was before GPT-3.


The answer is a firm maybe, depending on some factors.


> which is just a belief that magic is real

Is there a law of thermodynamics which prevents AI from writing code which would train a better AI? Never learned that one in school.

And FYI, here's OpenAI's plan to align superintelligence: "Our goal is to build a roughly human-level automated alignment researcher. We can then use vast amounts of compute to scale our efforts, and iteratively align superintelligence."

I guess people working there believe in magic.

> and you can wish an AI into existence.

Eh? People believe that self-improvement might happen when AI is around human-level.


> Is there a law of thermodynamics which prevents AI from writing code which would train a better AI?

You need to apply Wittgenstein here.

This appears to be true because you haven't defined "better". If you define it, it'll become obvious that this is either false or true, but if it is true it'll be obvious in a way that doesn't make it sound interesting anymore.

(For one thing, our current "AI" doesn't come from "writing code"; it comes from training bigger models on the same data. For another, making changes to code doesn't make it exponentially better, and instead breaks it if you're not careful.)

> I guess people working there believe in magic.

Yes, OpenAI was literally founded by a computer worshipping religious cult.

> People believe that self-improvement might happen when AI is around human-level.

Humans don't have a "recursive self-improvement" ability.

Also not obvious that an AI that was both "aligned" and "capable of recursive self-improvement" would choose to do it; if you're an AI and you're making a new improved AI, how do you know it's aligned? It sounds unsafe.


> Humans don't have a "recursive self-improvement" ability

They do.

Humans can learn from new information, but also by iteratively distilling existing information or continuously optimizing performance on an existing task.

Mathematics is a pure instance of this, in the sense that all the patterns for conjectures and proven theorems are available to any entity to explore, no connection to the world needed.

But any information being analyzed for underlying patterns, or task being optimized for better performance, creates a recursive learning driver.

Finally, any time two or more humans compete at anything, they drive each other to learn and perform better. Models can do that too.


> they just come from training bigger models on the same data

Are you arguing that all AI models are using the same network structure?

This is only true in the most narrow sense, looking at models that are strictly improvements over previous generation models. It ignores the entire field of research that works by developing new models with new structures, or combining ideas from multiple previous works.


I sure am ignoring that, because the bitter lesson of AI is usually applicable and implies that all such research will be replaced by larger generic transformer networks as time goes on.

The exception is when you care about efficiency (in training or inference costs), but at the limit, or if you only care about "better", you don't.


This is kind of an odd statement, because the transformer is not the most generic neural net. It's the result of many levels of improvements in architecture over older designs. The bitter lesson is that methods which scale well with compute win (alpha-beta beats heuristics alone, neural networks beat alpha-beta), not that the most obvious and generic approach eventually wins. Given the context-length problems with transformers, I think it's fair to say they have scaling problems.


There's a principle more powerful than the bitter lesson: GIGO.

Training to predict an internet dump can only get you so far.

There's a paper, "Textbooks Are All You Need," where they show that a small model trained on a high-quality, no-nonsense dataset can beat a much bigger model at a task like Python coding.


It is very clear to me that humans do in fact have a recursive self-improvement ability, and I'm confused why you think otherwise.


I think people can read books (self-improvement) and have children (recursive), but neither of those is both.


Why do you think that the human population is more intelligent, knowledgeable, and achieves greater technological feats as time goes on? It's because of recursive self-improvement, we are raised and educated into being better in a quite general sense, which includes being better at raising and educating; nearly every generation this cycle repeats and has for all of human history, at least since we acquired language. We also build machines that help us to make better machines, and then we use those better machines to make even better machines, another example of recursive self-improvement.


You're pointing out that groups/institutions/cultures/civilizations are examples of recursively self-improving entities, but the original point was about a recursively self-improving individual intelligent entity.

Well, to the extent that a human-level intelligence is an individual, anyway. We ourselves are probably a mixture-of-experts in some sense.


An individual human starts out as a mewling baby and can end up a maxillofacial surgeon through at least partial examples of recursive self-improvement. Learn to walk, talk, read, write, structure, argue, essay, study, cite, etc., all the way through to the end, with what you previously learned allowing you to learn even more. There's a huge amount of outside help, but at least some of it is also self-improvement.

Also, for the purposes of talking about the phenomenon of recursive self-improvement, individual vs. society isn't the end of the analysis. Part of the reason AI recursive self-improvement is concerning is that people are worried about it happening on much faster than societal timescales, in ways that are not socially tractable like human societies are (e.g. if our society is "improving" in a way we don't like, we or other humans can intervene to prevent, alter, or mitigate it). It's also important to note that when we're talking about "recursive self-improvement" when it comes to AI, the "self" is not a single software artifact like Llama-70B. The "self" is AI in general, and the most commonly proposed mechanism is that an AI is better than us at designing and building AIs, and the resulting AI it makes is even better at designing and building AIs.


New generations build onto the scientific knowledge of previous generations. It may not be fast but that sounds like recursive improvement to me. It seems reasonable for AI to accelerate this process.


I think saying all of society is doing it is plausible, but not the same thing as a single human or AI doing it.

Though… still don't think it's true. Isn't "society is self improving" what they call Whig history?


AI might have multiple instances within a single computing environment, so it's more like a population than a single individual.

I.e. "You can only use the memory which you currently use" would be a weird artificial constraint not relevant in practice.


A very small percentage, maybe. I think I agree with the notion that most people are biased toward thinking they are improving while actually self-sabotaging.


> If you define it, it'll become obvious that this is either false or true

Ok. So then I guess it isn't "just a belief that magic is real".

Instead, it is so true and possible that you think it is actually obvious!

I'm glad you got convinced in a singular post that recursive self improvement, in the obvious way, is so true and real that it is obviously true and not magic.


> This appears to be true because you haven't defined "better".

Better intelligence can be defined quite easily: something which is better at (1) modeling the world; (2) optimizing (i.e. solving problems).

But if that is too general, we can assume that general reasoning capability would be a good proxy for it. And "better at reasoning" is rather easy to define. Beyond general reasoning, a better AI might have access to a wider range of specialized modeling tools, e.g. chemical, mechanical, biological modeling, etc.

> if it is true it'll be obvious in a way that doesn't make it sound interesting anymore.

Not sure what you mean. AI which is better at reasoning is definitely interesting, but also scary.

> they just come from training bigger models on the same data.

I don't think so. OpenAI refuses to tell us how they made GPT-4. I think a big part of it was preparing better, cleaner data sets. Google tells us that they specifically improved Gemini's reasoning using specialized reasoning datasets. More specialized AIs like AlphaGeometry use synthetic datasets.

> Yes, OpenAI was literally founded by a computer worshipping religious cult.

Practice is the sole criterion for testing the truth. If their beliefs led them to better practice then they are closer to truth than whatever shit you believe in. Also I see no evidence of OpenAI "worshipping" anything religion-like. Many people working there are just excited about possibilities.

> Humans don't have a "recursive self-improvement" ability.

Human recursive self-improvement is very slow because we cannot modify our brains at will. Also, spawning more humans takes time. And yet humans have made a huge amount of progress in the last 3000 years or so.

Imagine that instead of making a new adult human in 20 years you could make one in 1 minute with full control over neural structures, connections to external tools via neural links, precisely controlled knowledge & skills, etc.


>> I guess people working there believe in magic.

>Yes, OpenAI was literally founded by a computer worshipping religious cult.

What cult is this?


HPMOR readers who live in group home polycules in Berkeley who think they need to invent a good computer god to stop the evil computer god.


You're confusing OpenAI and MIRI.

OpenAI founders: Ilya Sutskever, Greg Brockman, Trevor Blackwell, Vicki Cheung, Andrej Karpathy, Durk Kingma, Jessica Livingston, John Schulman, Pamela Vagata, Wojciech Zaremba, Sam Altman. All of them come from the software/tech industry and academic research circles; that's not evidence of interest in HPMOR or Yud.


I think they cleaned out some of the EAs around the time of the board situation, but I don't know what the non-EA overlap is with your description.


> I guess people working there believe in magic.

I've been thinking about this recently. Personally, I've yet to see any compelling evidence that an LLM, let alone any AI, can operate really well "out of distribution". Its capabilities (in my experience) seem to be spanned by the data it's trained on. Hence, this supposed property that it can "train itself", generating new knowledge in the process, is yet to be proven in my mind.

That raises the question for me: why do OpenAI staff believe what they believe?

If I'm being optimistic, I suppose they may have seen unreleased tech, motivating their beliefs that seemingly AGI is on the horizon.

If I'm being cynical, the promise of AGI probably draws in much more investment. Thus, anyone with a stake in OpenAI has an incentive to promote this narrative of imminent AGI, regardless of how realistic it is technically.

This is of course just based on what I've seen and read; I'd love to see evidence that counters my claims.


The question is not whether it can work right now, but whether it is possible in the future (i.e. whether it's possible in principle).

I think the concern about out-of-distribution is overstated. If we train it on predicting machine learning papers, writing machine learning papers is not out-of-distribution.

You might say "but writing NOVEL papers" would be OOD; but there's no sharp boundary between old and new. Model's behavior is usually smooth, so it's not like it will output random bs if you try to predict 2025 papers. And predicting 2025 papers in 2024 all we need to do "recursive self-improvement". (There are also many ways to shift distribution towards where you want it to be, e.g. aesthetics tuning, guidance in diffusion models, etc. Midjourney does not faithfully replicate distribution in the input training set, it's specifically tuned to create more pleasing outputs. So I don't see "oh but we don't have 2025 papers in the training set yet!" being an insurmountable problem.)

But more generally, seeing models as interpolators is useful only to some extent. We use statistical language when training the models, but that doesn't mean all output should be interpreted as statistics. E.g. suppose I trained a model which generates plausible proofs. I can combine it with a proof checker (checking is much easier than generating a proof), and wrap it into a single function `generate_proof` which is guaranteed to generate a correct proof (it will loop until a plausible proof checks out). Now the statistics do not matter much. It's just a function.
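For concreteness, the wrapper described above might look like the sketch below; `propose_proof` and `check_proof` are placeholders for the model call and the formal checker.

    # Sketch of the `generate_proof` idea: sample candidate proofs from a model
    # and only return one once a formal checker accepts it.
    def generate_proof(statement, propose_proof, check_proof, max_tries=1000):
        for _ in range(max_tries):
            candidate = propose_proof(statement)   # stochastic model output
            if check_proof(statement, candidate):  # cheap, deterministic verification
                return candidate
        raise RuntimeError("no verified proof found within the retry budget")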

If there's such a thing as a general reasoning step, then all we need is a function which performs it. Then we just add an outer loop to explore a tree of possibilities using these steps. And further improvements might be in making these steps faster and better.

Does reasoning generalize? I'd say everything points to "yes". Math is used in a variety of fields. We have yet to find something where math doesn't work. If you get somebody educated in mathematical modeling and give them a new field to model, they won't complain about math being out-of-distribution.

If you look at LLMs today, they struggle with outputting JSON. It's clearly not an out-of-distribution problem, it's a problem with training - the dataset was too noisy, with too many examples where somebody requests JSON but gets JSON wrapped in Markdown. It's just an annoying data cleanup problem, nothing fundamental. I think it's reasonable to assume that within 5 years OpenAI, Google, etc. will manage to clean up their datasets and train more capable, reliable models which demonstrate good reasoning capabilities.

FWIW I believe that if we hit a wall on the road toward AGI, that might actually be good, buying more time to research what we actually want out of AGI. But I doubt that any wall will last more than 5 years, as it already seems almost within reach...


Interesting, I suppose what you're proposing is that models could, in some abstract way, extrapolate research results taking ideas A and B that it "knows" from its training, and using them to create idea AB. Then, we assert that there is some "validation system" that can be used to validate said result, thus creating a new data point, which can be retrained on.

I can see how such a pipeline can exist. I can imagine the problematic bit being the "validation system". In closed systems like mathematics, the proof can be checked with our current understanding of mathematics. However, I wonder if all systems have such a property. If, in some sense, you need to know the underlying distribution to check that a new data point is in said distribution, the system described above cannot find new knowledge without already knowing everything.

Moreover, if we did have such a perfect "validation system", I suppose the only thing the ML models are buying us is a more effective search of candidates, right? (e.g., we could also just brute force such a "validation system" to find new results).

Feel free to ignore my navel-gazing; it's fascinating to discuss these things.


Even if recursive self-improvement does work out, my hunch is that it is going to be logarithmic rather than exponential, mostly down to the availability of data. It might go beyond human intelligence, but I don't think it will reach a singularity.


This is why the big bet for AI-assisted AI development long term is synthetic data. A big part of the reason so much money and so many resources are going into synthetic data right now is not just economic necessity, but that there have been extremely encouraging results with synthetic data (e.g. 'Textbooks Are All You Need', AlphaZero).


I wouldn't count AlphaZero, since it's reinforcement learning. With that technique you can generate high-quality data all the time, since the rules are fixed. Not everything can be trained that way.


The chess knowledge and skills of LLMs come from ingesting a sufficient number of chess games in text format (the amount needed will be proportional to both the other data you have and the compute you have), and the same goes for the ability of LLMs to play other games or solve other fixed-rule/perfect-information puzzles. AlphaZero and its cousins showed that you can generate an effectively infinite quantity of extremely high-quality data in those domains. There is a possibility that the benefit to an LLM's general intelligence from giving it, e.g., one billion ~4600-Elo games is only in improving its ability to play chess. Given the results many studies have reported on cross-learning with LLMs, I doubt that, though. The potential is that generating a lot of extremely high-level logic and puzzle solving and providing it as extremely high-quality synthetic data to an LLM can improve its general reasoning and logic capabilities - that would be huge, and is one of the promises of synthetic data.


To be honest, I think a lot of smart people are willing to believe in magic when they've demonstrated some strong capability and the people funding their company want magic to happen.


It's not magic, though. If AI can do the work of a human, it can do the work of a human. It's a trivial statement, and the inability to see it is a hard cope.

Are you going to take a bet "AI won't be able to do X in 10 years" for some X which people can learn to do now? If you're unwilling to bet, then you believe that AI would plausibly be able to perform any human job, including the job of AI researcher.


At the end of the day it can only get as far as the data it has. Let's say you want to make a drug that inhibits a protein. The AI can generate plausible drugs, but to see if one actually works you need to test it in the lab and then on an animal, etc. Now, you could have an AI with a perfect understanding of how a drug interacts with a protein, but such data is not available in the first place. Without that, you can't simply scale GPT-type models.


‘Doing the work of a human’ is something that is very hard to define or quantify in many cases. You sound very confident, but you don’t address this at all; you simply assume it’s a given.

Relevant: https://www.jaakkoj.com/concepts/doorman-fallacy


Yeah, I think we're in kind of a vicious recursive cycle of imperfect-metric reinforcement (reward hacking, I suppose, though often implemented in economics as well as code) rather than one of recursive self-improvement in a more holistic sense. Optimization is really good at turning small problems of this nature into big ones more quickly.


I don't claim it's impossible, just that there isn't a clear path from what exists now to that reality, and that the explanation presented by the above commenter (and I suppose OpenAI's website) does not clarify what they think the path is


What will AutoGPT look like if we have 100x more compute and another 10 years of research breakthroughs? It will be pretty damn good. If it can do the cognitive work of an AI researcher, well, there's your recursive self-improvement, at least on the research front (not so much on the hardware/energy front, physical constraints are trickier and will slow down progress in practice).

I don't know the exact path there, because if I did I'd publish and win the Turing Award. But it seems to be a plausible outcome in the medium-term future, at least if you go with Hinton's view that current methods are capable of understanding and reasoning, and not LeCun's view that it's all a dead end.


I won't comment on whether I believe those researchers hold those views as you describe them, but as you describe them, I think both those descriptions of the state of AI research are untrue. The capabilities demonstrated by transformer models seem necessary but not sufficient to understand and reason, meaning that while they're not necessarily a "dead end", it is far from guaranteed that adding more compute will get them there

Of course, if we allow for any arbitrary "research breakthrough" to happen, then any outcome that's physically possible could happen, and I agree with you that superhuman artificial intelligence is possible. Nonetheless it remains unclear what research breakthroughs need to happen, how difficult they will be, and whether handing a company like OpenAI lots of money and chips will get that done. It remains even more unclear whether that is a desirable outcome, given that the priorities of that company seem to shift considerably each time their budget is increased (as is the norm in this economic environment; to be clear, that is not a problem unique to OpenAI).

Obviously OpenAI has every reason to claim that it can do this and to claim that it will use the results in a way designed to benefit humanity as a whole. The people writing this promotional copy and the people working there may even believe both of these things. However, based on the information available, I don't think the first claim is credible. The second claim becomes less credible the more of the company's original mission gets jettisoned as its priorities align more with its benefactors, which we have seen happen rather rapidly.


We can reason about it without knowing the path. E.g. somebody in the 1950s could say "if you have enough compute, you can do photorealistic-quality computer graphics". If you asked them how to build a GPU, they wouldn't know. Their statement is about possibility in principle.


Yes, and there are lots of predictions about when that would happen that turned out to be very wrong. Even if there is a clear path and specific people assigned to do a thing, it is famously always more difficult than expected for those people to correctly estimate how long it will take. Forgive me for being skeptical of random laypeople giving me timelines for an unknown unknown, based on an ill-specified objective, for work being done on something that currently has a lot of marketing hype.


They do. Altman is saying their tech may be poised to capture the sum of all value in Earth's future light cone.

Saying "well that is not physically impermissible" doesn't make it real.

In any case, nobody has ever shown that recursive self-improvement "takes off", nor is that what we should expect a priori.


I upvoted because this was my first thought too, but reading the abstract and skimming the paper makes me think it’s not really an advance for general recursive improvement. I think the title makes people think this is a text -> model model, when it is really a bunch of model weights -> new model weights optimizer for a specific architecture and problem. Still a potentially very useful idea for learning from a bunch of training runs and very interesting work!


I suspect this is useful for porting one vector space to another, which is an open problem when you've trained one model with one architecture and need to port it to another architecture without paying the full retraining cost.


Doesn't look that different from what we are already doing. For example, AlphaGo/AlphaZero/MuZero learn to play board games by playing repeatedly against themselves; it is a self-improvement loop leading to superhuman play. It was a major breakthrough for the game of Go, and it led to advances in the field of machine learning, but we are still far from something resembling a technological singularity.

GANs are another example of self-improvement. They were famous for creating "deep fakes". They work by pitting a fake generator and a fake detector against each other, resulting in a cycle of improvement. It didn't get much further than that; in fact, it is all about attention and transformers now.

This is just a way of optimizing parameters; it will not invent new techniques. It can say "put 1000 neurons there, 2000 there, etc.", but it still has to pick from what designers tell it to pick from. It may adjust these parameters better than a human can, leading to more efficient systems. I expect some improvement to existing systems, but not a breaking change.


Go and chess still have rules that are hard-coded, which at least gives a framework to optimize in. What rules do you give an LLM?


Some sort of "generate descriptions of novel tasks including ways to evaluate performance at those tasks, evaluate quality of the generated tasks+evaluation-metrics, split tasks into subtasks, estimate difficulty of tasks in a way that is is judged on how it compares to a combined estimated difficulty of generated subtasks and to actual success rate and quality" sort of deal?


Physics.


I'm skeptical of the idea that anything is going to derive intelligence from the bottom up, but I'll be super impressed if that's how it goes.


Why not? We started off as single celled organisms and look at where we are now.


The real magic of recursive self-improvement happens only after you have human-level AI that is able to match and surpass human ability in designing AI architectures. Escape-velocity-breaking recursive self-improvement doesn't look like a human-made architecture being trained further; it looks like an AI understanding why transformers etc. were successful and coming up with an advancement over transformers.


A rare opportunity for the other four-letter comic to be applicable: http://smbc-comics.com/comic/2011-12-13

(Though I suppose this skips Neuralink / step 3 and jumps right to step 4.)


The AI is ready to take off to perfection land.


"We synthesize 100 novel parameters by feeding random noise into the latent diffusion model and the trained decoder." Cool that patterns exist at this level, but also, 100 params means we have a long way to go before this process is efficient enough to synthesize more modern-sized models.


Yay, an alternative to backprop & SGD! Really interesting and impressive finding; I was surprised that the network generalizes.


Fuck. I have an idea just like this one. I guess it's true that ideas are a dime a dozen. Diffusion bears a remarkable similarity to backpropagation, to me. I thought it could be used in place of it for some parts of a model.

Furthermore, I posit that residual connections, especially in transformers, allow the model a more exploratory behavior that is really powerful, and are a necessary component of the power of transformers. The transformer is just such a great architecture, the more I think about it. It's doing so many things so right. Although this is not really related to the topic.


Actually it is related.

Transformers are just networks that learn to program the weights of other networks [1]. In the successful cases the programmed network has been quite primitive -- merely a key-value store -- in order to ensure that you can backpropagate errors from the programmed network's outputs all the way to the programmer network's inputs.
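Here's the "key-value store" reading from [1] in miniature: linear attention can be seen as a fast weight matrix that the slow network keeps reprogramming with outer products. This is a sketch of the idea, not the paper's exact formulation (the feature map on keys/queries is assumed to have been applied already):

    # Miniature "fast weight programmer" view of linear attention: the slow net
    # emits keys/values/queries, and the programmed network is just a weight
    # matrix W updated by outer products (i.e. a key-value store).
    import torch

    def fast_weight_attention(keys, values, queries):
        # keys, values, queries: (seq_len, dim) tensors
        W = torch.zeros(values.shape[1], keys.shape[1])
        outputs = []
        for k, v, q in zip(keys, values, queries):
            W = W + torch.outer(v, k)      # "write": program the fast weights
            outputs.append(W @ q)          # "read": apply the programmed network
        return torch.stack(outputs)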

The present work extends this idea to a different kind of programmed network: a convolutional image-processing network.

There are many more breakthroughs to be achieved along this line of research -- it is a rich vein to mine. I believe our best shot at getting neural networks to do discrete math and symbolic logic, and to write nontrivial computer programs, will result from this line of research.

[1] https://arxiv.org/abs/2102.11174


Important to note, they say "From these generated models, we select the one with the best performance on the training set." Definitely potential for bias here.


I'd have liked to see the distribution of generated model performance.


Fig 4b


Am I missing something, or is this just a case of "amortized inference", where you train a model (here a diffusion one) to infer something that was previously found via an optimization procedure (here NN parameters)?


The state-of-the-art neural net architecture, whether that be transformers or the like, trained with self-play to optimize non-differentiable but highly efficient architectures, is the way.


According to Hinton, before transformers were shown to work well, learning model architectures was Google's main focus


Hm, so does this actually improve/condense the representation for certain applications, or is this more some kind of global expand-and-collect in network space?


Can this be used to fill in the missing information in the OpenWorm nematode 302-neuron brain simulator?


Why does Figure 7 not include a validation curve (afaict only the training curve is shown)?



hah, nice! :D


I'd wager that adding noise to the weights in a principled fashion would accomplish something similar to this.
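For reference, the baseline being proposed is essentially this (a sketch; the noise scale sigma is arbitrary here):

    # Perturb an already-trained model's weights with Gaussian noise of scale sigma.
    import torch

    def add_weight_noise(model, sigma=0.01):
        with torch.no_grad():
            for p in model.parameters():
                p.add_(sigma * torch.randn_like(p))
        return model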


I would really be surprised if just adding noise gave you convergence.



