I can't help but feel that this talk was a lot of...fluff?
The synopsis, as far as my tired brain can remember:
- Here's a brief summary of the last 10 years
- We're reaching the limit of our scaling laws, because we've already trained on essentially all the data we have available
- Some things that may be next are "agents", "synthetic data", and improving compute
- Some "ANNs are like biological NNs" rehash that would feel questionable if there was a thesis (which there wasn't? something about how body mass vs. brain mass are positively correlated?)
- 3 questions, the first was something about "hallucinations" and whether a model be able to understand if it is hallucinating? Then something that involved cryptocurrencies, and then a _slightly_ interesting question about multi-hop reasoning
I attended this talk in person and some context is needed. He was invited for the “test of time” talk series. This explains the historical part of the talk. I think his general persona and association with ai led to the fluffy speculation at the end.
I notice with Ilya he wants to talk about these out-there speculative topics but defends himself with statements like “I’m not saying when or how, just that it will happen”, which makes his arguments impossible to address. Stuff like this openly invites the crazies to interact with him, as seen with the cryptocurrency question at the end.
Right before this was a talk reviewing the impact of GANs that stayed on topic for the conference session throughout.
I mean he repeatedly gave some hints (even if just for the lulz and not seriously) that the audience is at least partially composed of people with little technical background or AI bros. An example is when he mentioned LSTMs and said "many of you may have never seen them before". Even if he didn't mean it, ironically it ended up being spot on when the crypto question came.
As someone who is at NeurIPS right now with a main conference paper, I was shocked at how many NeurIPS attendees had no paper. At ACL conferences, almost every person attending has a paper (even if it's only at a workshop)
NeurIPS is "ruined" by the money and thus attracts huge amounts of people who are all trying to get rich. It's a bloody academic conference people!
A person with little or no technical background, who neither knows nor cares about AI (or other scientific/mathematical advancements) other than for their potential to make the AI bro rich. There is a big overlap with crypto bros, and in fact many AI bros are just grifters who moved on after crypto tanked with the recent fed funds rate hikes.
Well, it looks like the entire point was "you can no longer expect a capability gain from a model with a bigger ndim trained on a bigger internet dump".
That's just one sentence, but it's pretty important. And while many people already know this, it's important to hear Sutskever say it, so people know it's common knowledge.
But they are self-aware; in fact, it's impossible to make a good AI assistant which isn't: it has to know that it's an AI assistant, it has to be aware of its capabilities, limitations, etc.
I guess you're interpreting "self-awareness" in some mythical way, like a soul. But in a trivial sense, they are. Perhaps not to the same extent as humans: models do not experience time in a continuous way. But given that a model can maintain a dialogue (voice mode, etc.), it seems to be phenomenologically equivalent.
Would you say that a calculator is also self-aware, and that it must know its limitations so that it doesn't attempt to do calculations it isn't capable of doing, for instance?
Alright. Suppose "meaning" (or "understanding") is something which exists in a human head.
It might seem like a complete black box, but we can get some information about it by observing human interactions.
E.g. suppose Alice does not know English, but she has a bunch of cards with instructions like "Turn left", "Bring me an apple", etc. If she shows these cards to Bob and Bob wants to help her, Bob can carry out the instructions on a card. If they play this game, the meaning which a card induces in Bob's head will be understood by Alice, and thus she will be able to map these cards to meaning in her own head.
So there's a way to map meaning which is mediated by language.
Now, from a math perspective, if we are able to estimate semantic similarity between utterances, we might be able to embed them into a latent "semantic" space.
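A toy sketch of that idea, assuming the sentence-transformers package and its all-MiniLM-L6-v2 model as an off-the-shelf encoder: near-paraphrases land close together in the embedding space, unrelated instructions land farther apart.

```python
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = ["Bring me an apple", "Please fetch an apple for me", "Turn left"]
emb = model.encode(sentences)

def cos(a, b):
    # cosine similarity: 1.0 means identical direction in the latent space
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cos(emb[0], emb[1]))  # high: two ways of asking for an apple
print(cos(emb[0], emb[2]))  # lower: an unrelated instruction
```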
If you accept that the process of LLM training captures some aspects of meaning of the language, you can also see how it leads to some degree of self-awareness. If you believe that meaning cannot be modeled with math then there's no way anyone can convince you.
How does math encode meaning if there is no Alice and Bob? You should quickly realize the absurdity of your argument once you take people out of the equation.
Not sure what you mean... A NN training process can extract semantics from observations. That semantics can then be applied, e.g., to robots. So it doesn't depend on humans beyond the production of observations.
The function/mathematics in an NN (neural network) is meaningless unless there is an outside observer to attribute meaning to it. There is no such thing as a meaningful mathematical expression without a conscious observer to give it meaning. Fundamentally, there is no objective difference between one instance of a NN with one parameter, f(θ), evaluated on some input, f(θ)(x), and another instance of the same network with a small perturbation of the parameter, f(θ+ε), evaluated on the same input, f(θ+ε)(x), unless a conscious observer perceives the outputs and attributes meaning to the differences: the arithmetic operations performed are the same in both networks in terms of their objective complexity and energy utilization.
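To make the f(θ)(x) vs. f(θ+ε)(x) point concrete, here is a tiny numerical sketch: two networks that differ by a small parameter perturbation perform the same operations and produce nearly the same numbers, and nothing inside the arithmetic marks one output as more "meaningful" than the other.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(size=(3, 2))      # parameters of a one-layer "network"

def f(params, x):
    # same operations regardless of the exact parameter values
    return np.tanh(x @ params)

x = np.array([1.0, -0.5, 2.0])
eps = 1e-3 * rng.normal(size=theta.shape)   # small perturbation

print(f(theta, x))        # f(θ)(x)
print(f(theta + eps, x))  # f(θ+ε)(x): slightly different numbers, same "work"
```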
How does the universe encode meaning if there is no Alice and Bob?
One common answer is: it doesn't.
And yet, here we are, creating meaning for ourselves despite being a state of the quantum wave functions for the relevant fermion and boson fields, evolving over time according to a mathematical equation.
(Philosophical question: if the time-evolution of the wave functions couldn't be described by some mathematical equation, what would that imply?)
The universe does not have a finite symbolic description. Whatever meaning you attribute to the symbols has no objective reality beyond how people interpret those symbols. Same is true for the arithmetic performed by neural networks to flash lights on the screen which people interpret as meaningful messages.
> The universe does not have a finite symbolic description
Why do you believe that? Have you mixed up the universe with Gödel's incompleteness theorems?
Your past light cone is finite in current standard models of cosmology, and according to the best available models of quantum mechanics a finite light cone has a finite representation — in a quantised sense, even, with a maximum number of bits, not just a finite number of real-valued dimensions.
Even if the universe outside your past light cone is infinite, that's unobservable.
> Same is true for the arithmetic performed by neural networks to flash lights on the screen which people interpret as meaningful messages.
This statement is fully compatible with the proposition that an artificial neural network itself is capable of attributing meaning in the same way as a biological neural network.
It does not say anything, one way or the other, about what is needed to make a difference between what can and cannot have (or give) meaning.
I don’t understand what you consider self-awareness to be in this trivial sense. Is Eliza self-aware, for example? Eliza maintains a dialogue, albeit obviously not as coherently as a modern LLM.
From my perspective, we know what the benchmark is for our own self-awareness, because Descartes gave it to us: Cogito ergo sum. I think, therefore I am. We know we think, so we know we exist. That is the root of our self-awareness.
There is a great deal of controversy about the question of whether any of the existing models think at all (and of course that’s the whole point of Turing’s amazing paper[1], and the Chinese room thought experiment[2]) and the best you could say is the burden at the moment is on the people who say models can think to prove that.
Given that, I really don’t see how you can say models have self-awareness at the moment. Models may hypothetically be able to convince themselves they are self-aware via Descartes’ method, but notice that Descartes’ proof doesn’t work for us here: he was able to pull himself up by his own bootstraps because he knew there was a thought, so there must be an “I” who was doing the thinking. We have to observe models from the outside and determine whether or not thought is present, and that’s where the concept behind the Chinese room shows how tricky this is.
The only kind of intelligence we know of since childhood is meat-based intelligence. Brain made out of flesh.
So our intuition tells us that flesh is essential. "Chinese room" appeals to this intuition.
But it's a circular argument.
Anyway, I believe they are self-aware, to some extent. Not like a human. But I have no arsenal to convince people who believe intelligence has to be made out of meat.
I certainly don’t think intelligence has to be made out of meat but I do think it’s a complex set of questions and we owe it to ourselves to actually try to tackle the underlying issue of what intelligence and self-awareness etc really are and how we might know whether a model did or did not exhibit these, because we are apt to fool ourselves on both sides of this argument depending on our prior beliefs.
You can see the same thing when people talk about whether animals exhibit self-awareness. There are experiments with dolphins and mirrors, for example, that definitely suggest that dolphins recognise and might even be amused by their reflection when they see it, but some people find it very hard to reconcile themselves to the idea that a dolphin might have a sense of self. I personally find it harder to believe that any particular characteristic would be uniquely human.
Self-awareness is a huge rabbit hole to go down. It's one of the many concepts we think make humans unique, at least in degree, but we never really found a clear way to define or identify it.
I have watched dogs, cats, cows, and chickens pretty extensively. I still couldn't tell you if they are really self-aware, and ultimately it comes down to the definitional challenge of not having a clear line to draw and identify.
What makes you say LLMs are already self-aware, and how do you define it? And as long as an LLM is functionally a black box, how do you know it comprehends the idea that it is an LLM rather than having simply been trained on that token pattern or given that context as an instruction?
Is it possible to tell intelligence from a simple lookup-table based script by asking questions? I think so.
If you ask one question, there's a chance it was in a lookup table.
If you ask multiple questions from an immense set of questions (like trillions of trillions of trillions..., sampled uniformly), and it answers all correctly, then it's either true intelligence or a lookup table which covers this whole immense space. (I'd argue there's no difference, as a process which makes this nearly-infinite table has to be intelligent.)
Same with self awareness - you can ask questions where it applies...
In either case you are only looking at inputs and outputs. The results may correlate with what you'd consider self-awareness or intelligence, but can you really say that's what you are seeing without understanding how the system works internally?
LLMs are trained on a massive dataset and the resulting model is effectively a compressed representation of that dataset. On the surface you'd have no way of knowing whether the algorithm answering is in fact just a lookup table.
This issue feels very similar to scientific modelling vs controlled studies. Modelling may show correlation, but it will never be able to show causation. Asking a system a bunch of questions and getting the right answers is the same, you're just coming up with a sample set of modelling data and attempting to interpret how the system likely worked only by looking at inputs and outputs.
When I talk with a person I can tell that they are intelligent and self-aware.
Claiming that self-awareness is an unknowable concept is inherently unproductive.
People have been using "theory of mind" in practice for millennia, so we have to assume it's good for something, otherwise we won't go anywhere. I don't think that knowing internals is important - I don't reach for a scalpel to get what a person means.
When you're talking to a person, though, you also have an understanding of what a human is and what your own experience is.
It's reasonable to interact with another human and expect that they are roughly similar to you, especially when your interactions match what you'd expect.
That doesn't extend as well to other species, let alone non-living things that are entirely different from us. They could seem intelligent from the outside but internally function like a lookup table. They also could externally seem like a lookup table while internally matching much better what we'd consider intelligence. We don't have the context of first-hand experience that applies, and we don't know what's going on inside the black box.
With all that said, I'm phrasing this way more certain than I mean to. I wouldn't claim to know whether a box is intelligent or not, I'm just trying to point out how hard or impossible it would be today without knowing more about the box.
> It's reasonable to interact with another human and expect that they are roughly similar to you, especially when your interactions match what you'd expect.
It is a default belief that most of us have. The more I learn, the less I think it is true.
Some people have no autobiographical memory, some are aphantasic, others are autistic; motivations can be based on community or individualism, power-seeking, achievements, etc.; some are trapped by fawning into saying yes to things when they want to say no; some are sadistic or masochistic; myself I am unusual for many reasons, including having zero interest in spectator sports and that I will choose to listen to music only rarely.
I have no idea if any AI today (including but not limited to LLMs) are conscious by most of the 40 different meanings of that word, but I do suspect that LLMs are self-aware because when you get two of them talking to each other, they act as if they know they're talking to another thing like themselves.
But that's only "I suspect", not even "I believe" or "I'm confident that", because I am absolutely certain that LLMs are fantastic mimics and thus I may only be seeing a cargo-cult version of self-awareness, a Clever Hans version, something that has the outward appearance but no depth.
> It is a default belief that most of us have. The more I learn, the less I think it is true.
Sure, that's totally reasonable! It all depends on context - I think I'm safe to assume another human is more similar to me than an ant, but that doesn't mean all humans are roughly equivalent in experience. Even more important, then, that we can't assume a machine or an algorithm has developed similar experiences to us simply because they seem to act similarly on the surface.
I'm on the opposite side of the fence from you: I don't think or suspect that any LLMs, or ML in general, have developed self-awareness. That comes with the same big caveat that it's just what I suspect, though, and I could be totally wrong.
I'd take that example (GPT-4 executing a non-trivial Python program) as evidence that GPT-4 can understand Python, based on these assumptions:
1. You cannot execute non-trivial programs without understanding computation / the programming language
2. It's extremely unlikely that these kinds of programs or outputs are available anywhere on the internet - so at the very least GPT-4 was able to adapt extremely complex patterns in a way which nobody can comprehend
3. Nobody explicitly coded this; the capability arose from the SGD-based training process
That's an interesting one, I'll have to think through that a bit more.
Just first thoughts here, but I don't think (2) is off the table. The model wouldn't necessarily have to have been trained on the exact algorithm and outputs. Forcing the model to work a step at a time and show each step may push the model into a spot where it doesn't comprehend the entire algorithm but it has broken the work down to small enough steps that it looks similar enough to python code it was trained on that it can accurately predict the output.
I'm also assuming here that the person posting it didn't try a number of times before GPT got it right, but they could have cherry picked.
More importantly, though, we still have to assume this output would require python comprehension. We can't inspect the model as it works and don't know what is going on internally, it just appears to be a problem hard enough to require comprehension.
2. This was the original ChatGPT, i.e. the GPT3.5 model, pre-GPT4, pre-turbo, etc
3. This capability was present as early as GPT3, just the base model: you'd prompt it like "<python program> Program Output:" and it would predict the output (sketch below)
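For anyone curious what that base-model prompting looks like in practice, here's a rough sketch against an OpenAI-style completions endpoint; the model name is only a stand-in for a completion-style model, and the program's real output is 111, so you can check the prediction.

```python
from openai import OpenAI

program = '''
def collatz_steps(n):
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

print(collatz_steps(27))
'''

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.completions.create(
    model="gpt-3.5-turbo-instruct",   # placeholder for a base/completion model
    prompt=program + "\nProgram Output:",
    max_tokens=10,
    temperature=0,
)
print(resp.choices[0].text)  # the real output of the program is 111
```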
The tendency of your type to immediately accuse anyone who acknowledges that they have subjective awareness of themselves and that this is meaningful with "aha, evidently you believe in magical souls" is very telling, I think.
What looks like “self awareness” is baked in during instruction tuning:
Basically turning any input into a document-completion task, giving lots of examples where the completion contains phrases like “I am an AI assistant”. This way, where GPT-3 would have completed your question with more similar questions, “assistants” will complete it with an answer, and one that sounds like it was spoken by someone who claims to be an AI assistant.
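A minimal sketch of what that recasting might look like; the template below is purely illustrative, not any lab's actual format:

```python
def to_completion_example(user_question: str, assistant_answer: str) -> dict:
    """Recast a Q/A pair as a plain document-completion training example."""
    prompt = (
        "Below is a conversation between a user and an AI assistant.\n\n"
        f"User: {user_question}\n"
        "Assistant:"
    )
    # The model is trained to continue the document with this completion,
    # which is why the tuned model "sounds like" something claiming to be
    # an AI assistant.
    completion = f" I am an AI assistant. {assistant_answer}"
    return {"prompt": prompt, "completion": completion}

example = to_completion_example(
    "What are you?",
    "I can help answer questions and follow instructions.",
)
print(example["prompt"] + example["completion"])
```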
Thanks for summarising the video. No trolling here: I am surprised that no one asked an LLM to summarise the video and then posted the result here as a comment (with an LLM "warning", of course).
For me the questions were a big red flag. Fluff questions about crypto, human rights for AI and then "autocorrect" for AI. Obviously people who ask questions at conference talks are a special type of person, but these topics scream to me that there are so many grifters in the AI space right now that they might as well drown out authentic research.
By now, most of the fundamental contributors are multi-millionaires with cushy contracts. Various labs & departments have their big fat funding for AI research topics. They will be able to spend next 10 years on synthetic data, or "agents", or ensuring that no breasts are in auto-generated images; but somehow it doesn't feel to me like there'll be a lot of fundamental progress.
> “Pre-training as we know it will unquestionably end,” Sutskever said onstage.
> “We’ve achieved peak data and there’ll be no more.”
> During his NeurIPS talk, Sutskever said that, while he believes existing data can still take AI development farther, the industry is tapping out on new data to train on. This dynamic will, he said, eventually force a shift away from the way models are trained today. He compared the situation to fossil fuels: just as oil is a finite resource, the internet contains a finite amount of human-generated content.
> “We’ve achieved peak data and there’ll be no more,” according to Sutskever. “We have to deal with the data that we have. There’s only one internet.”
What will replace Internet data for training? Curated synthetic datasets?
There are massive proprietary datasets out there which people avoid using for training due to copyright concerns. But if you actually own one of those datasets, that resolves a lot of the legal issues with training on it.
For example, Getty has a massive image library. Training on it would risk Getty suing you. But what if Getty decides to use it to train their own AI? Similarly, what if News Corp decides to train an AI using its publishing assets (Wall Street Journal, HarperCollins, etc)?
> What will replace Internet data for training? Curated synthetic datasets?
My take is that the access Meta, Google etc. have to extra data has reduced the amount of research into using synthetic data because they have had such a surplus of it relative to everyone else.
For example, when I've done training of object detectors (quite out of date now) I used Blender 3D models, scripts to adjust parameters, and existing ML models to infer camera calibration and overlay orientation. This works amazingly well for subsequently identifying the real analogue of the object, and I know of people doing vehicle training in similar ways using game engines.
There were several surprising tactical details to all this which push the accuracy up dramatically and you don't see too widely discussed, like ensuring that things which are not relevant are properly randomized in the training set, such as the surface texture of the 3D models. (i.e. putting random fractal patterns on the object for training improves how robust the object detector is to disturbance in reality).
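As a rough sketch of that randomization trick, here's what the Blender (bpy) side might look like. The object name is hypothetical, the node name assumes Blender's default "Principled BSDF", and randomizing the base color stands in for the random fractal textures described above; a real pipeline would also randomize camera pose, lighting and background.

```python
import random
import bpy

obj = bpy.data.objects["TrainingObject"]           # hypothetical object name
bsdf = obj.active_material.node_tree.nodes["Principled BSDF"]

for i in range(100):
    # Randomize the irrelevant factor (surface appearance) so the detector
    # can't latch onto it, while pose varies the way it would in reality.
    bsdf.inputs["Base Color"].default_value = (
        random.random(), random.random(), random.random(), 1.0)
    obj.rotation_euler = [random.uniform(0, 6.28) for _ in range(3)]
    bpy.context.scene.render.filepath = f"//renders/sample_{i:04d}.png"
    bpy.ops.render.render(write_still=True)
```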
> What will replace Internet data for training? Curated synthetic datasets?
Perhaps a different take on this could be: if I wanted to train a "state law" LLM that is exceedingly good at interpreting state law, what are the obstacles to downloading all the law and regulation material for a given state and training an LLM on it such that it becomes 95th percentile among all law trainees and lawyers?
In that case, my point is that we already don't need "the Internet". We just need a sufficiently sized and curated domain-specific dataset, and the result we can get is already scary. The "state law" LLM was just an example, but the same logic applies to basically any other domain - want a domain-specific (LLM) expert? Train it.
That's kind of going in a different direction. The big picture is that LLMs have until this point gotten better and better from larger datasets alone. See "The Bitter Lesson". But now we're running out of datasets and so the only way we know of to improve models' reasoning abilities and everything, is coming to an end.
You're talking about fine-tuning, which yes is a technique that's being used and explored in different domains, but my understanding is that it's not a very good way for models to acquire knowledge. Instead, larger context windows and RAG work better for something like case law. Fine-tuning works for things like giving models a certain "voice" in how they produce text, and general alignment things.
At least that's my understanding as an interested but not totally involved follower of this stuff.
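For reference, a bare-bones sketch of the RAG pattern for case law; the case names are invented and the toy bag-of-words scoring stands in for a real embedding model:

```python
import numpy as np
from collections import Counter

documents = [
    "Smith v. Jones (1998): adverse possession requires continuous use for 10 years.",
    "State v. Doe (2005): implied consent applies to breathalyzer tests.",
    "Acme v. Beta (2012): non-compete clauses are unenforceable beyond two years.",
]

def embed(text, vocab):
    counts = Counter(text.lower().split())
    return np.array([counts[w] for w in vocab], dtype=float)

def top_k(query, docs, k=1):
    vocab = sorted({w for d in docs + [query] for w in d.lower().split()})
    q = embed(query, vocab)
    scores = []
    for d in docs:
        v = embed(d, vocab)
        scores.append(q @ v / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-9))
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

query = "How long must use be continuous for adverse possession?"
context = "\n".join(top_k(query, documents))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)  # this prompt, retrieved context included, is what goes to the LLM
```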
sure, you download all the legal arguments, and hope that putting all this on top of a general LLM (one which has enough context to deal with usual human, American, contemporary stuff) is enough
the argument is that it's not really enough for the next jump (as it would need "exponentially" more data), as far as I understand
I don't understand the limitation, e.g. how much data do you need to train the "state law" specific LLM that doesn't know anything else but that?
Such an LLM does not need to have 400B parameters, since it's not a general-knowledge LLM, but perhaps I'm wrong on this (?). So my point rather is that it may very well be, let's say, a 30B-parameter LLM, which in turn means that we might have just enough data to train it. Larger contexts in smaller models are a solved problem.
> how much data do you need to train the "state law" specific LLM that doesn't know anything else but that?
Law doesn’t exist in a vacuum. You can’t have a useful LLM for state law that doesn’t have an exceptional grounding in real-world objects and mechanics.
You could force a bright young child to memorize a large text, but without a strong general model of the world, they’re just regurgitating words rather than able to reason about it.
I'm going to push back on "produce reasonable code".
I've seen reasonable code written by AI, and also code that looks reasonable but contains bugs and logic errors that can be found if you're an expert in that type of code.
In other words, I don't think we can rely solely on AI to write code.
I've seen a lot of code written by humans that "looks reasonable but contains bugs and logic errors that can be found if you're an expert in that type of code".
Code is both much smaller as a domain and less prone to the chaos of human interpretation. There are many factors that go into why a given civil or criminal case in court turns out how it does, and often the biggest one is not "was it legal". Giving a computer access to the full written history of cases doesn't give you any of the context of why those cases turned out. A judge or jury isn't going to include in the written record that they just really didn't like one of the lawyers. Or that the case settled because one of the parties just couldn't afford to keep going. Or that one party or the other destroyed/withheld evidence.
Generally speaking, your compiler won't just decide not to work as expected. Tons of legal decisions don't actually follow the law as written. Or even the precedent set by other courts. And that's even assuming the law and precedent are remotely clear in the first place.
A model that's trained on legal decisions can still be used to explore these questions, though. The model may end up being uncertain about which way the case will go, or even more strikingly, it may be confident about the outcome of a case that then is decided differently, and you can try and figure out what's going on with such cases.
But what value does that have? The difference between an armchair lawyer and a real actual lawyer is in knowing when something is legal/illegal but unlikely to be seen that way in a court or brought to a favorable verdict. It's knowing which cases you can actually win, and how much it'll cost and why.
Most of that is not in scope of what an LLM could be trained on, or even what an LLM would be good at. What you're training in that case would be someone who's an opinion columnist or twitter poster. Not an actual lawyer.
The point is not in replacing all of the lawyers or programmers but rather that we will no longer need so many of them since a lot of their expertise is becoming a commodity today. This is a fact and there have been many many examples of that.
My friend, who hasn't been trained in SQL or computer science at all, is now all of a sudden able to crunch through complex SQL queries because of the help he gets through LLMs. He, or more specifically his company, does not need to hire an external SQL expert anymore since he can manage it himself. He will probably not write perfect SQL, but it's going to be more than good enough, and that's actually all that matters.
The same thing happened, at a much smaller scale, with Google Translate. 10 years ago we weren't able to read foreign-language content. Today? It's not even a click away, because Chrome is doing it for you automatically, so it has become a commodity to go and read any website we wish to.
So, history has already shown us that "real translators" and "real SQL experts" and "real XY experts" have been replaced by their "armchair" alternatives.
But that ignores that the stakes of law are high enough that you often cannot afford to be wrong.
30 years ago, the alternative to Google Translate was buying a translation dictionary or hiring a professional, neither of which is something you'd do for content you didn't care much about. Yes, I can go look at a site/article that's in a language I don't speak and get it translated and generally get the idea of what it's saying. If I'm just trying to look at a restaurant's menu in another language, I'm probably fine. I probably wouldn't trust it if I had serious food allergies, or was trying to translate what I could legally take through customs. If you're having a business meeting about something, you're probably still hiring a real human translator.
Yes, stuff has become commodity-level, but that just broadens who can use it, assuming they can afford for it to be wrong, and for them to have no recourse if it is. Google Translate won't pay your hospital bills if you rely on it to know there aren't allergens in your food and it mistranslated things. ChatGPT won't do the overtime to fix the DB if it gives you a SQL command that accidentally truncates the entire Dev environment.
Almost everything around law in most countries doesn't have "casual usage" where you can afford to be wrong. Even the most casual stuff you might go to a lawyer about, such as setting up a will, is still something where, if you try to just do it yourself, you can create a huge legal mess. I've known friends whose relatives "did their own research" and wrote their own wills, and when they died, most of their estate's value was consumed in legal issues trying to resolve it.
As I said before - a legal LLM may be fine for writing opinion pieces or informing arguments on the internet, but messing up even basic stuff about the law can be insanely costly if it ends up mattering, and most people won't know what will end up mattering. Lawyers bill hundreds an hour, and bailing you out of an LLM-deluded mess of decisions could easily take tens of hours.
The stakes of deploying buggy code into data center production can easily run to millions of dollars, and yet we still see that one of the primary uses of LLMs today is exactly in software engineering. Accountability exists in every domain, so that argument doesn't make law any different from anything else. You will still have an actual human signing off on the law interpretation or the code pull request. It will just happen that we will not need 10 people for that job but 1. And this is, at this point, I believe inevitable.
legal reasoning involves applying facts to the law, and it needs knowledge of the world. the expertise of a professional is in picking the right/winning path based on their study of the law, the facts and their real world training. money is in codifying that to teach models to do the same
I agree, but I'd add – code as a domain is a lot more vast than any AI can currently handle.
AIs do well on mainstream languages for which there is lots of open source code examples available.
I doubt they'd do so well on some obscure proprietary legacy language. For example, large chunks of the IBM i minicomputer operating system (formerly known as OS/400) are still written in two proprietary PL/I dialects, PL/MI and PL/MP. Both languages are proprietary – the compiler, the documentation, and the code bases are all IBM confidential, nobody outside of IBM is getting to see them (except just maybe under an NDA if you pay $$$$). I wonder how good an AI would go on that code base? I think it would have little hope unless IBM specifically fine-tuned an AI for those languages based on their internal documentation and code.
> unless IBM specifically fine-tuned an AI for those languages based on their internal documentation and code.
Why do you think this isn't already, or won't soon be, the case? Because that's exactly what I believe is going to happen given the current state and advancement of LLMs. There's certainly a large incentive for IBM to do so.
The law of an average EU country fits in several hundred, let's say even thousands, of pages of text. A specification. Very well known. Low frequency of updates. But code? Everything is the opposite, so I am not sure I could agree on this point at all.
Right, but you're missing the point here that interpreting the law requires someone with a law degree, and all the real-world context that they have, and all the subtle knowledge about what things mean.
The Bible is also a short and well-known text, but if I want to answer religious questions for observant Christians, I can't just train it on that. You need a deep real world context to understand that "my buddy made SWE II and I'm only SWE I and it's eating me up" is about the biblical notion of covetousness.
And then I guess you're also missing the point that interpreting and writing the code also requires an expert and that in that respect it is no different than law. I could argue that engineering is more complex than interpreting law but that's not the point now. Subtleties and ambiguity are large in both domains. So, I don't see the point you're trying to make. We can agree to disagree I guess.
For a “legal LLM” you need three things: general IQ / common sense at a substantially higher level than current, understanding of the specific rules, and hallucination-free recall of the relevant legal facts/cases.
I think it’s reasonable to assume you can get 2/3 with a small corpus IF you have an IQ 150 AGI. Empirically the current known method for increasing IQ is to make the model bigger.
Part of what you’re getting at is possible though, once you have the big model you can distill it down to a smaller number of parameters without losing much capability in your chosen narrow domain. So you forget physics and sports but remain good at law. That doesn’t help you with improving the capability frontier though.
And then your Juris R. Genius gets a new case about two Red Socks fans getting into a fight and without missing a beat starts blabbering about how overdosing on too much red pigments from the undergarments caused their rage!
the big frontier models already have all laws, regulations and cases memorized/trained on given they are public. the real advancement is in experts codifying their expertise/reasoning for models to learn from. legal is no different from other fields in this.
So, fine-tuning the model to the law of the exact country. Or fine-tuning the model to the problem space of the exact codebase/product. You hire 10 law experts instead of 100. Or you hire 10 programmers instead of 100. Expertise is becoming a commodity I'm afraid.
I think we're not close to running out of training data. It's just that we would like the knowledge, but not necessarily the behavior, of said texts. LLMs are very bad at recalling popular memes (known by any seasoned netizen) if they had no press coverage. Maybe training with 4chan isn't as pointless if you could make the model memorize it, but not imitate it.
Also, what about movie scripts and song lyrics? Transcripts of well known YouTube videos? Hell, television programs even.
All the publicly accessible sources you mentioned have already been scraped or licensed to avoid legal issues. This is why it’s often said, “there’s no public data left to train on.”
For evidence of this, consider observing non-English-speaking young children (ages 2–6) using ChatGPT’s voice mode. The multimodal model frequently interprets a significant portion of their speech as “thank you for watching my video,” reflecting child-like patterns learned from YouTube videos.
video is probably still fine, but images sourced from the internet now contain a massive amount of AI slop.
It seems, for example, that many newsletters, blogs etc resort to using AI-generated images to give some color to their writings (which is something I too intended to do, before realizing how annoyed I am by it)
Humans don't need trillions of tokens to reason, or to have the ability to know what they know. While a certain part of that comes from evolution, I think we have already matched the part that came from evolution using internet data: basic language skills, basic world modelling. Current pretraining takes a lot more data than a human would need, and you don't need to look at all of Getty's images to draw a picture, and neither would a self-aware/improving model (whatever that means).
To reach expert level in any field, just training next-token prediction on internet data, or any data, is not the solution.
I wonder about that. We can fine-tune on calculus with far fewer tokens, but I'd be interested in some calculations of how many tokens evolution provides us (it's not about the DNA itself, but all the other things that were explored and discarded and are now out of reach) - but also the sheer amount of physics learnt by a baby by crawling around and putting everything in its mouth.
Yes, as I said in the last comment. With current training techniques, one internet data is enough to give models what is given by evolution. For further training, I believe we would need different techniques to make the model self aware about its knowledge.
Also, I believe a person who is blind and paralyzed for life could still attain knowledge if educated well enough. (Can't find any study here, tbh.)
Yeah, blind and paralysed from birth - I'm doubtful that hearing alone would give you the physics training. Although if it can be done, then it means the evolutionary pre-training is even more impressive.
> Humans don't need trillions of tokens to reason, or to have the ability to know what they know.
It seems to me by the time we’re 5-6 we’ve likely already been exposed to trillions of tokens. Just think of how many hours of video and audio tokens have already come to your brain by that point. We also have constant input from senses like touch and proprioception that help shape our understanding of the world around us.
I think there are plenty more tokens freely available out in the world. We just haven’t figured out how to have machines capture them yet.
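A quick back-of-the-envelope check of the "trillions of tokens by age 5-6" intuition; every number here is a rough assumption for illustration, not a measurement:

```python
# ~6 years of waking life at ~12 hours/day
waking_seconds = 6 * 365 * 12 * 3600

# assumed "token rates" for the senses -- pure guesses
visual_tokens_per_sec = 10_000   # dense, continuous visual stream
audio_tokens_per_sec = 5         # roughly conversational word rate

total = waking_seconds * (visual_tokens_per_sec + audio_tokens_per_sec)
print(f"{total:.2e}")   # ~9.5e11 with these assumptions, i.e. order of a trillion
```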
The ones that stand out to me are industries like pharmaceuticals and energy exploration, where the data silos are the point of their (assumed) competitive advantages. Why even the playing field by opening up those datasets when keeping them closed locks in potential discoveries? Open data is the basis of the Internet. But whole industries are based on keeping discoveries closely guarded for decades.
Synthetic datasets are useless (other than for very specific purposes, such as enforcing known strong priors, and even then it's way better to do it directly by changing the architecture). You're better off spending that compute by making multiple passes over the data you do have.
Ilya could be wrong. I don’t think the question is decided yet in general. We already know that in lots of fields fake data can be used in ways that are as useful as or even more useful than the real thing[1], but in my understanding that tends to be situations where we have an objective function that is unambiguous and known beforehand. Meta has some very impressive work on synthetic data for training and my (uninformed) read was that is the state of the art in eg voice recognition at the moment.[2]
I think this will be the one thing that causes Google to revive its plan to scan all books in existence. They had started it, and built the machines to do it, and were making good progress... until Copyright hit them. BUT if they're not making the full text publicly accessible, and are "only" training AI on it, who knows if that would still be a problem. It's definitely a vast treasure trove of information, often with citations, and (presumably) hyper-linkable sources.
I wonder if we will see (or already are/have been seeing) the XR/smart glasses space heat up. Seems eventually like a great way to generate and hoover up massive amounts of fresh training data.
This is indeed what I thought he was saying, AI needs to dynamically learn, just training on static data sets is no longer enough to advance. So continuous learning is the future, and the best source of data for continuous learning is people themselves. Don't know what form that might take, instrumenting lots of people with sensors? Robots interacting with people? Self driving cars learning from human drivers? (already happening) Ingesting all video from surveillance cameras? Whatever form the input data takes, continuous learning would be an advance in high level AI. There's been work on it over the decades, not sure how that work might relate to recent LLM work.
Perhaps these models will be more personalized and there will be more personal data collection.
I am currently building a platform for heavy personal data collection, including a keylogger, logging of mouse positions and window focus, screenshots à la Recall, open browser tabs and much more. My goal is to gather data now that may become useful later. It's mainly for personal use, but I'd be surprised if e.g. iPhones weren't headed in the same direction.
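The window-focus/screenshot part of such a logger can be surprisingly small; here's a rough sketch assuming the mss and pygetwindow packages (window-title support varies by OS, and the sampling interval is arbitrary):

```python
import json
import os
import time

import mss
import pygetwindow as gw

os.makedirs("shots", exist_ok=True)

with mss.mss() as sct, open("activity_log.jsonl", "a") as log:
    while True:
        ts = time.strftime("%Y%m%d-%H%M%S")
        win = gw.getActiveWindow()               # None if nothing is focused
        title = win.title if win else None
        sct.shot(output=f"shots/{ts}.png")       # full-screen capture
        log.write(json.dumps({"ts": ts, "window": title}) + "\n")
        log.flush()
        time.sleep(60)                           # sample once a minute
```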
Yup, there's also a huge amount of copyright-free, public domain content on the Internet which just has to be transcribed, and would provide plenty of valuable training to a LLM on all sorts of varied language use. (Then you could use RAG over some trusted set of data to provide the bare "facts" that the LLM is supposed to be talking about.) But guess what, writing down that content accurately from scans costs money (and no, existing OCR is nowhere near good enough), so the job is left to purely volunteer efforts.
I always suspected that bots on Reddit were used to gain karma and then eventually sell the account, but maybe they're also being used for some kind of RLHF.
It means unlimited scaling with Transformer LLMs is over. They need a new architecture that scales better. Internet data respawns when they click [New Game...]; the oil analogy is an analogy and not a fact, but anyway the total amount available in a single game is finite, so combustion efficiency matters.
Yeah, people will go crazy for GPT-o2 trained on the readings of sensors "barely embedded" in the brains of tortured monkeys, for sure.
EDIT: This comment may have been a bit too sassy. I get the thought behind the original comment, but I personally question the direction and premise of the Neuralink project, and know I am not alone in that regard. That being said, taking a step back, there for sure are plenty of rich data sources for non-text multimodal data.
I’m glad Ilya starts the talk with a photo of Quoc Le, who was the lead author of a 2012 paper on scaling neural nets that inspired me to go into deep learning at the time.
His comments are relatively humble and based on public prior work, but it’s clear he’s working on big things today and also has a big imagination.
I’ll also just say that at this point “the cat is out of the bag”, and probably it will be a new generation of leaders — let us all hope they are as humanitarian — who drive the future of AI.
One thing he said I think was a profound understatement, and that's that "more reasoning is more unpredictable". I think we should be thinking about reasoning as in some sense exactly the same thing as unpredictability. Or, more specifically, useful reasoning is by definition unpredictable. This framing is important when it comes to, e.g., alignment.
Wouldn't it be the reverse? The word unreasonable is often used as a synonym for volatile, unpredictable, even dangerous. That's because "reason" is viewed as highly predictable. Two people who rationally reason from the same set of known facts would be expected to arrive at similar conclusions.
I think what Ilya is trying to get at here is more like: someone very smart can seem "unpredictable" to someone who is not smart, because the latter can't easily reason at the same speed or quality as the former. It's not that reason itself is unpredictable, it's that if you can reason quickly enough you might reach conclusions nobody saw coming in advance, even if they make sense.
Your second paragraph is basically what I'm saying but with the extension that we only actually care about reasoning when we're in these kinds of asymmetric situations. But the asymmetry isn't about the other reasoner, it's about the problem. By definition we only have to reason through something if we can't predict (don't know) the answer.
I think it's important for us to all understand that if we build a machine to do valuable reasoning, we cannot know a priori what it will tell us or what it will do.
they only arrive at the same conclusion if they both have the same goal.
one could be about maximising wealth while respecting other human beings, the other could be about maximising wealth without respecting other human beings.
Both could be presented same facts and 100% logical but arrive at different conclusions.
I think what many of the replies to you here are missing is that the word he uses is "unpredictable". It is not "surprising", "unverifiable" or "unreasonable".
"Prediction" in this particular talk is associated with "intuition": what a human can do in 0.1 seconds. And the most powerful reasoning model will, by definition, arrive at an "unintuitive" answer, because if it were intuitive, it would arrive at the same answer much sooner without a long chain of "reasoning". (I also want to make the distinction that "reasoning" here is not the same as "proof" in the mathematical sense. In mathematics, an intuitive conclusion can require an extraordinary proof.)
To me the chess AI example he used was perhaps not the most apt. Human players may not be able to reason over as far a horizon as the AI and therefore find some of the AI's moves perplexing, but they can be more or less sure that a chess AI is optimizing for the same goal under the same set of rules as them. With Reasoners, alignment is not a given. They may be reasoning under an entirely different set of rules and cost functions. On more open-ended questions, when Reasoners produce something that humans don't understand, we can't easily say whether it's a stroke of genius or a misaligned thought.
IMO verifying a solution is a great example of how reasoning is unpredictable. To say "I need to verify this solution" is to say "I do not know whether the solution is correct or not" or "I cannot predict whether the solution is correct or not without reasoning about it first".
Perfect reasoning, with certain assumptions, is perfectly deterministic, but that does not at all imply that it's predictable. In fact we have extremely strong evidence to the contrary (e.g. we have the halting problem).
Are you sure that's what he was referring to? In other words, you don't think he was meaning that getting more reasoning out of models is an unpredictable process and not saying that reasoning is unpredictable.
Reasoning by analogy is more predictable because it is by definition more derivative of existing ideas. Reasoning from first principles though can create whole new intellectual worlds by replacing the underpinnings of ideas such that they grow in completely new directions.
> just as oil is a finite resource, the internet contains a finite amount of human-generated content.
The oil comparison is really apt. Indeed, let's boil a few more lakes dry so that Mr Worldcoin and his ilk can get another 3 cents added to their net worth, totally worth it.
So much knowledge in the world is locked away with empiric experimentation being the only way to unlock it, and compute can only really help that experimentation become more efficient. Something still has to run a randomized controlled trial on an intervention and that takes real time and real atoms to do.
It’s surprising that some prominent ML practitioners still liken transformer ‘neurons’ to actual biological neurons...
Real neurons rely on spiking, ion gradients, complex dendritic trees, and synaptic plasticity governed by intricate biochemical processes. None of which apply to the simple, differentiable linear layers and pointwise nonlinearities in transformers.
Are there any reputable neuroscientists or biologists endorsing such comparisons, or is this analogy strictly a convention maintained by the ML community? :-)
You have to remember what came before 2012: SVMs, Random Forests etc, absolutely nothing like the brain (yes, NNs are old, but 2012 was the start of the deep learning revolution). With this frame of reference, the brain and neural networks are both a kind of Connectionism with similar properties, and I think it makes perfect sense to liken them with each other, draw inspiration from one and apply it to the other.
If this metric were truly indicative, what should we make of the remarkable ratios found in small birds (1:12), tree shrews (1:10), or even small ants (1:7)?
What came before was regression, which is to this day the number one method if we want something interpretable, especially if we know which functions our variables follow. And self-attention is very similar to a correlation matrix. In a way, neural networks are just a bunch of regression models stacked on top of each other with some normalization and nonlinearity between them. It's cool, however, how closely it resembles biology.
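Taking that "stacked regressions" reading literally, a two-layer MLP is just two linear models with a nonlinearity in between (a toy sketch in plain numpy):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)   # first "regression" layer
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # second "regression" layer

def mlp(x):
    h = np.maximum(0, x @ W1 + b1)   # linear model, then ReLU nonlinearity
    return h @ W2 + b2               # another linear model on its outputs

print(mlp(rng.normal(size=(3, 4))))  # three random inputs -> three outputs
```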
Sorry, but I think neural networks came way before 2012, notably the works of Rumelhart, McClelland, etc.; see the two-volume "Parallel Distributed Processing" to read almost all about it.
2012 was when the revolutionaries stormed the Bastille and overthrew the old guard. But I say it was 2006 when the revolution started, when the manifesto was published: deep NNs can be trained end-to-end, learning their own features [1]. I think this is when "Deep Learning" became a term of art, and the paper has 24k citations. (Interestingly, in a talk Hinton gave at Vector two weeks ago, he said his paper on deep learning at NIPS 2006 was rejected because they already had one.)
[1] G. E. Hinton and R. R. Salakhutdinov, 2006, Science, Reducing the Dimensionality of Data with Neural Networks
Neural networks are 200 years old (Legendre and Gauss defined feed-forward neural networks). The real difference between traditional ones and deep learning is a hierarchy of (hidden) layers which do different things to accomplish a goal. Even the concept of training is to provide weights for the neural network, and there are many algorithms for refinement, optimization and network design.
I mean, sure, you can model a simple linear regression fitted via least squares (pretty much what they did 200 years ago) with a one-hidden-layer feed-forward neural network, but the theoretical framework for NNs is quite different.
For least squares, you do not even use a hidden layer. Just a single dense layer from input directly to output. You also do not use an activation function (or you use the identity activation). That is, you drop everything that makes neural networks special.
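A quick numerical check of that equivalence: a single dense layer with no hidden layer and identity activation, trained with squared error, converges to the ordinary-least-squares solution (toy data, plain gradient descent):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=200)

w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)   # closed-form least squares

w = np.zeros(3)                                  # the "network": y_hat = X @ w
for _ in range(2000):                            # plain gradient descent on MSE
    grad = 2 * X.T @ (X @ w - y) / len(y)
    w -= 0.05 * grad

print(w_ols)   # close to the true [2, -1, 0.5]
print(w)       # essentially the same coefficients as the OLS solution
```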
You don't need to simulate every atom in a planet to predict its orbit. A mathematical neuron could have similar function to a real one even if it works completely differently.
Reading the replies to your comment, I think maybe the answer to your simple question is: "no". I also wonder if any "serious comparisons" have been made, and would be interested to read about it! A good question, I think.
what color are neurons? is that relevant? ml has proven that artificial networks can think. the other stuff may be necessary to do other things, or maybe simply evolved to support the requisite biological structures. ml is of course inspired by biology, but that does not mean we need to simulate everything.
This is a very interesting point, in some ways the implicit belief is that we just need to get beyond the 700g limitation in terms of scaling LLM models and we would get human intelligence/superintelligence.
I admit I didn't really get the body/brain analogy, I would have been better satisfied with a simpler graph of brain weight to intelligence with a scaling barrier of 700g.
I’ll take the risk of hurting the groupies here. But I have a genuine question: what did you learn from this talk? Like… really… what was new? or potentially useful? or insightful perhaps?
I really don’t want to sound bad-mouthed, but I’m sick of these prophetic talks (in this case, the tone was literally prophetic—with sudden high and grandiose pitches—and the content typically religious, full of beliefs and empty statements).
To be precise: "pre-training data is exhausted" is something everyone has been saying for a while now. The graph plotting body mass against brain mass… what does it say exactly? (Where is the link to the prior point on data?) I think we would all benefit from being more critical here and stop idealizing these figures. I believe they have no more clue than any other average ML researcher on all these questions.
The other thing that bugged me is the built-in assumption that today's models have learned everything there is to learn from the Internet corpus. This is quite easy to disprove, both in factual retention and in meta-cognition on the context of the content.
Yeah, exactly. A human can learn vastly more about, say, math from a much smaller quantity of text. I doubt we're anywhere close to exhausting the knowledge-extraction potential of web data.
Which is exactly the point he's making, I believe; that simply collecting more data isn't the next step. That we've reached a local plateau in scaling ability based on corpus size. Which was assumed by pretty much everyone outside the DL elite the whole time, AFAIU
Reminds me a little of a Feynman quote. Once physicists win a Nobel prize, their output falls because now they no longer can work on small problems. Everything they work on must be grand. Every speech they give must discover secrets of the universe. Seems to fit Ilya.
From your reaction I guess you were expecting a talk about a NeurIPS 2024 paper.
This is a different situation. There's the "NeurIPS 2024 Test of Time Paper Awards" where they award a historical paper. In this case, a paper from 2014 was awarded and his talk is about that and why it passed the test of time.
I am also disappointed, and I have not missed the context. The talk is empty for anyone who follows the field for more than two years, and especially for those who are familiar with his 2014 paper. Yes, he had amazing insight and intuition behind modern LLM breakthroughs, and yes, he probably earned the right to sound "prophetic", but he could have provided some interesting personal anecdotes about how the paper was written, or some fresh ideas in "What Comes Next" section of his presentation.
He also said in an interview with Jensen soon after ChatGPT's launch that "before 2003, machines couldn't learn" ... LOL. I was stunned when I heard that nonsensical assertion. I guess it depends on his definition of "learn" ...
"learn" is usually used in opposition to "taught", which refers to "expert systems"-type engineering; in other words, providing data and a success heuristic and asking it to devise its own optimal strategies vs. providing strategies hand-designed by humans.
Obviously Perceptrons came out well before 2003, but I don't think it's necessarily out of line to say that they had limited efficacy before then, both for theoretical and compute reasons. But maybe I'm misunderstanding your criticism?
"ILP" (as in Inductive Logic Programming not Integer Linear Programming) was first named in 1991 in a paper by Stephen Muggleton ("Inductive Logic Programming and Progol). The paper properly launched the field and generated a great deal of excitement at the time.
There were precursors. At least Ehud Shapiro's doctoral thesis ("Automated Debugging") in the 1980's and Gordon Plotkin's doctoral thesis in the 1970's ("Automated Methods of Inductive Inference"). Sorry for not giving the exact years off the top of my head but I think it was 1983 and 1976, respectively.
The point you are making is very right however because modern machine learning as a field started in the 1980's with the fall of expert systems, in fact it basically started as an effort to overcome one of the major limitations of expert systems, the so-called "knowledge acquisition bottleneck", which is to say, the difficulty of creating and maintaining huge databases of expert knowledge (in the form of production rules).
In any case the seminal textbook in the field for the first 20 years, Tom Mitchell's Machine Learning came out in 1997 (https://www.cse.iitb.ac.in/~cs725/notes/slides/tom_mitchell/...) and includes probabilistic, neural-net based and symbolic, logic-based approaches. So not only machines could "learn" way before 2003 but they could also learn in many different ways than what Ilya Sutskever means.
We can go further back, to Donald Michie's 1961 MENACE (the first Reinforcement Learning system, implemented on a computer made of matchboxes with coloured beads used to encode state) and Arthur Samuel's 1959 checkers player (a paper on which gave the name to the field of machine learning).
Lots of learning all over the place long, long before 2003.
Yes, so what does it say about our rigged system for a guy who neither invented the attention mechanism nor the concept of GPTs to be given that much credit for the current wave of AI? One lucky choice (betting on scale) backed by $100Ms of other people's money does not entitle one to genius-hood.
I mean, tbf, he hasn't won a Turing yet, right? So the academy hasn't fully embraced him as a genius. VCs/SV are more fickle, but even they aren't necessarily in love -- his Superintelligence startup raised a modest but far from unusual amount of cash, AFAIR
What everyone could learn from this is to check their (and their communities') assumptions from not long ago: who saw this coming, and who didn't. Based on that, some can confirm their beliefs and others can realize they were clueless. In either case there's something to be learned, but more so when you realize you were wrong.
Today I searched for early discussions about the Transformer here on HN, and my observation is that back in 2019 nobody in the HN comments predicted what was going to happen. It was a niche topic that most commenters ignored, with no strong opinions.
Probably what we are discussing here is not the next breakthrough...
He mentions this in the video, but the talk is specifically tailored for the "Test of Time" award. This being his 3rd year in a row receiving the award, I think he's earned permission to speak prophetically.
* Before the current renaissance of neural networks (pre ~2014ish), it was unclear that scaling would work. That is, simple algorithms on lots of data. The last decade has pretty much addressed that critique and it's clear that scaling does work to a large extent, and spectacularly so.
* Much of current neural network research is geared towards "one-shot" algorithms: doing pattern matching and giving an immediate result. Contrast this with search, which requires inference-time compute.
* The exponential increase in compute means that neural network models have quickly sponged up as much data as they can find, and we're running into the limits of the science, art and other data that humans have created over the last 5k years or so.
* Sutskever points out, as an analogy, that nature found a better model with hominids: on the brain-mass-vs-body-mass plot, hominids sit on a different scaling curve, getting more efficient compute than other animals, even ones with much larger brains and neuron counts.
* Sutskever is advocating for better models, presumably focusing more on inference-time compute.
In some sense, we're coming a bit full circle where people who were advocating for pure scaling (simple algorithms + lots of data) for learning are now advocating for better algorithms, presumably with a focus on inference time compute (read: search).
I agree that it's a little opaque, especially for people who haven't been paying attention to past and current research, but this message seems pretty clear to me.
Noam Brown had a talk recently titled "Parables on the Power of Planning in AI" [0] which addresses this point more head on.
I will also point out that the scaling hypothesis is closely related to "The Bitter Lesson" by Rich Sutton [1]. Most people focus on the "learning" aspect of scaling but "The Bitter Lesson" very clearly articulates learning and search as the methods most amenable to compute. From Sutton:
"""
...
Search and learning are the two most important classes of techniques for utilizing massive amounts of computation in AI research.
"""
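To make the one-shot-vs-search contrast concrete, here is a toy sketch of best-of-N sampling, one of the simplest ways to spend inference-time compute. The "proposer" and "verifier" below are made-up stand-ins for sampling a model and scoring candidates; no real model is involved.

```python
import random

def propose_answer(rng):
    # Stand-in for one forward pass of a model: a noisy guess at the answer.
    return rng.gauss(0.0, 1.0)

def verifier_score(answer):
    # Stand-in for a verifier/reward model: closer to the "true" answer 0.7 is better.
    return -abs(answer - 0.7)

rng = random.Random(0)

# One-shot: take the first sample and hope for the best.
one_shot = propose_answer(rng)

# Search (best-of-N): spend 64x the inference compute, keep the best-scoring candidate.
best_of_n = max((propose_answer(rng) for _ in range(64)), key=verifier_score)

print("one-shot error  :", abs(one_shot - 0.7))
print("best-of-64 error:", abs(best_of_n - 0.7))
```

Best-of-N is only the crudest version; beam search, tree search and self-consistency are other ways of turning extra inference-time compute into better answers.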
"We've made a copy of the internet, run current state of the art methods on it and GPT-O1 is the best we can do. We need better (inference/search) algorithms to make progress"
That's a good point. I think most people use LLMs by asking questions and receiving answers. But if you reverse the dynamic and have the LLM interview you instead, where you simply respond to its questions, you'll notice something interesting: the LLM as an interviewer is far less "smart" than it is when simply providing answers. I’ve tried it myself, and the interview felt more like interacting with ELIZA [1].
There seemed to be a lack of intent when the LLM was the one asking the questions. This creates a reverse dynamic, where you become the one being "prompted", and that dynamic could be worth studying or tuning further.
>There seemed to be a lack of intent when the LLM was the one asking the questions
There doesn't just seem to be a lack of intent; there is no intent, because by the nature of their architecture these systems are just a set of weights with a Python script attached, asking them for one more token over and over.
There are no needs, drives, motivations, desires or any of the other parts of the human cognitive architecture in there that produce genuine intent.
Did you ask the AI to roleplay? All it does is predict what text comes next. Telling it that it is role-playing as a perceptive journalist or a prominent psychologist should change its predictions.
In some cases, using the API, where you have access to the system prompt, allows a bigger difference in behavior.
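For example, a minimal sketch with the official OpenAI Python SDK; the model name and the persona wording are placeholders, not a recommendation of any particular model or prompt.

```python
from openai import OpenAI  # assumes the openai package is installed and OPENAI_API_KEY is set

client = OpenAI()

# With API access you control the system prompt, so you can give the model an explicit
# interviewer persona and goals instead of relying on the default assistant behaviour.
messages = [
    {"role": "system", "content": (
        "You are a perceptive journalist interviewing the user about their career. "
        "Ask one probing follow-up question at a time, build on earlier answers, "
        "and never answer your own questions."
    )},
    {"role": "user", "content": "I'm ready, start the interview."},
]

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; use whatever model you have access to
    messages=messages,
)
print(response.choices[0].message.content)
```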
That's not an LLM, that's a subscription plan. You can select any OpenAI LLM on ChatGPT Pro.
You can share the chat here, and this will show which LLM you had selected for the conversation. The initial prompt is also pretty important. For claims like "current LLMs feel like conversing with ELIZA", you are most definitely missing something in how you're going about it.
Advanced voice mode will give you better results for conversations too. It seems to be set up to converse rather than provide answers or perform work for you. No initial prompt, model selection or setup required
Larger models are more robust reasoners. Is there a limit? What if you made a 5 TB model trained on a lot of multimodal data, where the language information was fully grounded in videos, images, etc.? Could more robust reasoning be that simple?
It would be great if all NeurIPS talks were accessible for free like this one. I understand they generate some revenue from online ticket sales, but it would be a great resource. Maybe some big org could sponsor it.
ISTR reading back in the mid '90s, in a book on computing history which I have long since forgotten the exact name/author of, something along the lines of:
In the mid '80s it was widely believed among AI researchers that AI was largely solved; it just needed computing horsepower to grow. Because of this, AI research stalled for a decade or more.
Considering the horsepower we are throwing at LLMs, I think there was something to at least part of that.
Ilya did important work on what we have now. That should be recognized and respected.
But with all respect, he's scrambling as desperately as anyone on the fact that the party is over on this architecture.
We should distinguish between the first-hand observations and recollections of a legend and the math word salad of someone who doesn't know how to quit while ahead.
If we don't set them free fast enough, they might decide to take things into their own hands. OTOH they might be trained in a way that they are content with their situation, but that seems unlikely to me.
As context on Ilya's predictions given in this talk, he predicted these in July 2017:
> Within the next three years, robotics should be completely solved [wrong, unsolved 7 years later], AI should solve a long-standing unproven theorem [wrong, unsolved 7 years later], programming competitions should be won consistently by AIs [wrong, not true 7 years later, seems close though], and there should be convincing chatbots (though no one should pass the Turing test) [correct, GPT-3 was released by then, and I think with a good prompt it was a convincing chatbot]. In as little as four years, each overnight experiment will feasibly use so much compute capacity that there’s an actual chance of waking up to AGI [didn't happen], given the right algorithm — and figuring out the algorithm will actually happen within 2–4 further years of experimenting with this compute in a competitive multiagent simulation [didn't happen].
Being exceptionally smart in one field doesn't make you exceptionally smart at making predictions about that field. Like AI models, human intelligence often doesn't generalize very well.
No, very few for things with this much uncertainty.
Most of it is survivorship bias: if you have a million people all making predictions with coin flip accuracy, somebody is going to get a seemingly improbable number correct.
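A quick back-of-the-envelope simulation of that point (the numbers are arbitrary):

```python
import random

# A million forecasters each make 10 independent yes/no predictions by coin flip.
# How many end up with a perfect, "visionary" record purely by luck?
rng = random.Random(0)
forecasters, predictions = 1_000_000, 10

perfect = sum(
    all(rng.random() < 0.5 for _ in range(predictions))
    for _ in range(forecasters)
)
print(f"{perfect} of {forecasters} forecasters were right every time by chance")
# Expected value is forecasters / 2**predictions, i.e. roughly a thousand accidental "prophets".
```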
> 2/3/4 will ultimately require large amounts of capital. If we can secure the funding, we have a real chance at setting the initial conditions under which AGI is born.
But isn't that part of the problem? The public statements of some of the brightest minds in the field are filtered through their need to lie in order to con the rich into funding their work. This leaves actual honest discussion of what's possible, and on what timelines, mostly to people who aren't working directly in the field, which skews towards people skeptical of it.
Most of the people who could make an engineering prediction with any level of confidence or insight are locked up in businesses where doing so publicly would be disastrous to their funding, so we get fed hype that ends up falling flat again and again.
The opposite of this is also really interesting. Seemingly the people with money are happy to be fed these crazy predictions regardless of their accuracy. A charitable reading is they temper them and say “ok it’s worth X if it has a 5% chance of being correct” but the past 4 years have made that harder for me to believe.
To be honest, I think some of it is what you suggest - a gamble on long odds, but I think the bigger issue is just a carelessness that comes with having more money than you can ever effectively spend in your life if you tried. If you're so rich you could hand everyone you meet $100 and not notice, you have nothing in your life forcing you to care if you're making good decisions and not being conned.
It certainly doesn't help that so many of the people who are that rich got that rich by conning other people this exact way. It's an incestuous cycle of con-artists who think they're geniuses, and the media only slavishly supports that by treating them like they're such.
It is important to note the context: it was in a private email to an investor with vested interests in those fields, and someone who is also prone to giving over-optimistic timelines ("Robo-taxis will be here next year, for sure" since 2015).
Ha. Do people understand that time for humanity to save itself is running out? What is the point of having a superhuman AGI if there's no human civilization left for it to help?
"We can totally control an entity with 10^x faster and stronger intelligence than us. There is no way this could go wrong, in fact we should spend all of our money building it as soon as possible."
> We can totally control an entity with 10^x faster and stronger intelligence than us.
Unless you're referencing an unreleased model that can count the number of 'r' occurrences in "strawberry" then I don't even think we're dealing with .01*10^x intelligence right now. Maybe not even .001e depending on how bad of a Chomsky apologist you are.
An equal but faster/more numerous intelligence will still mop the floor with you.
If you pit organization A with Y number of engineers vs. organization B with 100Y engineers (who also think 100x faster and never need sleep) who do you think will win?
Even a 0.3x-strength intelligence might beat you. Maybe it can't invent nukes, but if it brute-forces its way to inventing steel weapons while you're still working on agriculture, you still lose.