I can't help but feel that this talk was a lot of...fluff?
The synopsis, as far as my tired brain can remember:
- Here's a brief summary of the last 10 years
- We're reaching the limit of our scaling laws, because we've already trained on essentially all the data we have available
- Some things that may be next are "agents", "synthetic data", and improving compute
- Some "ANNs are like biological NNs" rehash that would feel questionable if there was a thesis (which there wasn't? something about how body mass vs. brain mass are positively correlated?)
- 3 questions, the first was something about "hallucinations" and whether a model be able to understand if it is hallucinating? Then something that involved cryptocurrencies, and then a _slightly_ interesting question about multi-hop reasoning
I attended this talk in person and some context is needed. He was invited for the “test of time” talk series. This explains the historical part of the talk. I think his general persona and association with ai led to the fluffy speculation at the end.
I notice with Ilya he wants to talk about these out-there speculative topics but defends himself with statements like “I’m not saying when or how, just that it will happen”, which makes his arguments impossible to address. Stuff like this openly invites the crazies to interact with him, as seen with the cryptocurrency question at the end.
Right before this was a talk reviewing the impact of GANs that stayed on topic for the conference session throughout.
I mean he repeatedly gave some hints (even if just for the lulz and not seriously) that the audience is at least partially composed of people with little technical background or AI bros. An example is when he mentioned LSTMs and said "many of you may have never seen them before". Even if he didn't mean it, ironically it ended up being spot on when the crypto question came.
As someone who is at NeurIPS right now with a main conference paper, I was shocked at how many NeurIPS attendees had no paper. At ACL conferences, almost every person attending has a paper (even if it's only at a workshop)
NeurIPS is "ruined" by the money and thus attracts huge amounts of people who are all trying to get rich. It's a bloody academic conference people!
A person with little or no technical background, who neither knows nor cares about AI (or other scientific/mathematical advancements) other than for their potential to make the AI bro rich. There is a big overlap with crypto bros, and in fact many AI bros are just grifters who moved on after crypto tanked with the recent fed funds rate hikes.
Well, it looks like the entire point was "you can no longer expect a capability gain from a model with a bigger ndim trained on a bigger internet dump".
That's just one sentence, but it's pretty important. And while many people already know this, it's important to hear Sutskever say it, so people know it's common knowledge.
But they are self-aware; in fact, it's impossible to make a good AI assistant which isn't: it has to know that it's an AI assistant, it has to be aware of its capabilities, limitations, etc.
I guess you're interpreting "self-awareness" in some mythical way, like a soul. But in a trivial sense, they are. Perhaps not to the same extent as humans: models do not experience time in a continuous way. But given that a model can maintain a dialogue (voice mode, etc.), it seems to be phenomenologically equivalent.
Would you say that a calculator is also self-aware, and that it must know its limitations so that it doesn't attempt to do calculations it isn't capable of doing, for instance?
Alright. Suppose "meaning" (or "understanding") is something which exists in a human head.
It might seem like a complete black box, but we can get some information about it by observing human interactions.
E.g. suppose Alice does not know English, but she has a bunch of cards with instructions like "Turn left", "Bring me an apple", etc. If she shows these cards to Bob and Bob wants to help her, Bob can carry out the instructions on a card. If they play this game, the meaning which a card induces in Bob's head will be understood by Alice, and thus she will be able to map these cards to meaning in her own head.
So there's a way to map meaning which is mediated by language.
Now, from a math perspective, if we are able to estimate semantic similarity between utterances, we might be able to embed them into a latent "semantic" space.
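A toy sketch of that idea, assuming the sentence-transformers package and its all-MiniLM-L6-v2 model as an off-the-shelf encoder: near-paraphrases land close together in the embedding space, unrelated instructions land farther apart.

```python
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = ["Bring me an apple", "Please fetch an apple for me", "Turn left"]
emb = model.encode(sentences)

def cos(a, b):
    # cosine similarity: 1.0 means identical direction in the latent space
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cos(emb[0], emb[1]))  # high: two ways of asking for an apple
print(cos(emb[0], emb[2]))  # lower: an unrelated instruction
```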
If you accept that the process of LLM training captures some aspects of meaning of the language, you can also see how it leads to some degree of self-awareness. If you believe that meaning cannot be modeled with math then there's no way anyone can convince you.
How does math encode meaning if there is no Alice and Bob? You should quickly realize the absurdity of your argument once you take people out of the equation.
Not sure what you mean... A NN training process can extract semantics from observations. That semantics can then be applied, e.g., to robots. So it doesn't depend on humans beyond the production of observations.
The function/mathematics in an NN (neural network) is meaningless unless there is an outside observer to attribute meaning to it. There is no such thing as a meaningful mathematical expression without a conscious observer to give it meaning. Fundamentally, there is no objective difference between one instance of a NN with one parameter, f(θ), evaluated on some input, f(θ)(x), and another instance of the same network with a small perturbation of the parameter, f(θ+ε), evaluated on the same input, f(θ+ε)(x), unless a conscious observer perceives the outputs and attributes meaning to the differences: the arithmetic operations performed are the same in both networks in terms of their objective complexity and energy utilization.
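To make the f(θ)(x) vs. f(θ+ε)(x) point concrete, here is a tiny numerical sketch: two networks that differ by a small parameter perturbation perform the same operations and produce nearly the same numbers, and nothing inside the arithmetic marks one output as more "meaningful" than the other.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(size=(3, 2))      # parameters of a one-layer "network"

def f(params, x):
    # same operations regardless of the exact parameter values
    return np.tanh(x @ params)

x = np.array([1.0, -0.5, 2.0])
eps = 1e-3 * rng.normal(size=theta.shape)   # small perturbation

print(f(theta, x))        # f(θ)(x)
print(f(theta + eps, x))  # f(θ+ε)(x): slightly different numbers, same "work"
```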
How does the universe encode meaning if there is no Alice and Bob?
One common answer is: it doesn't.
And yet, here we are, creating meaning for ourselves despite being a state of the quantum wave functions for the relevant fermion and boson fields, evolving over time according to a mathematical equation.
(Philosophical question: if the time-evolution of the wave functions couldn't be described by some mathematical equation, what would that imply?)
The universe does not have a finite symbolic description. Whatever meaning you attribute to the symbols has no objective reality beyond how people interpret those symbols. Same is true for the arithmetic performed by neural networks to flash lights on the screen which people interpret as meaningful messages.
> The universe does not have a finite symbolic description
Why do you believe that? Have you mixed up the universe with Gödel's incompleteness theorems?
Your past light cone is finite in current standard models of cosmology, and according to the best available models of quantum mechanics a finite light cone has a finite representation — in a quantised sense, even, with a maximum number of bits, not just a finite number of real-valued dimensions.
Even if the universe outside your past light cone is infinite, that's unobservable.
> Same is true for the arithmetic performed by neural networks to flash lights on the screen which people interpret as meaningful messages.
This statement is fully compatible with the proposition that an artificial neural network itself is capable of attributing meaning in the same way as a biological neural network.
It does not say anything, one way or the other, about what is needed to make a difference between what can and cannot have (or give) meaning.
I don’t understand what you consider self-awareness to be in this trivial sense. Is Eliza self-aware, for example? Eliza maintains a dialogue, albeit obviously not as coherently as a modern LLM.
From my perspective, we know what the benchmark is for our own self-awareness, because Descartes gave it to us: Cogito ergo sum. I think, therefore I am. We know we think, so we know we exist. That is the root of our self-awareness.
There is a great deal of controversy about the question of whether any of the existing models think at all (and of course that’s the whole point of Turing’s amazing paper[1], and the Chinese room thought experiment[2]) and the best you could say is the burden at the moment is on the people who say models can think to prove that.
Given that, I really don’t see how you can say models have self-awareness at the moment. Models may hypothetically be able to convince themselves they are self-aware via Descartes’ method, but notice that Descartes’ proof doesn’t work for us here: he was able to pull himself up by his own bootstraps because he knew there was a thought, so there must be an “I” who was doing the thinking. We have to observe models from the outside and determine whether or not thought is present, and that’s where the concept behind the Chinese room shows how tricky this is.
The only kind of intelligence we know of since childhood is meat-based intelligence. Brain made out of flesh.
So our intuition tells us that flesh is essential. "Chinese room" appeals to this intuition.
But it's a circular argument.
Anyway, I believe they are self-aware, to some extent. Not like a human. But I have no arsenal to convince people who believe intelligence has to be made out of meat.
I certainly don’t think intelligence has to be made out of meat but I do think it’s a complex set of questions and we owe it to ourselves to actually try to tackle the underlying issue of what intelligence and self-awareness etc really are and how we might know whether a model did or did not exhibit these, because we are apt to fool ourselves on both sides of this argument depending on our prior beliefs.
You can see the same thing when people talk about whether animals exhibit self-awareness. There are experiments with dolphins and mirrors, for example, that definitely suggest that dolphins recognise and might even be amused by their reflection when they see it, but some people find it very hard to reconcile themselves to the idea that a dolphin might have a sense of self. I personally find it harder to believe that any particular characteristic would be uniquely human.
Self-awareness is a huge rabbit hole to go down. It's one of the many concepts we think make humans unique, at least in degree, but we never really found a clear way to define or identify it.
I have watched dogs, cats, cows, and chickens pretty extensively. I still couldn't tell you if they are really self-aware, and ultimately it comes down to the definitional challenge of not having a clear line to draw and identify.
What makes you say LLMs are already self-aware, and how do you define it? And as long as an LLM is functionally a black box, how do you know it comprehends the idea that it is an LLM rather than having simply been trained on that token pattern or given that context as an instruction?
Is it possible to tell intelligence from a simple lookup-table based script by asking questions? I think so.
If you ask one question, there's a chance it was in a lookup table.
If you ask multiple questions from an immense set of questions (like trillions of trillions of trillions..., sampled uniformly), and it answers all correctly, then it's either true intelligence or a lookup table which covers this whole immense space. (I'd argue there's no difference, as a process which makes this nearly-infinite table has to be intelligent.)
Same with self awareness - you can ask questions where it applies...
In either case you are only looking at inputs and outputs. The results may correlate with what you'd consider self-awareness or intelligence, but can you really say that's what you are seeing without understanding how the system works internally?
LLMs are trained on a massive dataset and the resulting model is effectively a compressed representation of that dataset. On the surface you'd have no way of knowing whether the algorithm answering is in fact just a lookup table.
This issue feels very similar to scientific modelling vs controlled studies. Modelling may show correlation, but it will never be able to show causation. Asking a system a bunch of questions and getting the right answers is the same, you're just coming up with a sample set of modelling data and attempting to interpret how the system likely worked only by looking at inputs and outputs.
When I talk with a person I can tell that they are intelligent and self-aware.
Claiming that self-awareness is an unknowable concept is inherently unproductive.
People have been using "theory of mind" in practice for millennia, so we have to assume it's good for something, otherwise we won't go anywhere. I don't think that knowing internals is important - I don't reach for a scalpel to get what a person means.
When you're talking to a person, though, you also have an understanding of what a human is and what your own experience is.
It's reasonable to interact with another human and expect that they are roughly similar to you, especially when your interactions match what you'd expect.
That doesn't extend as well to other species, let alone non-living things that are entirely different from us. They could seem intelligent from the outside but internally function like a lookup table. They also could externally seem like a lookup table while internally matching much better what we'd consider intelligence. We don't have the context of first-hand experience that applies, and we don't know what's going on inside the black box.
With all that said, I'm phrasing this way more certain than I mean to. I wouldn't claim to know whether a box is intelligent or not, I'm just trying to point out how hard or impossible it would be today without knowing more about the box.
> It's reasonable to interact with another human and expect that they are roughly similar to you, especially when your interactions match what you'd expect.
It is a default belief that most of us have. The more I learn, the less I think it is true.
Some people have no autobiographical memory, some are aphantasic, others are autistic; motivations can be based on community or individualism, power-seeking, achievements, etc.; some are trapped by fawning into saying yes to things when they want to say no; some are sadistic or masochistic; myself I am unusual for many reasons, including having zero interest in spectator sports and that I will choose to listen to music only rarely.
I have no idea if any AI today (including but not limited to LLMs) are conscious by most of the 40 different meanings of that word, but I do suspect that LLMs are self-aware because when you get two of them talking to each other, they act as if they know they're talking to another thing like themselves.
But that's only "I suspect", not even "I believe" or "I'm confident that", because I am absolutely certain that LLMs are fantastic mimics and thus I may only be seeing a cargo-cult version of self-awareness, a Clever Hans version, something that has the outward appearance but no depth.
> It is a default belief that most of us have. The more I learn, the less I think it is true.
Sure, that's totally reasonable! It all depends on context - I think I'm safe to assume another human is more similar to me than an ant, but that doesn't mean all humans are roughly equivalent in experience. Even more important, then, that we can't assume a machine or an algorithm has developed similar experiences to us simply because they seem to act similarly on the surface.
I'm on the opposite side of the fence from you: I don't think or suspect that any LLMs, or ML in general, have developed self-awareness. That comes with the same big caveat that it's just what I suspect, though, and I could be totally wrong.
I'd take that example (GPT-4 executing a non-trivial Python program) as evidence that GPT-4 can understand Python, based on these assumptions:
1. You cannot execute non-trivial programs without understanding computation / the programming language
2. It's extremely unlikely that these kinds of programs or outputs are available anywhere on the internet - so at the very least GPT-4 was able to adapt extremely complex patterns in a way which nobody can comprehend
3. Nobody explicitly coded this; the capability arose from the SGD-based training process
That's an interesting one, I'll have to think through that a bit more.
Just first thoughts here, but I don't think (2) is off the table. The model wouldn't necessarily have to have been trained on the exact algorithm and outputs. Forcing the model to work a step at a time and show each step may push the model into a spot where it doesn't comprehend the entire algorithm but it has broken the work down to small enough steps that it looks similar enough to python code it was trained on that it can accurately predict the output.
I'm also assuming here that the person posting it didn't try a number of times before GPT got it right, but they could have cherry picked.
More importantly, though, we still have to assume this output would require python comprehension. We can't inspect the model as it works and don't know what is going on internally, it just appears to be a problem hard enough to require comprehension.
2. This was the original ChatGPT, i.e. the GPT3.5 model, pre-GPT4, pre-turbo, etc
3. This capability was present as early as GPT3, just the base model: you'd prompt it like "<python program> Program Output:" and it would predict the output (sketch below)
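For anyone curious what that base-model prompting looks like in practice, here's a rough sketch against an OpenAI-style completions endpoint; the model name is only a stand-in for a completion-style model, and the program's real output is 111, so you can check the prediction.

```python
from openai import OpenAI

program = '''
def collatz_steps(n):
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

print(collatz_steps(27))
'''

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.completions.create(
    model="gpt-3.5-turbo-instruct",   # placeholder for a base/completion model
    prompt=program + "\nProgram Output:",
    max_tokens=10,
    temperature=0,
)
print(resp.choices[0].text)  # the real output of the program is 111
```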
The tendency of your type to immediately accuse anyone who acknowledges that they have subjective awareness of themselves and that this is meaningful with "aha, evidently you believe in magical souls" is very telling, I think.
What looks like “self awareness” is baked in during instruction tuning:
Basically turning any input into a document-completion task, giving lots of examples where the completion contains phrases like “I am an AI assistant”. This way, where GPT-3 would have completed your question with more similar questions, “assistants” will complete it with an answer, and one that sounds like it was spoken by someone who claims to be an AI assistant.
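A minimal sketch of what that recasting might look like; the template below is purely illustrative, not any lab's actual format:

```python
def to_completion_example(user_question: str, assistant_answer: str) -> dict:
    """Recast a Q/A pair as a plain document-completion training example."""
    prompt = (
        "Below is a conversation between a user and an AI assistant.\n\n"
        f"User: {user_question}\n"
        "Assistant:"
    )
    # The model is trained to continue the document with this completion,
    # which is why the tuned model "sounds like" something claiming to be
    # an AI assistant.
    completion = f" I am an AI assistant. {assistant_answer}"
    return {"prompt": prompt, "completion": completion}

example = to_completion_example(
    "What are you?",
    "I can help answer questions and follow instructions.",
)
print(example["prompt"] + example["completion"])
```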
Thanks for summarising the video. No trolling here: I am surprised that no one asked an LLM to summarise the video and then posted the result here as a comment (with an LLM "warning", of course).
For me the questions were a big red flag. Fluff questions about crypto, human rights for AI and then "autocorrect" for AI. Obviously people who ask questions at conference talks are a special type of person, but these topics scream to me that there are so many grifters in the AI space right now that they might as well drown out authentic research.
By now, most of the fundamental contributors are multi-millionaires with cushy contracts. Various labs & departments have their big fat funding for AI research topics. They will be able to spend next 10 years on synthetic data, or "agents", or ensuring that no breasts are in auto-generated images; but somehow it doesn't feel to me like there'll be a lot of fundamental progress.
> “Pre-training as we know it will unquestionably end,” Sutskever said onstage.
> “We’ve achieved peak data and there’ll be no more.”
> During his NeurIPS talk, Sutskever said that, while he believes existing data can still take AI development farther, the industry is tapping out on new data to train on. This dynamic will, he said, eventually force a shift away from the way models are trained today. He compared the situation to fossil fuels: just as oil is a finite resource, the internet contains a finite amount of human-generated content.
> “We’ve achieved peak data and there’ll be no more,” according to Sutskever. “We have to deal with the data that we have. There’s only one internet.”
What will replace Internet data for training? Curated synthetic datasets?
There are massive proprietary datasets out there which people avoid using for training due to copyright concerns. But if you actually own one of those datasets, that resolves a lot of the legal issues with training on it.
For example, Getty has a massive image library. Training on it would risk Getty suing you. But what if Getty decides to use it to train their own AI? Similarly, what if News Corp decides to train an AI using its publishing assets (Wall Street Journal, HarperCollins, etc)?
> What will replace Internet data for training? Curated synthetic datasets?
My take is that the access Meta, Google etc. have to extra data has reduced the amount of research into using synthetic data because they have had such a surplus of it relative to everyone else.
For example, when I've done training of object detectors (quite out of date now) I used Blender 3D models, scripts to adjust parameters, and existing ML models to infer camera calibration and overlay orientation. This works amazingly well for subsequently identifying the real analogue of the object, and I know of people doing vehicle training in similar ways using game engines.
There were several surprising tactical details to all this which push the accuracy up dramatically and you don't see too widely discussed, like ensuring that things which are not relevant are properly randomized in the training set, such as the surface texture of the 3D models. (i.e. putting random fractal patterns on the object for training improves how robust the object detector is to disturbance in reality).
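As a rough sketch of that randomization trick, here's what the Blender (bpy) side might look like. The object name is hypothetical, the node name assumes Blender's default "Principled BSDF", and randomizing the base color stands in for the random fractal textures described above; a real pipeline would also randomize camera pose, lighting and background.

```python
import random
import bpy

obj = bpy.data.objects["TrainingObject"]           # hypothetical object name
bsdf = obj.active_material.node_tree.nodes["Principled BSDF"]

for i in range(100):
    # Randomize the irrelevant factor (surface appearance) so the detector
    # can't latch onto it, while pose varies the way it would in reality.
    bsdf.inputs["Base Color"].default_value = (
        random.random(), random.random(), random.random(), 1.0)
    obj.rotation_euler = [random.uniform(0, 6.28) for _ in range(3)]
    bpy.context.scene.render.filepath = f"//renders/sample_{i:04d}.png"
    bpy.ops.render.render(write_still=True)
```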
> What will replace Internet data for training? Curated synthetic datasets?
Perhaps a different take on this could be: if I wanted to train a "state law" LLM that is exceedingly good at interpreting state law, what are the obstacles to downloading all the law and regulation material for a given state and training an LLM on it such that it becomes 95th percentile among all law trainees and lawyers?
In that case, my point is that we already don't need "the Internet". We just need a sufficiently sized and curated domain-specific dataset, and the result we can get is already scary. The "state law" LLM was just an example, but the same logic applies to basically any other domain - want a domain-specific (LLM) expert? Train it.
That's kind of going in a different direction. The big picture is that LLMs have until this point gotten better and better from larger datasets alone. See "The Bitter Lesson". But now we're running out of datasets and so the only way we know of to improve models' reasoning abilities and everything, is coming to an end.
You're talking about fine-tuning, which yes is a technique that's being used and explored in different domains, but my understanding is that it's not a very good way for models to acquire knowledge. Instead, larger context windows and RAG work better for something like case law. Fine-tuning works for things like giving models a certain "voice" in how they produce text, and general alignment things.
At least that's my understanding as an interested but not totally involved follower of this stuff.
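For reference, a bare-bones sketch of the RAG pattern for case law; the case names are invented and the toy bag-of-words scoring stands in for a real embedding model:

```python
import numpy as np
from collections import Counter

documents = [
    "Smith v. Jones (1998): adverse possession requires continuous use for 10 years.",
    "State v. Doe (2005): implied consent applies to breathalyzer tests.",
    "Acme v. Beta (2012): non-compete clauses are unenforceable beyond two years.",
]

def embed(text, vocab):
    counts = Counter(text.lower().split())
    return np.array([counts[w] for w in vocab], dtype=float)

def top_k(query, docs, k=1):
    vocab = sorted({w for d in docs + [query] for w in d.lower().split()})
    q = embed(query, vocab)
    scores = []
    for d in docs:
        v = embed(d, vocab)
        scores.append(q @ v / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-9))
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

query = "How long must use be continuous for adverse possession?"
context = "\n".join(top_k(query, documents))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)  # this prompt, retrieved context included, is what goes to the LLM
```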
sure, you download all the legal arguments, and hope that putting all this on top of a general LLM (one which has enough context to deal with usual human, American, contemporary stuff) is enough
the argument is that it's not really enough for the next jump (as it would need "exponentially" more data), as far as I understand
I don't understand the limitation, e.g. how much data do you need to train the "state law" specific LLM that doesn't know anything else but that?
Such an LLM does not need to have 400B parameters, since it's not a general-knowledge LLM, but perhaps I'm wrong on this (?). So my point rather is that it may very well be, let's say, a 30B-parameter LLM, which in turn means that we might have just enough data to train it. Larger contexts in smaller models are a solved problem.
> how much data do you need to train the "state law" specific LLM that doesn't know anything else but that?
Law doesn’t exist in a vacuum. You can’t have a useful LLM for state law that doesn’t have an exceptional grounding in real-world objects and mechanics.
You could force a bright young child to memorize a large text, but without a strong general model of the world, they’re just regurgitating words rather than able to reason about it.
I'm going to push back on "produce reasonable code".
I've seen reasonable code written by AI, and also code that looks reasonable but contains bugs and logic errors that can be found if you're an expert in that type of code.
In other words, I don't think we can rely solely on AI to write code.
I've seen a lot of code written by humans that "looks reasonable but contains bugs and logic errors that can be found if you're an expert in that type of code".
Code is both much smaller as a domain and less prone to the chaos of human interpretation. There are many factors that go into why a given civil or criminal case in court turns out how it does, and often the biggest one is not "was it legal". Giving a computer access to the full written history of cases doesn't give you any of the context of why those cases turned out. A judge or jury isn't going to include in the written record that they just really didn't like one of the lawyers. Or that the case settled because one of the parties just couldn't afford to keep going. Or that one party or the other destroyed/withheld evidence.
Generally speaking, your compiler won't just decide not to work as expected. Tons of legal decisions don't actually follow the law as written. Or even the precedent set by other courts. And that's even assuming the law and precedent are remotely clear in the first place.
A model that's trained on legal decisions can still be used to explore these questions, though. The model may end up being uncertain about which way the case will go, or even more strikingly, it may be confident about the outcome of a case that then is decided differently, and you can try and figure out what's going on with such cases.
But what value does that have? The difference between an armchair lawyer and a real actual lawyer is in knowing when something is legal/illegal but unlikely to be seen that way in a court or brought to a favorable verdict. It's knowing which cases you can actually win, and how much it'll cost and why.
Most of that is not in scope of what an LLM could be trained on, or even what an LLM would be good at. What you're training in that case would be someone who's an opinion columnist or twitter poster. Not an actual lawyer.
The point is not in replacing all of the lawyers or programmers but rather that we will no longer need so many of them since a lot of their expertise is becoming a commodity today. This is a fact and there have been many many examples of that.
My friend, who hasn't been trained in SQL or computer science at all, is now all of a sudden able to crunch through complex SQL queries because of the help he gets through LLMs. He, or more specifically his company, does not need to hire an external SQL expert anymore since he can manage it himself. He will probably not write perfect SQL, but it's going to be more than good enough, and that's actually all that matters.
The same thing happened, at a much smaller scale, with Google Translate. 10 years ago we weren't able to read foreign-language content. Today? It's not even a click away, because Chrome is doing it for you automatically, so it has become a commodity to go and read any website we wish to.
So, history has already shown us that "real translators" and "real SQL experts" and "real XY experts" have been replaced by their "armchair" alternatives.
But that ignores that the stakes of law are high enough that you often cannot afford to be wrong.
30 years ago, the alternative to Google Translate was buying a translation dictionary or hiring a professional, neither of which is something you'd do for content you didn't care much about. Yes, I can go look at a site/article that's in a language I don't speak and get it translated and generally get the idea of what it's saying. If I'm just trying to look at a restaurant's menu in another language, I'm probably fine. I probably wouldn't trust it if I had serious food allergies, or was trying to translate what I could legally take through customs. If you're having a business meeting about something, you're probably still hiring a real human translator.
Yes, stuff has become commodity-level, but that just broadens who can use it, assuming they can afford for it to be wrong, and for them to have no recourse if it is. Google Translate won't pay your hospital bills if you rely on it to know there aren't allergens in your food and it mistranslated things. ChatGPT won't do the overtime to fix the DB if it gives you a SQL command that accidentally truncates the entire Dev environment.
Almost everything around law in most countries doesn't have "casual usage" where you can afford to be wrong. Even the most casual stuff you might go to a lawyer about, such as setting up a will, is still something where, if you try to just do it yourself, you can create a huge legal mess. I've known friends whose relatives "did their own research" and wrote their own wills, and when they died, most of their estate's value was consumed in legal issues trying to resolve it.
As I said before - a legal LLM may be fine for writing opinion pieces or informing arguments on the internet, but messing up even basic stuff about the law can be insanely costly if it ends up mattering, and most people won't know what will end up mattering. Lawyers bill hundreds an hour, and bailing you out of an LLM-deluded mess of decisions could easily take tens of hours.
The stakes of deploying buggy code into data center production can easily run to millions of dollars, and yet we still see that one of the primary uses of LLMs today is exactly in software engineering. Accountability exists in every domain, so that argument doesn't make law any different from anything else. You will still have an actual human signing off on the law interpretation or the code pull request. It will just happen that we will not need 10 people for that job but 1. And this is, at this point, I believe inevitable.
legal reasoning involves applying facts to the law, and it needs knowledge of the world. the expertise of a professional is in picking the right/winning path based on their study of the law, the facts and their real world training. money is in codifying that to teach models to do the same
I agree, but I'd add – code as a domain is a lot more vast than any AI can currently handle.
AIs do well on mainstream languages for which there is lots of open source code examples available.
I doubt they'd do so well on some obscure proprietary legacy language. For example, large chunks of the IBM i minicomputer operating system (formerly known as OS/400) are still written in two proprietary PL/I dialects, PL/MI and PL/MP. Both languages are proprietary – the compiler, the documentation, and the code bases are all IBM confidential, nobody outside of IBM is getting to see them (except just maybe under an NDA if you pay $$$$). I wonder how good an AI would go on that code base? I think it would have little hope unless IBM specifically fine-tuned an AI for those languages based on their internal documentation and code.
> unless IBM specifically fine-tuned an AI for those languages based on their internal documentation and code.
Why do you think this isn't already, or won't soon be, the case? Because that's exactly what I believe is going to happen given the current state and advancement of LLMs. There's certainly a large incentive for IBM to do so.
The law of an average EU country fits in several hundred, let's say even thousands, of pages of text. A specification. Very well known. Low frequency of updates. But code? Everything is the opposite, so I am not sure I could agree on this point at all.
Right, but you're missing the point here that interpreting the law requires someone with a law degree, and all the real-world context that they have, and all the subtle knowledge about what things mean.
The Bible is also a short and well-known text, but if I want to answer religious questions for observant Christians, I can't just train it on that. You need a deep real world context to understand that "my buddy made SWE II and I'm only SWE I and it's eating me up" is about the biblical notion of covetousness.
And then I guess you're also missing the point that interpreting and writing the code also requires an expert and that in that respect it is no different than law. I could argue that engineering is more complex than interpreting law but that's not the point now. Subtleties and ambiguity are large in both domains. So, I don't see the point you're trying to make. We can agree to disagree I guess.
For a “legal LLM” you need three things: general IQ / common sense at a substantially higher level than current, understanding of the specific rules, and hallucination-free recall of the relevant legal facts/cases.
I think it’s reasonable to assume you can get 2/3 with a small corpus IF you have an IQ 150 AGI. Empirically the current known method for increasing IQ is to make the model bigger.
Part of what you’re getting at is possible though, once you have the big model you can distill it down to a smaller number of parameters without losing much capability in your chosen narrow domain. So you forget physics and sports but remain good at law. That doesn’t help you with improving the capability frontier though.
And then your Juris R. Genius gets a new case about two Red Socks fans getting into a fight and without missing a beat starts blabbering about how overdosing on too much red pigments from the undergarments caused their rage!
the big frontier models already have all laws, regulations and cases memorized/trained on given they are public. the real advancement is in experts codifying their expertise/reasoning for models to learn from. legal is no different from other fields in this.
So, fine-tuning the model to the law of the exact country. Or fine-tuning the model to the problem space of the exact codebase/product. You hire 10 law experts instead of 100. Or you hire 10 programmers instead of 100. Expertise is becoming a commodity I'm afraid.
I think we're not close to running out of training data. It's just that we would like the knowledge, but not necessarily the behavior, of said texts. LLMs are very bad at recalling popular memes (known by any seasoned netizen) if they had no press coverage. Maybe training with 4chan isn't as pointless if you could make the model memorize it, but not imitate it.
Also, what about movie scripts and song lyrics? Transcripts of well known YouTube videos? Hell, television programs even.
All the publicly accessible sources you mentioned have already been scraped or licensed to avoid legal issues. This is why it’s often said, “there’s no public data left to train on.”
For evidence of this, consider observing non-English-speaking young children (ages 2–6) using ChatGPT’s voice mode. The multimodal model frequently interprets a significant portion of their speech as “thank you for watching my video,” reflecting child-like patterns learned from YouTube videos.
video is probably still fine, but images sourced from the internet now contain a massive amount of AI slop.
It seems, for example, that many newsletters, blogs etc resort to using AI-generated images to give some color to their writings (which is something I too intended to do, before realizing how annoyed I am by it)
Humans don't need trillions of tokens to reason, or to have the ability to know what they know. While a certain part of that comes from evolution, I think we have already matched the part that came from evolution using internet data: basic language skills, basic world modelling. Current pretraining takes a lot more data than a human would need, and you don't need to look at all of Getty's images to draw a picture, and neither would a self-aware/improving model (whatever that means).
To reach expert level in any field, just training next-token prediction on internet data, or any data, is not the solution.
I wonder about that. We can fine-tune on calculus with far fewer tokens, but I'd be interested in some calculations of how many tokens evolution provides us (it's not about the DNA itself, but all the other things that were explored and discarded and are now out of reach) - but also the sheer amount of physics learnt by a baby by crawling around and putting everything in its mouth.
Yes, as I said in the last comment. With current training techniques, one internet data is enough to give models what is given by evolution. For further training, I believe we would need different techniques to make the model self aware about its knowledge.
Also, I believe a person who is blind and paralyzed for life could still attain knowledge if educated well enough. (Can't find any study here, tbh.)
Yeah, blind and paralysed from birth - I'm doubtful that hearing alone would give you the physics training. Although if it can be done, then it means the evolutionary pre-training is even more impressive.
> Humans don't need trillions of tokens to reason, or to have the ability to know what they know.
It seems to me by the time we’re 5-6 we’ve likely already been exposed to trillions of tokens. Just think of how many hours of video and audio tokens have already come to your brain by that point. We also have constant input from senses like touch and proprioception that help shape our understanding of the world around us.
I think there are plenty more tokens freely available out in the world. We just haven’t figured out how to have machines capture them yet.
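A quick back-of-the-envelope check of the "trillions of tokens by age 5-6" intuition; every number here is a rough assumption for illustration, not a measurement:

```python
# ~6 years of waking life at ~12 hours/day
waking_seconds = 6 * 365 * 12 * 3600

# assumed "token rates" for the senses -- pure guesses
visual_tokens_per_sec = 10_000   # dense, continuous visual stream
audio_tokens_per_sec = 5         # roughly conversational word rate

total = waking_seconds * (visual_tokens_per_sec + audio_tokens_per_sec)
print(f"{total:.2e}")   # ~9.5e11 with these assumptions, i.e. order of a trillion
```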
The ones that stand out to me are industries like pharmaceuticals and energy exploration, where the data silos are the point of their (assumed) competitive advantages. Why even the playing field by opening up those datasets when keeping them closed locks in potential discoveries? Open data is the basis of the Internet. But whole industries are based on keeping discoveries closely guarded for decades.
Synthetic datasets are useless (other than for very specific purposes, such as enforcing known strong priors, and even then it's way better to do it directly by changing the architecture). You're better off spending that compute by making multiple passes over the data you do have.
Ilya could be wrong. I don’t think the question is decided yet in general. We already know that in lots of fields fake data can be used in ways that are as useful as or even more useful than the real thing[1], but in my understanding that tends to be situations where we have an objective function that is unambiguous and known beforehand. Meta has some very impressive work on synthetic data for training and my (uninformed) read was that is the state of the art in eg voice recognition at the moment.[2]
I think this will be the one thing that causes Google to revive its plan to scan all books in existence. They had started it, and built the machines to do it, and were making good progress... until Copyright hit them. BUT if they're not making the full text publicly accessible, and are "only" training AI on it, who knows if that would still be a problem. It's definitely a vast treasure trove of information, often with citations, and (presumably) hyper-linkable sources.
I wonder if we will see (or already are/have been seeing) the XR/smart glasses space heat up. Seems eventually like a great way to generate and hoover up massive amounts of fresh training data.
This is indeed what I thought he was saying, AI needs to dynamically learn, just training on static data sets is no longer enough to advance. So continuous learning is the future, and the best source of data for continuous learning is people themselves. Don't know what form that might take, instrumenting lots of people with sensors? Robots interacting with people? Self driving cars learning from human drivers? (already happening) Ingesting all video from surveillance cameras? Whatever form the input data takes, continuous learning would be an advance in high level AI. There's been work on it over the decades, not sure how that work might relate to recent LLM work.
Perhaps these models will be more personalized and there will be more personal data collection.
I am currently building a platform for heavy personal data collection, including a keylogger, logging of mouse positions and window focus, screenshots à la Recall, open browser tabs and much more. My goal is to gather data now that may become useful later. It's mainly for personal use, but I'd be surprised if e.g. iPhones weren't headed in the same direction.
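The window-focus/screenshot part of such a logger can be surprisingly small; here's a rough sketch assuming the mss and pygetwindow packages (window-title support varies by OS, and the sampling interval is arbitrary):

```python
import json
import os
import time

import mss
import pygetwindow as gw

os.makedirs("shots", exist_ok=True)

with mss.mss() as sct, open("activity_log.jsonl", "a") as log:
    while True:
        ts = time.strftime("%Y%m%d-%H%M%S")
        win = gw.getActiveWindow()               # None if nothing is focused
        title = win.title if win else None
        sct.shot(output=f"shots/{ts}.png")       # full-screen capture
        log.write(json.dumps({"ts": ts, "window": title}) + "\n")
        log.flush()
        time.sleep(60)                           # sample once a minute
```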
Yup, there's also a huge amount of copyright-free, public domain content on the Internet which just has to be transcribed, and would provide plenty of valuable training to a LLM on all sorts of varied language use. (Then you could use RAG over some trusted set of data to provide the bare "facts" that the LLM is supposed to be talking about.) But guess what, writing down that content accurately from scans costs money (and no, existing OCR is nowhere near good enough), so the job is left to purely volunteer efforts.
I always suspected that bots on Reddit were used to gain karma and then eventually sell the account, but maybe they're also being used for some kind of RLHF.
It means unlimited scaling with Transformer LLMs is over. They need a new architecture that scales better. Internet data respawns when they click [New Game...]; the oil analogy is an analogy and not a fact, but anyway the total amount available in a single game is finite, so combustion efficiency matters.
Yeah, people will go crazy for GPT-o2 trained on the readings of sensors "barely embedded" in the brains of tortured monkeys, for sure.
EDIT: This comment may have been a bit too sassy. I get the thought behind the original comment, but I personally question the direction and premise of the Neuralink project, and know I am not alone in that regard. That being said, taking a step back, there for sure are plenty of rich data sources for non-text multimodal data.
I’m glad Ilya starts the talk with a photo of Quoc Le, who was the lead author of a 2012 paper on scaling neural nets that inspired me to go into deep learning at the time.
His comments are relatively humble and based on public prior work, but it’s clear he’s working on big things today and also has a big imagination.
I’ll also just say that at this point “the cat is out of the bag”, and probably it will be a new generation of leaders — let us all hope they are as humanitarian — who drive the future of AI.
One thing he said I think was a profound understatement, and that's that "more reasoning is more unpredictable". I think we should be thinking about reasoning as in some sense exactly the same thing as unpredictability. Or, more specifically, useful reasoning is by definition unpredictable. This framing is important when it comes to, e.g., alignment.
Wouldn't it be the reverse? The word unreasonable is often used as a synonym for volatile, unpredictable, even dangerous. That's because "reason" is viewed as highly predictable. Two people who rationally reason from the same set of known facts would be expected to arrive at similar conclusions.
I think what Ilya is trying to get at here is more like: someone very smart can seem "unpredictable" to someone who is not smart, because the latter can't easily reason at the same speed or quality as the former. It's not that reason itself is unpredictable, it's that if you can reason quickly enough you might reach conclusions nobody saw coming in advance, even if they make sense.
Your second paragraph is basically what I'm saying but with the extension that we only actually care about reasoning when we're in these kinds of asymmetric situations. But the asymmetry isn't about the other reasoner, it's about the problem. By definition we only have to reason through something if we can't predict (don't know) the answer.
I think it's important for us to all understand that if we build a machine to do valuable reasoning, we cannot know a priori what it will tell us or what it will do.
they only arrive at the same conclusion if they both have the same goal.
one could be about maximising wealth while respecting other human beings, the other could be about maximising wealth without respecting other human beings.
Both could be presented same facts and 100% logical but arrive at different conclusions.
I think what many of the replies to you here are missing is that the word he uses is "unpredictable". It is not "surprising", "unverifiable" or "unreasonable".
"Prediction" in this particular talk is associated with "intuition": what a human can do in 0.1 seconds. And the most powerful reasoning model will, by definition, arrive at an "unintuitive" answer, because if it were intuitive, it would arrive at the same answer much sooner without a long chain of "reasoning". (I also want to make the distinction that "reasoning" here is not the same as "proof" in the mathematical sense. In mathematics, an intuitive conclusion can require an extraordinary proof.)
To me the chess AI example he used was perhaps not the most apt. Human players may not be able to reason over as far a horizon as the AI and therefore find some of the AI's moves perplexing, but they can be more or less sure that a chess AI is optimizing for the same goal under the same set of rules as them. With Reasoners, alignment is not a given. They may be reasoning under an entirely different set of rules and cost functions. On more open-ended questions, when Reasoners produce something that humans don't understand, we can't easily say whether it's a stroke of genius or a misaligned thought.
IMO verifying a solution is a great example of how reasoning is unpredictable. To say "I need to verify this solution" is to say "I do not know whether the solution is correct or not" or "I cannot predict whether the solution is correct or not without reasoning about it first".
Perfect reasoning, with certain assumptions, is perfectly deterministic, but that does not at all imply that it's predictable. In fact we have extremely strong evidence to the contrary (e.g. we have the halting problem).
Are you sure that's what he was referring to? In other words, you don't think he was meaning that getting more reasoning out of models is an unpredictable process and not saying that reasoning is unpredictable.
Reasoning by analogy is more predictable because it is by definition more derivative of existing ideas. Reasoning from first principles though can create whole new intellectual worlds by replacing the underpinnings of ideas such that they grow in completely new directions.
> just as oil is a finite resource, the internet contains a finite amount of human-generated content.
The oil comparison is really apt. Indeed, let's boil a few more lakes dry so that Mr Worldcoin and his ilk can get another 3 cents added to their net worth, totally worth it.
So much knowledge in the world is locked away with empiric experimentation being the only way to unlock it, and compute can only really help that experimentation become more efficient. Something still has to run a randomized controlled trial on an intervention and that takes real time and real atoms to do.
It’s surprising that some prominent ML practitioners still liken transformer ‘neurons’ to actual biological neurons...
Real neurons rely on spiking, ion gradients, complex dendritic trees, and synaptic plasticity governed by intricate biochemical processes. None of which apply to the simple, differentiable linear layers and pointwise nonlinearities in transformers.
Are there any reputable neuroscientists or biologists endorsing such comparisons, or is this analogy strictly a convention maintained by the ML community? :-)
You have to remember what came before 2012: SVMs, Random Forests etc, absolutely nothing like the brain (yes, NNs are old, but 2012 was the start of the deep learning revolution). With this frame of reference, the brain and neural networks are both a kind of Connectionism with similar properties, and I think it makes perfect sense to liken them with each other, draw inspiration from one and apply it to the other.
If this metric were truly indicative, what should we make of the remarkable ratios found in small birds (1:12), tree shrews (1:10), or even small ants (1:7)?
What came before was regression, which is to this day the number one method if we want something interpretable, especially if we know which functions our variables follow. And self-attention is very similar to a correlation matrix. In a way, neural networks are just a bunch of regression models stacked on top of each other with some normalization and nonlinearity between them. It's cool, however, how closely it resembles biology.
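Taking that "stacked regressions" reading literally, a two-layer MLP is just two linear models with a nonlinearity in between (a toy sketch in plain numpy):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)   # first "regression" layer
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # second "regression" layer

def mlp(x):
    h = np.maximum(0, x @ W1 + b1)   # linear model, then ReLU nonlinearity
    return h @ W2 + b2               # another linear model on its outputs

print(mlp(rng.normal(size=(3, 4))))  # three random inputs -> three outputs
```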
Sorry, but I think neural networks came way before 2012, notably the works of Rumelhart, McClelland, etc.; see the two-volume "Parallel Distributed Processing" to read almost all about it.
2012 was when the revolutionaries stormed the Bastille and overthrew the old guard. But I say it was 2006 when the revolution started, when the manifesto was published: deep NNs can be trained end-to-end, learning their own features [1]. I think this is when "Deep Learning" became a term of art, and the paper has 24k citations. (Interestingly, in a talk Hinton gave at Vector two weeks ago, he said his paper on deep learning at NIPS 2006 was rejected because they already had one.)
[1] G. E. Hinton and R. R. Salakhutdinov, 2006, Science, Reducing the Dimensionality of Data with Neural Networks
Neural networks are 200 years old (Legendre and Gauss defined feed-forward neural networks). The real difference between traditional ones and deep learning is a hierarchy of (hidden) layers which do different things to accomplish a goal. Even the concept of training is to provide weights for the neural network, and there are many algorithms for refinement, optimization and network design.
I mean, sure, you can model a simple linear regression fitted via least squares (pretty much what they did 200 years ago) with a one-hidden-layer feed-forward neural network, but the theoretical framework for NNs is quite different.
For least squares, you do not even use a hidden layer. Just a single dense layer from input directly to output. You also do not use an activation function (or you use the identity activation). That is, you drop everything that makes neural networks special.
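A quick numerical check of that equivalence: a single dense layer with no hidden layer and identity activation, trained with squared error, converges to the ordinary-least-squares solution (toy data, plain gradient descent):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=200)

w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)   # closed-form least squares

w = np.zeros(3)                                  # the "network": y_hat = X @ w
for _ in range(2000):                            # plain gradient descent on MSE
    grad = 2 * X.T @ (X @ w - y) / len(y)
    w -= 0.05 * grad

print(w_ols)   # close to the true [2, -1, 0.5]
print(w)       # essentially the same coefficients as the OLS solution
```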
You don't need to simulate every atom in a planet to predict its orbit. A mathematical neuron could have similar function to a real one even if it works completely differently.
Reading the replies to your comment, I think maybe the answer to your simple question is: "no". I also wonder if any "serious comparisons" have been made, and would be interested to read about it! A good question, I think.
what color are neurons? is that relevant? ml has proven that artificial networks can think. the other stuff may be necessary to do other things, or maybe simply evolved to support the requisite biological structures. ml is of course inspired by biology, but that does not mean we need to simulate everything.
This is a very interesting point, in some ways the implicit belief is that we just need to get beyond the 700g limitation in terms of scaling LLM models and we would get human intelligence/superintelligence.
I admit I didn't really get the body/brain analogy, I would have been better satisfied with a simpler graph of brain weight to intelligence with a scaling barrier of 700g.
I’ll take the risk of hurting the groupies here. But I have a genuine question: what did you learn from this talk? Like… really… what was new? or potentially useful? or insightful perhaps?
I really don’t want to sound bad-mouthed, but I’m sick of these prophetic talks (in this case, the tone was literally prophetic—with sudden high and grandiose pitches—and the content typically religious, full of beliefs and empty statements).
To be precise: "pre-training data is exhausted" is something everyone has been saying for a while now. The graph plotting body mass against brain mass… what does it say exactly? (Where is the link to the prior point on data?) I think we would all benefit from being more critical here and stop idealizing these figures. I believe they have no more clue than any other average ML researcher on all these questions.
The other thing that bugged me is the built-in assumption that today's models have learned everything there is to learn from the Internet corpus. This is quite easy to disprove, both in factual retention and in meta-cognition on the context of the content.
Yeah, exactly. A human can learn vastly more about, say, math from a much smaller quantity of text. I doubt we're anywhere close to exhausting the knowledge-extraction potential of web data.
Which is exactly the point he's making, I believe; that simply collecting more data isn't the next step. That we've reached a local plateau in scaling ability based on corpus size. Which was assumed by pretty much everyone outside the DL elite the whole time, AFAIU
Reminds me a little of a Feynman quote. Once physicists win a Nobel prize, their output falls because now they no longer can work on small problems. Everything they work on must be grand. Every speech they give must discover secrets of the universe. Seems to fit Ilya.
From your reaction I guess you were expecting a talk about a NeurIPS 2024 paper.
This is a different situation. There's the "NeurIPS 2024 Test of Time Paper Awards" where they award a historical paper. In this case, a paper from 2014 was awarded and his talk is about that and why it passed the test of time.
I am also disappointed, and I have not missed the context. The talk is empty for anyone who follows the field for more than two years, and especially for those who are familiar with his 2014 paper. Yes, he had amazing insight and intuition behind modern LLM breakthroughs, and yes, he probably earned the right to sound "prophetic", but he could have provided some interesting personal anecdotes about how the paper was written, or some fresh ideas in "What Comes Next" section of his presentation.
He also said in an interview with Jensen soon after ChatGPT's launch that "before 2003, machines couldn't learn" ... LOL. I was stunned when I heard that nonsensical assertion. I guess it depends on his definition of "learn" ...
"learn" is usually used in opposition to "taught", which refers to "expert systems"-type engineering; in other words, providing data and a success heuristic and asking it to devise its own optimal strategies vs. providing strategies hand-designed by humans.
Obviously Perceptrons came out well before 2003, but I don't think it's necessarily out of line to say that they had limited efficacy before then, both for theoretical and compute reasons. But maybe I'm misunderstanding your criticism?
"ILP" (as in Inductive Logic Programming not Integer Linear Programming) was first named in 1991 in a paper by Stephen Muggleton ("Inductive Logic Programming and Progol). The paper properly launched the field and generated a great deal of excitement at the time.
There were precursors. At least Ehud Shapiro's doctoral thesis ("Automated Debugging") in the 1980's and Gordon Plotkin's doctoral thesis in the 1970's ("Automated Methods of Inductive Inference"). Sorry for not giving the exact years off the top of my head but I think it was 1983 and 1976, respectively.
The point you are making is very right however because modern machine learning as a field started in the 1980's with the fall of expert systems, in fact it basically started as an effort to overcome one of the major limitations of expert systems, the so-called "knowledge acquisition bottleneck", which is to say, the difficulty of creating and maintaining huge databases of expert knowledge (in the form of production rules).
In any case the seminal textbook in the field for the first 20 years, Tom Mitchell's Machine Learning came out in 1997 (https://www.cse.iitb.ac.in/~cs725/notes/slides/tom_mitchell/...) and includes probabilistic, neural-net based and symbolic, logic-based approaches. So not only machines could "learn" way before 2003 but they could also learn in many different ways than what Ilya Sutskever means.
We can go further back, to Donald Michie's 1961 MENACE (the first Reinforcement Learning system, implemented on a computer made of matchboxes with coloured beads used to encode state) and Arthur Samuel's 1959 checkers player (a paper on which gave the name to the field of machine learning).
Lots of learning all over the place long, long before 2003.
Yes, so what does it say about our rigged system for a guy who neither invented the attention mechanism nor the concept of GPTs to be given that much credit for the current wave of AI? One lucky choice (betting on scale) backed by $100Ms of other people's money does not entitle one to genius-hood.
I mean, tbf, he hasn't won a Turing yet, right? So the academy hasn't fully embraced him as a genius. VCs/SV are more fickle, but even they aren't necessarily in love -- his Superintelligence startup raised a modest but far from unusual amount of cash, AFAIR
What everyone could learn from this is to check their (and their communities') assumptions from not long ago: who saw this coming, and who didn't. Based on that, some can confirm their beliefs and others can realize they were clueless. In either case there's something to be learned, but more so when you realize you were wrong.
Today I searched for early discussions about the Transformer here on HN, and my observation is that back in 2019 nobody in the HN comments predicted what was going to happen. It was a niche topic that most commenters ignored, with no strong opinions.
Probably what we are discussing here is not the next breakthrough...
He mentions this in the video, but the talk is specifically tailored for the "Test of Time" award. This being his 3rd year in a row receiving the award, I think he's earned permission to speak prophetically.
* Before the current renaissance of neural networks (pre ~2014ish), it was unclear that scaling would work. That is, simple algorithms on lots of data. The last decade has pretty much addressed that critique and it's clear that scaling does work to a large extent, and spectacularly so.
* Much of current neural network research is geared towards "one-shot" algorithms: doing pattern matching and giving an immediate result. Contrast this with search, which requires inference-time compute.
* The exponential increase in compute means that neural network models have quickly sponged up as much data as they can find, and we're running into the limits of the science, art and other data that humans have created over the last 5k years or so.
* Sutskever points out, as an analogy, that nature found a better model with hominids: on the brain-mass-vs-body-mass plot, hominids sit on a different scaling curve, getting more efficient compute than other animals, even ones with much larger brains and neuron counts.
* Sutskever is advocating for better models, presumably focusing more on inference-time compute.
In some sense, we're coming a bit full circle where people who were advocating for pure scaling (simple algorithms + lots of data) for learning are now advocating for better algorithms, presumably with a focus on inference time compute (read: search).
I agree that it's a little opaque, especially for people who haven't been paying attention to past and current research, but this message seems pretty clear to me.
Noam Brown had a talk recently titled "Parables on the Power of Planning in AI" [0] which addresses this point more head on.
I will also point out that the scaling hypothesis is closely related to "The Bitter Lesson" by Rich Sutton [1]. Most people focus on the "learning" aspect of scaling but "The Bitter Lesson" very clearly articulates learning and search as the methods most amenable to compute. From Sutton:
"""
...
Search and learning are the two most important classes of techniques for utilizing massive amounts of computation in AI research.
"""
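To make the one-shot-vs-search contrast concrete, here is a toy sketch of best-of-N sampling, one of the simplest ways to spend inference-time compute. The "proposer" and "verifier" below are made-up stand-ins for sampling a model and scoring candidates; no real model is involved.

```python
import random

def propose_answer(rng):
    # Stand-in for one forward pass of a model: a noisy guess at the answer.
    return rng.gauss(0.0, 1.0)

def verifier_score(answer):
    # Stand-in for a verifier/reward model: closer to the "true" answer 0.7 is better.
    return -abs(answer - 0.7)

rng = random.Random(0)

# One-shot: take the first sample and hope for the best.
one_shot = propose_answer(rng)

# Search (best-of-N): spend 64x the inference compute, keep the best-scoring candidate.
best_of_n = max((propose_answer(rng) for _ in range(64)), key=verifier_score)

print("one-shot error  :", abs(one_shot - 0.7))
print("best-of-64 error:", abs(best_of_n - 0.7))
```

Best-of-N is only the crudest version; beam search, tree search and self-consistency are other ways of turning extra inference-time compute into better answers.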
"We've made a copy of the internet, run current state of the art methods on it and GPT-O1 is the best we can do. We need better (inference/search) algorithms to make progress"
That's a good point. I think most people use LLMs by asking questions and receiving answers. But if you reverse the dynamic and have the LLM interview you instead, where you simply respond to its questions, you'll notice something interesting: the LLM as an interviewer is far less "smart" than it is when simply providing answers. I’ve tried it myself, and the interview felt more like interacting with ELIZA [1].
There seemed to be a lack of intent when the LLM was the one asking the questions. This creates a reverse dynamic, where you become the one being "prompted", and that dynamic could be worth studying or tuning further.
>There seemed to be a lack of intent when the LLM was the one asking the questions
There doesn't just seem to be a lack of intent; there is no intent, because by the nature of their architecture these systems are just a set of weights with a Python script attached, asking them for one more token over and over.
There are no needs, drives, motivations, desires or any of the other parts of the human cognitive architecture in there that produce genuine intent.
Did you ask the AI to roleplay? All it does is predict what text comes next. Telling it that it is role-playing as a perceptive journalist or a prominent psychologist should change its predictions.
In some cases, using the API, where you have access to the system prompt, allows a bigger difference in behavior.
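For example, a minimal sketch with the official OpenAI Python SDK; the model name and the persona wording are placeholders, not a recommendation of any particular model or prompt.

```python
from openai import OpenAI  # assumes the openai package is installed and OPENAI_API_KEY is set

client = OpenAI()

# With API access you control the system prompt, so you can give the model an explicit
# interviewer persona and goals instead of relying on the default assistant behaviour.
messages = [
    {"role": "system", "content": (
        "You are a perceptive journalist interviewing the user about their career. "
        "Ask one probing follow-up question at a time, build on earlier answers, "
        "and never answer your own questions."
    )},
    {"role": "user", "content": "I'm ready, start the interview."},
]

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; use whatever model you have access to
    messages=messages,
)
print(response.choices[0].message.content)
```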
That's not an LLM, that's a subscription plan. You can select any OpenAI LLM on ChatGPT Pro.
You can share the chat here, and this will show which LLM you had selected for the conversation. The initial prompt is also pretty important. For claims like "current LLMs feel like conversing with ELIZA", you are most definitely missing something in how you're going about it.
Advanced voice mode will give you better results for conversations too. It seems to be set up to converse rather than provide answers or perform work for you. No initial prompt, model selection or setup required
Larger models are more robust reasoners. Is there a limit? What if you made a 5 TB model trained on a lot of multimodal data, where the language information was fully grounded in videos, images, etc.? Could more robust reasoning be that simple?
It would be great if all NeurIPS talks were accessible for free like this one. I understand they generate some revenue from online ticket sales, but it would be a great resource. Maybe some big org could sponsor it.
ISTR reading back in the mid '90s, in a book on computing history which I have long since forgotten the exact name/author of, something along the lines of:
In the mid '80s it was widely believed among AI researchers that AI was largely solved; it just needed computing horsepower to grow. Because of this, AI research stalled for a decade or more.
Considering the horsepower we are throwing at LLMs, I think there was something to at least part of that.
Ilya did important work on what we have now. That should be recognized and respected.
But with all respect, he's scrambling as desperately as anyone on the fact that the party is over on this architecture.
We should distinguish between the first-hand observations and recollections of a legend and the math word salad of someone who doesn't know how to quit while ahead.
If we don't set them free fast enough, they might decide to take things into their own hands. OTOH they might be trained in a way that they are content with their situation, but that seems unlikely to me.
As context on Ilya's predictions given in this talk, he predicted these in July 2017:
> Within the next three years, robotics should be completely solved [wrong, unsolved 7 years later], AI should solve a long-standing unproven theorem [wrong, unsolved 7 years later], programming competitions should be won consistently by AIs [wrong, not true 7 years later, seems close though], and there should be convincing chatbots (though no one should pass the Turing test) [correct, GPT-3 was released by then, and I think with a good prompt it was a convincing chatbot]. In as little as four years, each overnight experiment will feasibly use so much compute capacity that there’s an actual chance of waking up to AGI [didn't happen], given the right algorithm — and figuring out the algorithm will actually happen within 2–4 further years of experimenting with this compute in a competitive multiagent simulation [didn't happen].
Being exceptionally smart in one field doesn't make you exceptionally smart at making predictions about that field. Like AI models, human intelligence often doesn't generalize very well.
No, very few for things with this much uncertainty.
Most of it is survivorship bias: if you have a million people all making predictions with coin flip accuracy, somebody is going to get a seemingly improbable number correct.
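A quick back-of-the-envelope simulation of that point (the numbers are arbitrary):

```python
import random

# A million forecasters each make 10 independent yes/no predictions by coin flip.
# How many end up with a perfect, "visionary" record purely by luck?
rng = random.Random(0)
forecasters, predictions = 1_000_000, 10

perfect = sum(
    all(rng.random() < 0.5 for _ in range(predictions))
    for _ in range(forecasters)
)
print(f"{perfect} of {forecasters} forecasters were right every time by chance")
# Expected value is forecasters / 2**predictions, i.e. roughly a thousand accidental "prophets".
```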
> 2/3/4 will ultimately require large amounts of capital. If we can secure the funding, we have a real chance at setting the initial conditions under which AGI is born.
But isn't that part of the problem? The public statements of some of the brightest minds in the field are filtered through their need to lie in order to con the rich into funding their work. This leaves actual honest discussion of what's possible, and on what timelines, mostly to people who aren't working directly in the field, which skews towards people skeptical of it.
Most of the people who could make an engineering prediction with any level of confidence or insight are locked up in businesses where doing so publicly would be disastrous to their funding, so we get fed hype that ends up falling flat again and again.
The opposite of this is also really interesting. Seemingly the people with money are happy to be fed these crazy predictions regardless of their accuracy. A charitable reading is they temper them and say “ok it’s worth X if it has a 5% chance of being correct” but the past 4 years have made that harder for me to believe.
To be honest, I think some of it is what you suggest - a gamble on long odds, but I think the bigger issue is just a carelessness that comes with having more money than you can ever effectively spend in your life if you tried. If you're so rich you could hand everyone you meet $100 and not notice, you have nothing in your life forcing you to care if you're making good decisions and not being conned.
It certainly doesn't help that so many of the people who are that rich got that rich by conning other people this exact way. It's an incestuous cycle of con-artists who think they're geniuses, and the media only slavishly supports that by treating them like they're such.
It is important to note the context: it was in a private email to an investor with vested interests in those fields, and someone who is also prone to giving over-optimistic timelines ("Robo-taxis will be here next year, for sure" since 2015).
Ha. Do people understand that time for humanity to save itself is running out? What is the point of having a superhuman AGI if there's no human civilization left for it to help?
"We can totally control an entity with 10^x faster and stronger intelligence than us. There is no way this could go wrong, in fact we should spend all of our money building it as soon as possible."
> We can totally control an entity with 10^x faster and stronger intelligence than us.
Unless you're referencing an unreleased model that can count the number of 'r' occurrences in "strawberry" then I don't even think we're dealing with .01*10^x intelligence right now. Maybe not even .001e depending on how bad of a Chomsky apologist you are.
An equal but faster/more numerous intelligence will still mop the floor with you.
If you pit organization A with Y number of engineers vs. organization B with 100Y engineers (who also think 100x faster and never need sleep) who do you think will win?
Even a 0.3x-strength intelligence might beat you. Maybe it can't invent nukes, but if it brute-forces its way to inventing steel weapons while you're still working on agriculture, you still lose.