"We've made a copy of the internet, run current state of the art methods on it and GPT-O1 is the best we can do. We need better (inference/search) algorithms to make progress"
* Before the current renaissance of neural networks (pre ~2014 or so), it was unclear that scaling (that is, simple algorithms run on lots of data) would work. The last decade has pretty much addressed that critique: it's now clear that scaling does work to a large extent, and spectacularly so.
* Most current neural network models and research are geared towards "one-shot" inference: do pattern matching and give an immediate result in a single forward pass. Contrast this with search, which spends additional compute at inference time (a toy sketch of the contrast follows this list).
* The exponential increase in compute means that neural network models are quickly sponging up as much data as they can find, and we're running into the limits of the science, art, and other data that humans have created over the last 5k years or so.
* Sutskever points out, as an analogy, that nature found a better scaling recipe for humans: the brain-to-body-mass ratio for hominids follows a different trend than for other mammals, with hominids effectively finding more efficient compute than other animals, even ones with much larger brains and higher neuron counts.
* Sutskever is advocating for better models, presumably ones that focus more on inference-time compute.
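To make the "one-shot vs. inference-time compute" contrast above concrete, here is a minimal, purely illustrative Python sketch. Nothing in it comes from the talk: `model` and `score` are hypothetical stand-ins for a trained generator and a verifier, and best-of-N sampling is just one simple form of inference-time search.

  # Toy contrast: one forward pass vs. spending extra compute at inference time.
  # `model` and `score` are hypothetical stand-ins, not any real API.

  def model(prompt: str, seed: int) -> str:
      # Stand-in for a single forward pass of a fixed, already-trained model.
      return f"candidate-{hash((prompt, seed)) % 10}"

  def score(prompt: str, answer: str) -> float:
      # Stand-in for a verifier / reward model / heuristic checker.
      return (hash((prompt, answer)) % 1000) / 1000.0

  def answer_one_shot(prompt: str) -> str:
      # "One-shot": take whatever the first forward pass produces.
      return model(prompt, seed=0)

  def answer_with_search(prompt: str, n: int = 16) -> str:
      # Inference-time compute (best-of-N): same weights, n forward passes,
      # plus a verifier to pick the best candidate. Larger n means more compute
      # and (hopefully) better answers, without retraining anything.
      candidates = [model(prompt, seed=s) for s in range(n)]
      return max(candidates, key=lambda c: score(prompt, c))

  print(answer_one_shot("What is 2 + 2?"))
  print(answer_with_search("What is 2 + 2?", n=64))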
In some sense, we're coming a bit full circle: people who were advocating for pure scaling (simple algorithms + lots of data) for learning are now advocating for better algorithms, presumably with a focus on inference-time compute (read: search).
I agree that it's a little opaque, especially for people who haven't been paying attention to past and current research, but this message seems pretty clear to me.
Noam Brown recently gave a talk titled "Parables on the Power of Planning in AI" [0] which addresses this point more head-on.
I will also point out that the scaling hypothesis is closely related to "The Bitter Lesson" by Rich Sutton [1]. Most people focus on the "learning" aspect of scaling but "The Bitter Lesson" very clearly articulates learning and search as the methods most amenable to compute. From Sutton:
"""
...
Search and learning are the two most important classes of techniques for utilizing massive amounts of computation in AI research.
...
"""
[0] https://youtube.com/watch?v=eaAonE58sLU
[1] https://www.cs.utexas.edu/~eunsol/courses/data/bitter_lesson...