It seems as if the people gobbling up the "Tesla has the data! Autopilot will keep getting better!" line have never trained a neural network in their lives. Models converge. Loss stops decreasing, regardless of how much more data comes in. Extensive manual data cleaning becomes necessary to prevent overfitting. The model architecture has to change and hyperparameters have to be tweaked. And if you change any of those things, you're back at square one as far as testing goes.
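For anyone who hasn't watched a loss curve flatten out: "loss stops decreasing" is usually detected automatically rather than eyeballed. Below is a minimal sketch of such a plateau check, similar in spirit to the `patience`/`min_delta` knobs on Keras's `EarlyStopping` callback. The helper `has_plateaued` and both loss histories are hypothetical, made up for illustration.

```python
def has_plateaued(losses, window=5, min_delta=0.005):
    """Return True if the best loss in the last `window` epochs is less than
    `min_delta` lower than the best loss seen before that window."""
    if len(losses) <= window:
        return False  # not enough history to judge yet
    best_before = min(losses[:-window])
    best_recent = min(losses[-window:])
    return (best_before - best_recent) < min_delta

# Hypothetical loss histories, one per epoch
still_learning = [1.0, 0.8, 0.65, 0.55, 0.48, 0.42, 0.37, 0.33, 0.30, 0.27]
converged = [1.0, 0.6, 0.45, 0.40, 0.399, 0.398, 0.399, 0.398, 0.397, 0.398]

print(has_plateaued(still_learning))  # False
print(has_plateaued(converged))       # True
```

Once this fires, feeding in more of the same kind of data doesn't move the curve; something else (data, architecture, hyperparameters) has to change.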
The notion that Tesla's model HAS to keep improving simply because they will be able to pile on more (unlabeled!) data is laughably false. And, in fact, quite insulting to the intelligence of even the most casual ML engineers.
> And, in fact, quite insulting to the intelligence of even the most casual ML engineers.
Exactly: casual ML engineers. Plateauing tends to happen because there is no more novelty left in the data. What mega-experiments like GPT have shown us is that you can, in fact, keep adding novel data and keep improving the model. Kinda inelegant, yet effective. The problem is, most institutions can't add more novelty beyond a certain scale, since that usually means shoveling more money at data storage and compute, on top of the cost of collecting the novel data itself.
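To make "keep adding data and keep improving" concrete: large-model scaling studies have reported test loss falling roughly as a power law in dataset size, L(D) = (D_c / D)^alpha. The toy sketch below assumes that form; `n_c` and `alpha` are made-up illustrative constants, not values fitted to GPT or to anything Tesla runs.

```python
def power_law_loss(n_samples, n_c=1e7, alpha=0.1):
    """Toy data-scaling curve L(D) = (D_c / D) ** alpha.
    n_c and alpha are hypothetical constants chosen only for illustration."""
    return (n_c / n_samples) ** alpha

# Each 10x increase in data multiplies the loss by the same factor, 10 ** -alpha
for n in [1e9, 1e10, 1e11, 1e12]:
    print(f"{n:.0e} samples -> loss {power_law_loss(n):.3f}")
```

Under this assumption there is no hard plateau, but every further ~21% cut in loss (1 - 10^-0.1) costs another 10x in data, which is exactly why the storage and compute bill is the bottleneck.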
Tesla merely has to open the money tap to get more of both compute and storage, and let the real-time data flow in.
> Tesla merely has to open the money tap to get more of both compute and storage, and let the real-time data flow in.
And if you watch the other parts of the presentation, you'll see the bits about them buying clusters with 5k+ A100 GPUs. Presumably they intend to do something with those. Probably not streaming Fortnite concerts.
I would agree if their increase in data were linear, but it is growing by orders of magnitude, which should have qualitative consequences for what they're able to accomplish as they claw their way through the 9s. I don't see how it's possible to get progressively more 9s without scaling both data and compute.
The point of greater scale isn't just more data; it also makes the unbalanced-data problem easier to solve, because rarer and rarer scenarios start appearing in large enough numbers to work with.
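The arithmetic behind the rare-scenario point is just expected counts. A minimal sketch, assuming a hypothetical scenario that shows up in one clip per million and treating clips as independent:

```python
import math

SCENARIO_RATE = 1e-6  # hypothetical: one clip in a million contains the rare scenario

def expected_examples(n_clips, rate=SCENARIO_RATE):
    # Mean of a binomial(n_clips, rate) count
    return n_clips * rate

def prob_at_least_one(n_clips, rate=SCENARIO_RATE):
    # 1 - (1 - rate) ** n_clips, computed stably for tiny rates
    return -math.expm1(n_clips * math.log1p(-rate))

for n in (10**5, 10**7, 10**9):
    print(f"{n:>13,} clips: ~{expected_examples(n):,.1f} examples, "
          f"P(>=1) = {prob_at_least_one(n):.3f}")
```

At 10^5 clips you expect a tenth of an example; at 10^9 you expect about a thousand, which is enough to actually train and validate on. Each extra 9 of reliability pushes the relevant failure modes one order of magnitude rarer, so the data has to grow by roughly the same factor.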
You make it sound extremely manual and sequential when reality is anything but.
A well-funded team like Tesla, Google, or FAIR is going to be using neural architecture search (NAS) and have a continuous testing pipeline. Tesla has arguably the best environment for continuous testing, which is the most difficult part of improving a model. Andrej even said in his talk that their supercomputer is in the top 5 for FLOPs.
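To illustrate why a continuous testing pipeline means you aren't "back at square one" after every change: each candidate model is scored on a fixed scenario suite and automatically compared against the current baseline. A minimal sketch, assuming per-scenario pass rates are already computed; the function `regression_gate`, the scenario names, and the numbers are all hypothetical, not Tesla's actual pipeline.

```python
def regression_gate(baseline, candidate, tolerance=0.002):
    """Return the scenarios where the candidate's pass rate falls more than
    `tolerance` below the baseline's (or is missing). Empty list = safe to promote."""
    regressions = []
    for scenario, base_rate in baseline.items():
        cand_rate = candidate.get(scenario)
        if cand_rate is None or cand_rate < base_rate - tolerance:
            regressions.append(scenario)
    return regressions

# Hypothetical per-scenario pass rates
baseline = {"unprotected_left_turn": 0.981, "construction_zone": 0.942, "pedestrian_at_night": 0.967}
candidate = {"unprotected_left_turn": 0.986, "construction_zone": 0.931, "pedestrian_at_night": 0.968}

print(regression_gate(baseline, candidate))  # ['construction_zone']
```

Because the gate is automated, architecture or hyperparameter changes get re-validated on every run instead of requiring a fresh manual test campaign.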
SOTA on ImageNet for the past few years has been driven by pre-training on massive datasets. Vision transformers are increasingly common and are extremely data-hungry.