The point was that very few companies can pull off a clean-sheet chip design from nothing. Google has done it, but Google is an elite company. So pointing out that Tesla isn't the only company to have done it, because Google has too, only underlines that Tesla is doing very well.
But it seems clear to me that the point isn't to compete with these other companies, but to vertically integrate a critical component of their systems. They can discard a lot of legacy concerns and focus on raw power.
The thing is that if Tesla just uses Nvidia, like everyone else, then Tesla's stuff is only differentiated by software. Everyone uses Nvidia, and it becomes hard to set yourself apart. But if Nvidia is chasing a broad customer base and has all this extra stuff to think about, then Tesla could potentially be more nimble and produce a holistic system design that solves exactly the problems they have, with no cruft. This could result in a more advanced hardware platform, so their robotics products differentiate themselves with both hardware and software.
I am also happy because I am a robotics engineer, and in my opinion we need hardware that is 1000x more powerful than today's to do what we really need. Nvidia wants to move at a certain pace, but if Tesla is trying to beat them on raw power, then Nvidia will play catch-up and accelerate development of more powerful systems. This is great for everyone.
Internally Tesla can use whatever they want, and it might even make sense in the long term, but if they want to sell their chips for general model training they had better be much better than the future Nvidia cards they will be competing with when they start selling. Like twice as fast at half the price and half the power, with perfect framework support. That last part is extremely important: if I see any mention of anyone who changed “cuda” to “dojo” in their PyTorch code and ran into any issues, I’m not going to touch it with a ten-foot pole. Just like I avoid TPUs because two years ago I heard people were having issues with them. And I’m the guy who has decided which hardware to spend millions of dollars on at several companies.
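To be concrete about what “perfect framework support” means to me: swapping the device string should be the only change, and everything downstream has to keep working. A minimal sketch of that bar, where “dojo” is a purely hypothetical device name (PyTorch ships no such backend today):

    import torch

    # The bar: changing the device string is the ONLY edit, and autograd,
    # mixed precision, DDP, custom ops, etc. all still work exactly as before.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # device = "dojo"   # hypothetical; the one-line swap I'd want to be able to make safely

    model = torch.nn.Linear(1024, 1024).to(device)
    x = torch.randn(32, 1024, device=device)
    loss = model(x).sum()
    loss.backward()   # if this path breaks anywhere, I'm out

If that swap produces even one weird error in a big training run, the price/performance advantage stops mattering to me.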
Yeah, I just don't think that is really the main part of their strategy. Maybe they would sell chips or boards or servers if they are already making them, but I think it is mostly about internal use, so their end products have a competitive advantage as complete robots. Robotics needs HUGE advances in compute, and with their own chips Tesla won't be dependent on a third party for their success.
All the stuff you talked about, needing perfect support before you will touch it, is something that takes a lot of work for Nvidia and others, slowing them down. Tesla can ignore all that and focus on performance for their specific application, and I think this gives them the freedom to lead the pack on raw performance.
I'm not sure if you've watched the presentations on how their self-driving system is trained, but basically they have a million vehicles out in the real world with camera systems on them, a massive server farm collecting new data from those vehicles all the time, and they train their neural net by running millions of scenarios in simulation and against the real-world data collected from the fleet. And they have to re-train the system constantly with new data, and run it against old data to check for regressions. So they have a huge compute requirement just to keep improving the system. They think that functional self-driving will revolutionize their business (setting aside the valid criticism, this is what Tesla thinks), so they need to be able to handle an ever-growing compute load that has to be running constantly. Raw compute power is therefore critical to the success of their plan. It may not be enough, but they certainly can't succeed without it. Their needs are very specific, though, and it sounds like they've found an architecture which is simpler than most Nvidia chips but has loads of power. So it sounds like they are making a good decision based on their specific needs. It's a huge, risky bet, but then that's how Musk likes to do things.
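To sketch the shape of that loop (this is not Tesla's actual pipeline, and every name here is a made-up stand-in):

    # Toy sketch of the continuous retrain-and-regression-check loop described above.
    # Not a real pipeline; every function and name is a hypothetical stand-in.

    def collect_new_fleet_data():
        """Stand-in for fresh clips streaming in from the vehicle fleet."""
        return [f"clip_{i}" for i in range(100)]

    def retrain(model_state, clips):
        """Stand-in for a full training run over the new data."""
        model_state["clips_seen"] += len(clips)
        return model_state

    def evaluate(model_state, scenario_bank):
        """Stand-in for replaying old scenarios (real + simulated) to catch regressions."""
        return {"scenarios": len(scenario_bank), "regressions": 0}

    model_state = {"clips_seen": 0}
    scenario_bank = []

    for cycle in range(3):                      # in reality this loop never stops
        clips = collect_new_fleet_data()
        scenario_bank.extend(clips)             # the regression suite only ever grows
        model_state = retrain(model_state, clips)
        report = evaluate(model_state, scenario_bank)
        print(cycle, model_state["clips_seen"], report)

The point is that both the training step and the ever-growing evaluation step have to run continuously, which is where the compute requirement comes from.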
This is surprising to me. Robotics clearly needs huge advances in algorithms (RL or something better). Do you mean you need faster hardware to discover those algorithms?
Oh we definitely need better algorithms too! But I’ve imagined that we’d want something like GPT-3 but for sensor experiences. The way GPT-3 can ingest lots of text and predict a reasonable paragraph as a continuation of a prompt, we could have a system ingest lots of simultaneous sensor data in the form of LiDAR, cameras, IMU data, and touch/skin sensor data, and then, given the current state of those sensors, predict what is going to happen next in the world, and use that prediction as input to an RL system that chooses an action. This seems to me to be both a useful system and one that could require a lot of compute. And that’s still probably not a complete AI system, so there are probably many, many pieces required.
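To make that a bit more concrete, here's a toy sketch of the shape I'm imagining; the modalities match what I listed, but every dimension is made up and this is nowhere near a real architecture:

    import torch
    import torch.nn as nn

    # Toy sketch of a "GPT-for-sensors" idea: fuse several sensor streams per
    # timestep, run a sequence model over them, and predict the next fused
    # sensor state. All sizes are invented for illustration.
    class SensorPredictor(nn.Module):
        def __init__(self, lidar_dim=256, camera_dim=512, imu_dim=16, touch_dim=32, hidden=512):
            super().__init__()
            fused = lidar_dim + camera_dim + imu_dim + touch_dim
            self.encode = nn.Linear(fused, hidden)                    # per-timestep fusion
            self.temporal = nn.GRU(hidden, hidden, batch_first=True)  # sequence model over time
            self.predict = nn.Linear(hidden, fused)                   # next fused sensor state

        def forward(self, lidar, camera, imu, touch):
            x = torch.cat([lidar, camera, imu, touch], dim=-1)   # (batch, time, fused)
            h, _ = self.temporal(self.encode(x))
            return self.predict(h)                               # prediction for t+1 at each step

    # The predicted next state could then be one input (alongside the current
    # state) to an RL policy that picks the robot's next action.
    model = SensorPredictor()
    B, T = 4, 20
    pred = model(torch.randn(B, T, 256), torch.randn(B, T, 512),
                 torch.randn(B, T, 16), torch.randn(B, T, 32))
    print(pred.shape)   # torch.Size([4, 20, 816])

Scale the encoders and the sequence model up by a few orders of magnitude, feed the prediction into an RL policy, and you start to see where the compute goes.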
Looking at it another way, the human brain has wayyy more compute power than any of our current portable computers (robotics really needs to do most of its compute at the edge, on robot). Every robot I’ve ever worked with has been maxing out its CPU and GPU and still needed more compute.
When you look at Tesla’s hydra network for their self-driving system you get an idea of what is needed in robotics, but just as we saw GPT-3 improve with network size, I suspect a lot of the base systems involved in a hydra net could improve with network size. And I suspect there’s still more advanced stuff required when you move beyond a relatively narrow task like driving a car to a more advanced, general-purpose AI system. For example, the Tesla self-driving system doesn’t need any language centers, while a general-purpose robot probably would, and we know how large GPT-3 and similar language networks are.
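For anyone who hasn't seen the presentations, the “hydra” structure is basically one shared backbone feeding many task-specific heads. A toy illustration (the task names and sizes here are made up, not Tesla's actual heads):

    import torch
    import torch.nn as nn

    # Toy illustration of the "hydra" multi-head structure: one shared backbone,
    # many task-specific heads. Task names and dimensions are invented.
    class HydraNet(nn.Module):
        def __init__(self, in_dim=1024, trunk_dim=512):
            super().__init__()
            self.backbone = nn.Sequential(nn.Linear(in_dim, trunk_dim), nn.ReLU())
            self.heads = nn.ModuleDict({
                "lane_geometry": nn.Linear(trunk_dim, 64),
                "object_detection": nn.Linear(trunk_dim, 128),
                "traffic_lights": nn.Linear(trunk_dim, 16),
            })

        def forward(self, x):
            features = self.backbone(x)                  # shared computation
            return {name: head(features) for name, head in self.heads.items()}

    outputs = HydraNet()(torch.randn(8, 1024))
    print({k: v.shape for k, v in outputs.items()})

My suspicion is that the shared trunk and each of those heads benefit from more capacity, the same way GPT-3 did, and a general-purpose robot would need far more heads than a car does.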
> robotics really needs to do most of its compute at the edge, on robot
Why can't you hook your robot up to a GPU cluster? Tesla already has 7k+ A100 GPUs; the question is whether they have algorithms which would clearly work better if only we could run them on 70k or 700k GPUs.
I mean, what you say makes sense, but have people actually tried all that and realized they need bigger GPU clusters? Is there a GPT-3-equivalent model in robotics that would probably work better if we scaled it up? If not, perhaps they should start with a small proof of concept before asking for thousands of GPUs. Original Transformer --> GPT-1 --> GPT-2 --> GPT-3.
The problem with this is that autonomous robots need to function in the real world even without internet connectivity. For example, I am designing a solar-powered farming robot. We do have Wi-Fi and Starlink here, but Wi-Fi and internet go down. In general we think it makes the most sense for it to be able to operate completely without a continuous internet connection. And take self-driving cars, where millisecond response times matter - those can't rely on a continuous connection to function, or someone will get killed. And as systems get more advanced, it is my opinion that edge compute will remain a critical piece. Edge compute can't handle the state-of-the-art networks that people are building even for text processing, let alone a complete autonomous AGI.
And no, the models I’m talking about don’t exist yet; I am speculating on what I think will be required based on how I see things. But I’m not asking for thousands of GPUs. I’m just speculating in internet comments about what robotics will need for a fully embodied AGI to function. I believe much more edge compute power is needed, perhaps 1000x more than the Nvidia Orin provides for robotics today.
Sure, I understand the need for robot autonomy. The problem, as I see it, is that current robots (autonomous or not) suck. They suck primarily because we don't have good algorithms. Without such algorithms or models, it does not matter whether a robot is autonomous or not, or how much processing power it has. Only after we have the algorithms will the other issues you mentioned become relevant. Finally, it's not clear yet whether we need more compute than what's currently available (e.g. at Tesla) to develop the algorithms.
p.s. I don't think AGI is needed for robotics. I suspect that even an ant's level of intelligence (together with existing ML algorithms) might be enough for robots to do almost all the tasks we might want them to do today. It's ironic that robots today can see, read, speak, understand spoken commands, drive a car (more or less), play simple video games, and do a lot of other cool stuff, but still can't walk around an apartment without falling or getting stuck, do laundry, wash dishes, or cook food.