The dataset is the crucial bit of OpenAI. That takes a lot of time and money to make. So it's perfectly possible for OpenAI to carry on innovating without these people.
But equally, it could turn to shit.
However, Sam isn't Jesus, so it's not like he's going to magically make another successful company.
I bet he'll train models on copious amounts of synthetic data made with GPT-4. There are lots of datasets in the open. That makes catching up easier.
No public-facing model can be protected from data exfiltration and distillation. All deployed skills leak, and your competition will replicate them with less effort. A skill only needs to leak once for every subsequent model to inherit it. I think the first movers paid a high price for being first, and will quickly see their advantage erode. Latecomers will catch up and find AI easier to work with. The difference is made by the great fine-tuning datasets that are in the open, a growing lake of distilled abilities.
Another latecomer advantage is benefiting from significant innovation in the engineering part: flash attention, quantization, continuous batching, KV caching, LoRA, and more.
The new AI era will be more egalitarian. Catching up is much easier than discovering, and we can run AI privately, unlike search engines and social networks. You can't exploit a SOTA advantage at scale. Being first is a fleeting advantage: the moment you go in the open, everyone replicates.
Maybe one reason this is happening is that AI skills are very composable. Any addition to the skill repertoire already fits with other skills. This makes open sourcing skills very attractive. Of course, the datasets are what is being open sourced.