To the extent that systems like ChatGPT are valuable, I expect we'll have open source equivalents to GPT-7 within the next five years. The only "moat" will be training on copyrighted content, and OpenAI is not likely to be able to afford to pay copyright owners enough once the value of that content in the context of AI is widely understood.
We might see SETI-like distributed training networks and specific permutations of open source licensing (for code and content) intended to address dystopian AI scenarios.
It's only been a few years since we as a society learned that LLMs can be useful in this way, and OpenAI is managing to stay in the lead for now. But you could see from Satya's expression that he wants to own it outright, so I expect a Microsoft acquisition to close within the next year, and it will be the most Microsoft has ever paid to acquire a company.
MS could justify tremendous capital expenditure to get a clear lead over Google, both in terms of product and IP-related concerns.
Also, from the standpoint of LLMs, Microsoft has far, far more proprietary data that would be valuable for training than any other company in the world.
In retrospect, a lot of the comments you made could also have been said of Google search as it was taking off (open source alternative, SETI-like distributed version, copyright on data being the only blocker), but that didn't come to pass.
Granted, the internet and big tech were young then, and maybe we won't make the same mistakes twice, but I wouldn't bet the farm on it.
There's a ton of work in this area, and the reality is... it doesn't work for LLMs.
Moving from ~900 GB/s of GPU memory bandwidth with InfiniBand interconnects between nodes to 0.01-0.1 GB/s over the internet is brutal (roughly four to five orders of magnitude slower). This works for simple image classifiers, but I've never seen anything like a large language model trained in a meaningful amount of time this way.
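To make that gap concrete, here's a rough back-of-envelope calculation (the 7B-parameter model size and link speeds are my own illustrative assumptions, not figures from the comment above):

    # Time to ship one full set of fp16 gradients for a hypothetical
    # 7B-parameter model (~14 GB) at different link speeds.
    gradient_bytes = 7e9 * 2  # 7B params * 2 bytes each (fp16)

    links = {
        "HBM/NVLink-class (~900 GB/s)": 900,
        "fast internet link (~0.1 GB/s)": 0.1,
        "typical home upload (~0.01 GB/s)": 0.01,
    }

    for name, gb_per_s in links.items():
        seconds = gradient_bytes / (gb_per_s * 1e9)
        print(f"{name}: {seconds:,.1f} s per gradient exchange")

That's milliseconds per exchange on datacenter hardware versus minutes to tens of minutes over consumer links, and a full training run needs a very large number of such exchanges.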
Maybe there is a way to train a neural network in a distributed fashion by training subsets of it and then propagating the aggregated weight changes to adjacent network segments. It wouldn't recover a 1000x interconnect slowdown, but it might still be useful depending on the topology of the network.
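For what it's worth, the closest well-studied idea is local SGD / federated averaging, which amortizes communication by exchanging weights only once per round of many local steps rather than once per step. It's not the segment-wise scheme described above, just a minimal illustrative sketch in PyTorch:

    import copy
    import torch
    import torch.nn as nn

    def local_round(global_model, data_shard, steps=100, lr=1e-3):
        """Each worker trains a private copy of the model on its own data shard."""
        local = copy.deepcopy(global_model)
        opt = torch.optim.SGD(local.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        for step, (x, y) in enumerate(data_shard):
            if step >= steps:
                break
            opt.zero_grad()
            loss_fn(local(x), y).backward()
            opt.step()
        return local.state_dict()

    def federated_average(global_model, worker_states):
        """The only communication point: average the workers' weights once per round."""
        avg = {k: torch.stack([s[k].float() for s in worker_states]).mean(dim=0)
               for k in worker_states[0]}
        global_model.load_state_dict(avg)
        return global_model

Even then, every averaging round still has to move the full weight set over the slow link, so the bandwidth limit bites once per round instead of once per step rather than going away.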