- Recent LLMs are so large that even inference is quite expensive. Companies that want to enrich e.g. search with AI but don't need full "chat" capabilities are already looking for alternatives that are cheaper to run, even if somewhat worse in capability (ignoring training cost).
- For the same reason, specialized inference hardware has been a thing for quite a while and is currently becoming more mainstream. E.g. Google's Edge TPUs are mainly for inference, as are many phones' AI/neural cores. I also wouldn't be surprised if the main focus of e.g. the recent AI cores in AMD graphics cards were inference, though you can use them for more than that.
- Both AMD and Intel might be less far behind than it seems when it comes to training, and especially inference. E.g. AMD has been selling GPU compute somewhat successfully, just not to the general public. With OpenCL semi-abandoned, this left them with close to no public mind share. But with ROCm slowly becoming publicly available, and AI training consolidating around a few internal architectures, this might very well change. Sure, for research, especially into unusual AI architectures, Nvidia will probably still win for a long time. But for "day-to-day" LLM training they will probably soon have serious competition, even more so for inference. Similarly, Intel's new dedicated GPU architecture was made with AI training and inference in mind, so at least for inference I'm pretty sure they will soon be competitive, too.
- AI training has also become increasingly professionalized, with a small number of quite high-level frameworks used more and more often. That means that instead of having to make every project work well with your GPU, you can now focus on supporting a few high-level frameworks. Similarly, the widely used AI architectures differ less extremely than in the past and see big changes less often. Put together, this means it's much easier today to build hardware + drivers that work well for most cases, which can be good enough to compete (see the sketch below).
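To make the framework point concrete, here is a minimal PyTorch sketch, assuming a ROCm build of PyTorch: ROCm builds expose AMD GPUs through the same `torch.cuda` namespace, so typical model code is vendor-agnostic and a hardware vendor only has to make the framework backend work well, not every individual project:

```python
import torch
import torch.nn as nn

# The same code path works on an Nvidia (CUDA) or AMD (ROCm) build of
# PyTorch: ROCm builds report AMD GPUs via torch.cuda.is_available(),
# so the framework backend hides the hardware differences.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A toy model; nothing below mentions the vendor at all.
model = nn.Sequential(
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
).to(device)

x = torch.randn(32, 512, device=device)

with torch.no_grad():  # inference only, no autograd bookkeeping
    logits = model(x)

print(logits.shape, device)
```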
Even with all that said, Nvidia has massive mind share, which gives them a huge boost, and when it comes to bleeding-edge/exotic AI research (not just the next generation of LLMs) they will probably still win out hugely. But LLMs are where the current money is, and as far as anyone can tell, generational improvements do not come with massive conceptual architectural changes, just better composition of the same (by now fairly old) building blocks.