I don't think the article fully applies to large language models (LLMs).

> Inference will Dominate, not Training

This rings true. While many companies will fine-tune LLMs, far fewer will train their own foundation models from scratch, which takes not a "few GPUs" but hundreds with tight interconnect. For those companies, the cost of running inference in their applications will dominate.
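
For a sense of scale, here is a rough back-of-the-envelope sketch; every number below is an assumed, illustrative round figure, not something from the article:

    # Rough estimate of GPUs needed to pretrain an LLM from scratch.
    # All numbers are illustrative assumptions.
    params = 70e9                        # assumed model size: 70B parameters
    tokens = 20 * params                 # assumed ~20 training tokens per parameter
    train_flops = 6 * params * tokens    # common ~6*N*D estimate of training FLOPs

    peak_flops = 312e12                  # assumed BF16 peak of one data-center GPU
    utilization = 0.4                    # assumed sustained fraction of peak
    gpu_seconds = train_flops / (peak_flops * utilization)
    gpu_days = gpu_seconds / 86400       # ~54,000 GPU-days

    wall_clock_days = 90                 # assumed training-run budget
    print(round(gpu_days), "GPU-days ->", round(gpu_days / wall_clock_days),
          "GPUs for a", wall_clock_days, "day run")   # roughly 600 GPUs

Under those assumptions you land in the hundreds of GPUs, and they need fast interconnect because the model and its gradients are sharded across all of them.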

> CPUs are Competitive for Inference

I disagree for LLMs. Inference still demands exactly the kind of compute that GPUs are optimized for. If you want to respond to your customers' requests with acceptable latency (and reasonable throughput), you will want GPUs. For "medium-sized" LLMs you won't need NVLink-level interconnect speeds between your GPUs, though.
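
A minimal sketch of the latency argument: single-stream token generation is roughly memory-bandwidth bound, because the weights have to be streamed through the processor for every generated token. The model size and bandwidth figures below are assumed, round numbers for illustration:

    # Upper bound on single-stream decode speed, assuming all weights are
    # read from memory once per generated token (no batching, no KV-cache cost).
    params = 13e9                     # assumed "medium-sized" model: 13B parameters
    bytes_per_weight = 2              # fp16/bf16 weights
    weight_bytes = params * bytes_per_weight      # ~26 GB read per token

    for name, bw in [("server CPU, DDR5", 300e9),        # assumed ~300 GB/s
                     ("data-center GPU, HBM", 2000e9)]:  # assumed ~2 TB/s
        print(f"{name}: ~{bw / weight_bytes:.0f} tokens/s per request")
    # -> CPU ~12 tokens/s vs GPU ~77 tokens/s per stream; batching widens the
    #    gap further, since GPUs also have far more raw compute to spend on it.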
