First, Apple has to prove they have competitive designs. Apple Silicon GPUs simply do not compete with the efficiency of Nvidia's GPU compute architecture: https://browser.geekbench.com/opencl-benchmarks
Apple's obsessive focus on raster efficiency really shot their GPU designs in the foot. It will be interesting to see if they adopt Nvidia-style designs or spend more time trying to force NPU hardware to work.
I think performance per watt is way in Apple's favor, but raw performance is not.
That said, an M4 Ultra (extrapolating from the Max and Pro) would likely compete with my 3090, and with 192GB of memory (at 10x what it should cost) would outperform my 3x3090 AI server. And honestly, it would cost less than my three 3090s + the rest of the computer + electricity.
It won't outperform a bunch of A100s/H100s (or even a single one, or any other card in the enterprise realm), but it will cost an order of magnitude less than a single card.
Careful when comparing performance and efficiency. As a rough rule, power increases quadratically as you increase clocks on a design, so you can quite easily make a high-performance design low-power by underclocking it. The same is not true in reverse.
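To make that rough rule concrete, here's a minimal sketch assuming the quadratic power/clock relation above, linear performance scaling with clock, and an invented 100 W at 1 GHz baseline (real chips also scale voltage with frequency, so the actual power penalty is often steeper):

```python
# Sketch of the rough rule above: if dynamic power scales roughly
# quadratically with clock frequency, underclocking a fast design gains
# efficiency faster than it loses performance. Baseline numbers are
# invented for illustration only.

def power_watts(clock_ghz: float, base_clock_ghz: float = 1.0, base_power_w: float = 100.0) -> float:
    """Estimated power draw under the assumed quadratic clock/power rule."""
    return base_power_w * (clock_ghz / base_clock_ghz) ** 2

def perf(clock_ghz: float, base_clock_ghz: float = 1.0, base_perf: float = 1.0) -> float:
    """Assume performance scales linearly with clock (idealized)."""
    return base_perf * (clock_ghz / base_clock_ghz)

for clock in (2.0, 1.5, 1.0):
    p = power_watts(clock)
    f = perf(clock)
    print(f"{clock:.1f} GHz: ~{f:.2f}x perf at ~{p:.0f} W -> {f / p:.4f} perf/W")
```

Under these assumptions, dropping from 2.0 GHz to 1.0 GHz halves performance but cuts power to a quarter, doubling perf/W, which is why a high-performance design can be underclocked into an efficient one while the reverse usually doesn't work.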
Sorry, I was coming at this from the consumer side (since Apple is a consumer product company). The majority of LLM use (by consumers) is inference, not training, so I'd hazard a guess that most people would rather have inference machines than training machines.