I think the article claiming "PyTorch has dropped the drawbridge on the CUDA moat" is overly optimistic. Yes, PyTorch is widely used by researchers and by users who want to quickly iterate over different ways of using models, but when it comes to inference there are huge gains to be had by going a different route. Llama.cpp has shown roughly 10x speedups on my hardware (32 GB of GPU RAM + 32 GB of CPU RAM) for models like falcon-40b-instruct, and for much smaller models on the CPU (under 10B parameters) I saw up to a 3x speedup just by switching to ONNX and OpenVINO, along the lines of the sketch below.
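For anyone curious what that switch looks like in practice, here is a minimal sketch of the workflow: export a PyTorch model to ONNX, then run it through ONNX Runtime, preferring the OpenVINO execution provider when the onnxruntime-openvino package is installed. The TinyMLP module is just a hypothetical stand-in; the same path applies to any model you can export.

```python
import torch
import onnxruntime as ort

# Toy stand-in for a real model; the export path is the same for
# anything torch.onnx.export can trace.
class TinyMLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(512, 2048),
            torch.nn.GELU(),
            torch.nn.Linear(2048, 512),
        )

    def forward(self, x):
        return self.net(x)

model = TinyMLP().eval()
dummy = torch.randn(1, 512)

# Export the traced graph to an ONNX file.
torch.onnx.export(
    model, dummy, "model.onnx",
    input_names=["x"], output_names=["y"],
    dynamic_axes={"x": {0: "batch"}},
)

# Prefer the OpenVINO execution provider if it's available,
# otherwise fall back to the plain CPU provider.
available = ort.get_available_providers()
providers = [p for p in ("OpenVINOExecutionProvider", "CPUExecutionProvider")
             if p in available]

sess = ort.InferenceSession("model.onnx", providers=providers)
out = sess.run(None, {"x": dummy.numpy()})[0]
print(out.shape)
```

The speedup you actually get depends heavily on the model and the CPU, so treat my 3x number as one data point, not a guarantee.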
Apple has shown us in practice the benefits of CPU/GPU memory sharing; will AMD be able to follow in their footsteps? The article claims AMD has a design with up to 192 GB of shared RAM. Apple is already shipping a design with the same amount of RAM (if you can afford it). I wish them success, but I believe they need to aim higher than just matching Apple at some unspecified point in the future.