Apple has three hardware units for machine learning (if you disregard the regular CPU FP/SIMD hardware): the Neural Engine (energy-efficient inference for some types of models), the AMX unit (low-latency matmul and long-vector operations), and the GPU (which does contain some hardware to make matmul more efficient). Apple is a bit of a unique case since all of these processors share the same memory hierarchy, so they can afford to keep the units separate. But I wouldn't say Apple has anything comparable to Nvidia's Tensor Cores: AMX probably comes closest, but it's integrated with the CPU's L2 and the total throughput is much lower. The exception is FP64, where AMX actually holds up very well against mainstream Tensor Cores.
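For context, here's roughly how you reach the AMX unit in practice (a minimal sketch: Apple doesn't document AMX or expose its ISA, but it's well established that Accelerate's BLAS is the supported path and that Apple's implementation dispatches it to AMX on Apple Silicon):

```c
// Minimal sketch: matmul via Accelerate's BLAS, which Apple's
// implementation runs on the AMX unit on Apple Silicon.
// Build: clang matmul.c -framework Accelerate
#include <Accelerate/Accelerate.h>
#include <stdio.h>

int main(void) {
    enum { M = 512, N = 512, K = 512 };
    static float A[M * K], B[K * N], C[M * N];

    for (int i = 0; i < M * K; i++) A[i] = 1.0f;
    for (int i = 0; i < K * N; i++) B[i] = 1.0f;

    // C = 1.0 * A * B + 0.0 * C, row-major, no transposes.
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                M, N, K,
                1.0f, A, K,
                B, N,
                0.0f, C, N);

    printf("C[0] = %f\n", C[0]);  // expect 512.0
    return 0;
}
```

The other two units are similarly indirect: as far as I know the Neural Engine is only reachable through Core ML, and the GPU's matmul hardware through Metal Performance Shaders / MPSGraph; none of the three has a public low-level programming interface.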