
Apple did prefer to expose it through their own Accelerate.framework API however...



Has it been verified that they actually use these instructions in Accelerate.framework? I just benchmarked this on my 2019 intel i9 mbp, and got the following speeds for 128x128 matrices, 32 repeats:

  cblas_sgemm: 36 GFLOP/s
  vDSP_mmul: 41 GFLOP/s
That's a pretty big deal if these functions are >30x faster on the M1...!

edit: that seems to be verified in the tlkh.dev blog post above. Interestingly, I ran the same code on my bargain-basement 2020 iPhone SE, and got 259 GFLOP/s! These Apple devices are pretty mindblowing.
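For anyone wanting to sanity-check numbers like these, here is a minimal sketch of how a GFLOP/s figure is usually derived from a timing. It assumes the common convention of counting 2·n³ floating-point operations per n×n multiply; the comment above does not say how its figures were computed, so treat that as an assumption. The function names are hypothetical, and the pure-Python multiply only stands in for a real BLAS call such as cblas_sgemm, so the absolute number it prints is meaningless.

```python
import time


def matmul_flops(n: int) -> int:
    """FLOPs for one n x n by n x n multiply, using the usual 2*n^3 convention."""
    return 2 * n ** 3


def gflops(n: int, repeats: int, seconds: float) -> float:
    """Convert a measured wall-clock time into GFLOP/s."""
    return matmul_flops(n) * repeats / seconds / 1e9


def naive_matmul(a, b):
    """Plain-Python stand-in for the BLAS call being benchmarked."""
    b_columns = list(zip(*b))  # transpose b once so each column is easy to iterate
    return [[sum(x * y for x, y in zip(row, col)) for col in b_columns] for row in a]


def benchmark(n: int = 32, repeats: int = 4) -> float:
    """Time `repeats` multiplies of two n x n matrices and report GFLOP/s."""
    a = [[1.0] * n for _ in range(n)]
    b = [[2.0] * n for _ in range(n)]
    start = time.perf_counter()
    for _ in range(repeats):
        naive_matmul(a, b)
    elapsed = time.perf_counter() - start
    return gflops(n, repeats, elapsed)


if __name__ == "__main__":
    print(f"{benchmark():.4f} GFLOP/s (pure Python, for illustration only)")
```

With small matrices and few repeats (128x128, 32 repeats is only about 0.13 GFLOP of work in total), timer resolution and warm-up can dominate, so it is worth repeating the measurement a few times.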


Has it been verified that they actually use these instructions in Accelerate.framework?

Yes. Aside from benchmarks, you can easily verify this by profiling an application with Instruments and then inspecting the disassembly.

However, it should be said that AMX does not scale linearly with the number of cores, but with the number of core clusters. So, on the M1 if you use Accelerate in two threads (rather than one), performance will barely improve, because the first thread can keep the AMX unit busy enough.

The M1 Pro and M1 Max, however, have two performance core clusters, each with its own AMX unit, so matrix multiplication performance roughly doubles compared to the M1. Similarly, the M1 Ultra has four performance core clusters, so its matrix multiplication performance is roughly twice that of the M1 Pro/Max and four times that of the M1.

Benchmarks:

https://github.com/danieldk/gemm-benchmark#1-to-16-threads
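The scaling behaviour described above can be written down as a toy model. This is only a sketch under idealized assumptions: one thread fully saturates a cluster's AMX unit, threads are spread evenly across clusters, and the cluster counts are the ones given in this comment rather than official figures. The names `AMX_CLUSTERS` and `relative_gemm_throughput` are made up for illustration.

```python
# Performance-cluster counts as stated in the comment above (not official figures).
AMX_CLUSTERS = {"M1": 1, "M1 Pro": 2, "M1 Max": 2, "M1 Ultra": 4}


def relative_gemm_throughput(chip: str, threads: int) -> int:
    """Idealized matmul throughput relative to one thread on a plain M1.

    Throughput is capped by the number of clusters (one AMX unit each),
    not by the number of threads or cores.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return min(threads, AMX_CLUSTERS[chip])


if __name__ == "__main__":
    for chip in AMX_CLUSTERS:
        for threads in (1, 2, 4, 8):
            print(chip, threads, relative_gemm_throughput(chip, threads))
```

In this model a second thread on the M1 adds nothing, while the M1 Ultra tops out at 4x once four or more threads are running. Real measurements, such as the linked benchmark, will not be this clean.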


Hey not related but you mentioned using kvm to run arm64 macOS on linux aarch64. I would like to give this a shot, but can't find a project for it. Would you mind sharing the deets?


Of course they do, Apple like to remain as much in control as possible. If suddenly it becomes more efficient/faster to run ML/AI stuff on Asahi Linux on Mac hardware than with macOS, I'm sure they'd be embarrassed enough to take some sort of action. And I'm pretty sure that action will be towards the side of "closing things down" rather than "opening stuff up", as is tradition.


Wrong answer.

AMX is an unstable ISA that changes between product generations. That's why it's not publicly documented.

Arm SME is the standardisation of the concept, but is not in market yet.

https://community.arm.com/arm-community-blogs/b/architecture...



