>*MKL has faster routines and is completely free, but it won't work on ARM.* It ...

loosescrews · on Nov 11, 2020

Maybe, but note that this is the Intel MKL, a library developed and maintained by Intel. It is no secret that Intel does this to support its ecosystem, and it has been caught intentionally crippling support for AMD processors in the past [1]. Intel has recently been adding better support for AMD processors [2], but many suspect that is intended to help x86 as a whole compete better with ARM. If it does get ported, it is highly unlikely to have competitive performance.

[1] https://news.ycombinator.com/item?id=24307596

[2] https://news.ycombinator.com/item?id=24332825

kxyvr · on Nov 11, 2020

Thanks for the links. If anyone is wondering about some of the hoops that need to be jumped through to make it work, here's another guide [1].

One question in case you or anyone else knows: What's the story behind AMD's apparent lack of math library development? Years ago, AMD had ACML as their high-performance BLAS competitor to MKL. Eventually, it hit end of life and became AOCL [2]. I've not tried it, but I'm sure it's fine. That said, Intel has done steady, consistent work on MKL and added a huge amount of really important functionality such as its sparse libraries. When it works, AMD has benefited from this work as well, but I've been surprised that they haven't made similar investments.

Also, in case anyone is wondering, ARM's competing library is called the Arm Performance Libraries. Not sure how well it works and it's only available under a commercial license. I just went to check and pricing is not immediately available. All that said, it looks to be dense BLAS/LAPACK along with FFT and no sparse.

[1] https://www.pugetsystems.com/labs/hpc/How-To-Use-MKL-with-AM...

[2] https://developer.amd.com/amd-aocl/

danieldk · on Nov 11, 2020

> Eventually, it hit end of life and became AOCL [2]. I've not tried it, but I'm sure it's fine.

It's ok. I did some experiments with transformer networks using libtorch. The numbers on a Ryzen 3700X were (sentences per second, 4 threads):

OpenBLAS: 83, BLIS: 69, AMD BLIS: 80, MKL: 119

On a Xeon Gold 6138:

OpenBLAS: 88, BLIS: 52, AMD BLIS: 59, MKL: 128
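To put the two sets of numbers side by side, here is a small sketch that expresses MKL's throughput as a multiple of each other library's. The figures are the ones quoted above; the dictionary layout and the names `results` and `speedups` are only for illustration.

```python
# Throughput in sentences per second, as quoted above.
results = {
    "Ryzen 3700X": {"OpenBLAS": 83, "BLIS": 69, "AMD BLIS": 80, "MKL": 119},
    "Xeon Gold 6138": {"OpenBLAS": 88, "BLIS": 52, "AMD BLIS": 59, "MKL": 128},
}

# For each CPU, express MKL's throughput as a multiple of every other library's.
speedups = {
    cpu: {lib: rates["MKL"] / rate for lib, rate in rates.items() if lib != "MKL"}
    for cpu, rates in results.items()
}

for cpu, ratios in speedups.items():
    for lib, ratio in ratios.items():
        print(f"{cpu}: MKL is {ratio:.2f}x {lib}")
```

On these numbers MKL comes out between 1.4x and 1.5x ahead of OpenBLAS on both machines, and further ahead of the two BLIS builds.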

OpenBLAS was faster than AMD BLIS. But MKL beats everyone else by a wide margin because it has a special batched GEMM operation. Not only do they have very optimized kernels, they actively participate in the various ecosystems (such as PyTorch) and provide specialized implementations.
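For anyone unfamiliar with the term, a batched GEMM takes many independent matrix products in a single call instead of one call per product. The sketch below only illustrates that calling convention in plain Python. The names `gemm` and `gemm_batch` are hypothetical, this is not MKL's API, and it says nothing about how MKL or libtorch implement it.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def gemm(a: Matrix, b: Matrix) -> Matrix:
    """Naive triple-loop product C = A @ B for one pair of matrices."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    cols = len(b[0])
    return [
        [sum(a[i][p] * b[p][j] for p in range(inner)) for j in range(cols)]
        for i in range(len(a))
    ]


def gemm_batch(lhs: Sequence[Matrix], rhs: Sequence[Matrix]) -> List[Matrix]:
    """Hypothetical batched entry point: one call, many independent products.

    A real library can use the single call to amortize per-call overhead and
    spread the small products across threads; this sketch simply loops.
    """
    if len(lhs) != len(rhs):
        raise ValueError("batch sizes do not match")
    return [gemm(a, b) for a, b in zip(lhs, rhs)]


# Usage: a batch of two 2x2 products, checked against per-pair calls.
lhs = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [1.0, 0.0]]]
rhs = [[[1.0, 0.0], [0.0, 1.0]], [[5.0, 6.0], [7.0, 8.0]]]
batched = gemm_batch(lhs, rhs)
looped = [gemm(a, b) for a, b in zip(lhs, rhs)]
```

The batched and looped results are identical here; the potential win in an optimized library is in overhead and scheduling, which matters most when the individual matrices are small.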

AMD is doing well with hardware, but it's surprising how much they drop the ball with ROCm and the CPU software ecosystem. (Of course, they are doing great work with open sourcing GPU drivers, AMDVLK, etc.)

gnufx · on Nov 11, 2020

If you care about small matrices on x86_64, you should look at libxsmm, which is the reason MKL now does well in that regime. (Those numbers aren't representative of large BLAS.)

my123 · on Nov 11, 2020

A free version of the Arm Performance Libraries is available at:

https://developer.arm.com/tools-and-software/server-and-hpc/...

rurban · on Nov 12, 2020

> What's the story behind AMD's apparent lack of math library development?

I don't see a story. AMD supports a proper libm for gcc and llvm, has its own libm, BLAS, LAPACK, ... at https://developer.amd.com/amd-aocl/

Only their rdrand intrinsic is broken on most Ryzens if you haven't patched it. Fedora firmware doesn't patch it for you.

gnufx · on Nov 11, 2020

You just run MKL from the oneAPI distribution, and it gives decent performance on EPYC2, but basically only for double precision, and I don't remember if that includes complex.

ACML was never competitive in my comparisons with Goto/OpenBLAS on a variety of Opterons. It's been discarded, and AMD now use a somewhat enhanced version of BLIS.

BLIS is similar to, and sometimes better than, ARMPL on aarch64 systems like ThunderX2.

pinewurst · on Nov 11, 2020

In what world will Intel port MKL - Intel intellectual property - to ARM? The whole purpose of Intel's software tools is as an enabler and differentiator for their architecture and specifically their parts.

Fnoord · on Nov 12, 2020

I don't know about this proprietary technology specifically, but Intel is a huge company with some FOSS friendliness. USB 4 is based on Thunderbolt 3, so I guess they licensed that one.

coldtea · on Nov 11, 2020

In a world where Intel has already licensed ARM and built ARM chips in the past:

https://newsroom.intel.com/editorials/accelerating-foundry-i...

chipotle_coyote · on Nov 11, 2020

That linked article from 2016 is about Intel's Custom Foundry program, which I'm fairly sure is for building chips under contract to other companies. It promotes that they have "access to ARM Artisan IP," but doesn't specifically mention an ARM version of MKL as far as I can see. Intel's page on MKL itself lists compatible processors, and ARM is conspicuously absent:

https://software.intel.com/content/www/us/en/develop/tools/m...

And, this question on Intel's own forums from 2016 at least suggests that there wasn't an MKL version for ARM in the time frame of the article you're linking to, either:

https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Libr...

So, from what I can tell, while Intel is an ARM licensee and made ARM CPUs in the past, they haven't made their own ARM CPUs for years and there's no sign they ever made MKL for any ARM platform. Never say never, but I think the OP is basically right -- there's not a lot of incentive for Intel to produce one.

dfox · on Nov 11, 2020

Intel sold most of the relevant ARM IP and product lines to Marvell in 2006.

stabbles · on Nov 11, 2020

MKL is heavily optimized for Intel microarchs and purposely crippled on AMD (I believe dgemm is fast, sgemm slow). I don't think Intel benefits from optimizing MKL for Apple Silicon, especially considering Apple ditched Intel's hardware.

fxtentacle · on Nov 11, 2020

No, it won't. MKL is an Intel toolkit, so they will surely not support Apple's move to dump Intel processors.