Btw, Rust is big in 'crypto', and in e.g. zero-knowledge proofs we do a lot of Fourier transforms, too. There are a few implementations in Rust for that.
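For context, the Fourier transforms in many ZK provers are DFTs over a prime field rather than over the complex numbers, usually called the number-theoretic transform (NTT). Below is a minimal naive sketch. The parameters p = 17 and ω = 4 are toy values picked only for illustration; real provers use large primes and O(n log n) NTT algorithms, and none of this is taken from any specific crate.

```rust
/// Modular exponentiation by repeated squaring: base^exp mod p.
/// Assumes p < 2^32 so that every intermediate product fits in a u64.
fn pow_mod(mut base: u64, mut exp: u64, p: u64) -> u64 {
    let mut result = 1 % p;
    base %= p;
    while exp > 0 {
        if exp & 1 == 1 {
            result = result * base % p;
        }
        base = base * base % p;
        exp >>= 1;
    }
    result
}

/// Naive O(n^2) number-theoretic transform: X_k = sum_j x_j * omega^(j*k) mod p.
/// `omega` is assumed (not checked) to be a primitive n-th root of unity mod p.
fn ntt_naive(x: &[u64], omega: u64, p: u64) -> Vec<u64> {
    let n = x.len() as u64;
    (0..n)
        .map(|k| {
            x.iter().enumerate().fold(0u64, |acc, (j, &xj)| {
                let w = pow_mod(omega, j as u64 * k, p);
                (acc + xj % p * w) % p
            })
        })
        .collect()
}

fn main() {
    // Toy parameters: p = 17, n = 4, omega = 4 (4^2 = 16 = -1 mod 17, so 4 has order 4).
    let out = ntt_naive(&[1, 2, 3, 4], 4, 17);
    println!("{:?}", out); // [10, 7, 15, 6]
}
```

Structurally this is the same transform as a complex DFT; only the arithmetic (mod p instead of floating point) and the root of unity change.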
This isn't really an apples-to-apples comparison with FFTW:
1. In my experience, distros don't build it with AVX properly configured, and
2. PhastFT takes its input de-interleaved into separate real and imaginary arrays, which is generally not how complex data is provided, so the cost of de-interleaving doesn't show up in PhastFT's numbers.
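To make point 2 concrete: converting the usual interleaved layout `[re0, im0, re1, im1, ...]` into split real/imaginary arrays is an extra O(n) pass over memory that a benchmark starting from split arrays never pays. A minimal sketch; `deinterleave` and `interleave` are hypothetical helper names, not PhastFT's API.

```rust
/// Hypothetical helper: split interleaved complex data [re0, im0, re1, im1, ...]
/// into separate real and imaginary arrays (the "split" layout).
fn deinterleave(interleaved: &[f64]) -> (Vec<f64>, Vec<f64>) {
    assert!(interleaved.len() % 2 == 0, "interleaved buffer must have even length");
    let n = interleaved.len() / 2;
    let mut re = Vec::with_capacity(n);
    let mut im = Vec::with_capacity(n);
    // One full pass over the input: this copy is the overhead in question.
    for pair in interleaved.chunks_exact(2) {
        re.push(pair[0]);
        im.push(pair[1]);
    }
    (re, im)
}

/// Hypothetical inverse helper: merge split arrays back into interleaved order.
fn interleave(re: &[f64], im: &[f64]) -> Vec<f64> {
    assert_eq!(re.len(), im.len(), "real and imaginary parts must match in length");
    re.iter().zip(im).flat_map(|(&r, &i)| [r, i]).collect()
}

fn main() {
    // Three complex numbers: 1+2i, 3+4i, 5+6i.
    let data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    let (re, im) = deinterleave(&data);
    println!("re = {:?}, im = {:?}", re, im); // re = [1.0, 3.0, 5.0], im = [2.0, 4.0, 6.0]
    assert_eq!(interleave(&re, &im), data.to_vec());
}
```

A fair end-to-end comparison on interleaved data would charge one of these passes (and often the reverse one on output) to the split-format library.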
One of the authors of PhastFT here. Thank you for your interest.
We went out of our way to configure FFTW for AVX-512. The Rust bindings don't do that, but the FFTW build used in the benchmark does.
It's worth noting that with FFTW you have to choose between building it for your specific CPU, which makes it non-portable, and targeting the lowest common denominator of CPU features, so that it runs everywhere but much slower. Meanwhile, PhastFT detects the available CPU features at runtime and uses the fastest ones without sacrificing portability.
Lastly, we are currently working on support for the interleaved format [1]. That should ship in the next release.
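For readers unfamiliar with runtime feature detection in Rust, here is a generic sketch of the pattern: compile a kernel with `#[target_feature(enable = "avx2")]`, then only call it after `is_x86_feature_detected!` confirms the CPU supports it, falling back to a scalar path otherwise. This is an assumption about the general technique, not PhastFT's actual dispatch code; `Backend`, `scale`, and the other names are hypothetical.

```rust
/// Illustrative backend choices; not PhastFT's internal names.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Backend {
    Avx512,
    Avx2,
    Scalar,
}

/// Probe the CPU at runtime (cpuid under the hood on x86_64).
fn detect_backend() -> Backend {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx512f") {
            return Backend::Avx512;
        }
        if is_x86_feature_detected!("avx2") {
            return Backend::Avx2;
        }
    }
    Backend::Scalar
}

/// Portable fallback: multiply every element by `s`.
fn scale_scalar(x: &mut [f64], s: f64) {
    for v in x.iter_mut() {
        *v *= s;
    }
}

/// Same loop, compiled with AVX2 enabled so the compiler may emit 256-bit
/// instructions. Calling it on a CPU without AVX2 is undefined behavior,
/// which is why it is `unsafe` and only reached after detection.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn scale_avx2(x: &mut [f64], s: f64) {
    for v in x.iter_mut() {
        *v *= s;
    }
}

/// Public entry point: one portable binary, best available path chosen at runtime.
fn scale(x: &mut [f64], s: f64) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just confirmed at runtime.
            unsafe { scale_avx2(x, s) };
            return;
        }
    }
    scale_scalar(x, s);
}

fn main() {
    println!("selected backend: {:?}", detect_backend());
    let mut data = vec![1.0, 2.0, 3.0];
    scale(&mut data, 2.0);
    println!("{:?}", data); // [2.0, 4.0, 6.0]
}
```

The binary is built for the baseline target, so it runs everywhere; only the functions marked with `#[target_feature]` assume the wider instruction set.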
FFTW will definitely query cpuid at runtime too: since it's piecing together kernels anyway, it's not much more work for it to choose to ignore AVX, etc. If you use the [guru interface](https://www.fftw.org/fftw3_doc/Guru-vector-and-transform-siz...) to configure it to work with split arrays (and maybe use FFTW_MEASURE when planning), I think the benchmarks would be a lot closer to 1:1.
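The idea behind FFTW's split-array guru interface (`fftw_plan_guru_split_dft`) is that a transform reads real and imaginary parts through separate pointers plus a stride, so split arrays are stride 1 and interleaved data is the same thing with the imaginary part offset by one and a stride of 2. The sketch below mimics that addressing scheme in plain Rust with a naive O(n^2) DFT; `dft_split` is a hypothetical helper, not an FFTW binding.

```rust
use std::f64::consts::PI;

/// Naive O(n^2) forward DFT reading element j from re[j * stride] and im[j * stride].
/// Hypothetical helper that only illustrates split/strided addressing.
fn dft_split(re: &[f64], im: &[f64], stride: usize, n: usize) -> (Vec<f64>, Vec<f64>) {
    let mut out_re = vec![0.0; n];
    let mut out_im = vec![0.0; n];
    for k in 0..n {
        for j in 0..n {
            let (xr, xi) = (re[j * stride], im[j * stride]);
            // Forward transform: multiply by e^{-2*pi*i*j*k/n}.
            let angle = -2.0 * PI * (j * k) as f64 / n as f64;
            let (c, s) = (angle.cos(), angle.sin());
            out_re[k] += xr * c - xi * s;
            out_im[k] += xr * s + xi * c;
        }
    }
    (out_re, out_im)
}

fn main() {
    // x_j = i^j = [1, i, -1, -i], stored in both layouts.
    let re = [1.0, 0.0, -1.0, 0.0];
    let im = [0.0, 1.0, 0.0, -1.0];
    let interleaved = [1.0, 0.0, 0.0, 1.0, -1.0, 0.0, 0.0, -1.0];

    // Split arrays: separate buffers, stride 1.
    let split = dft_split(&re, &im, 1, 4);
    // Interleaved: same buffer, imaginary part starts at offset 1, stride 2.
    let inter = dft_split(&interleaved, &interleaved[1..], 2, 4);

    println!("{:?}", split);
    println!("{:?}", inter);
}
```

Both calls read the same values in the same order, so they produce identical results (approximately 4 at k = 1 and 0 elsewhere), with no copy needed to switch layouts.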
For those who are interested, there is a strong connection between the FFT and quantum gates. Applying a single-qubit gate to a target qubit in a quantum system follows the same computational pattern as one butterfly stage of an FFT. Consequently, any state-vector quantum simulator essentially contains an FFT implementation, and an efficient FFT implementation can be ported to a quantum simulator.
See https://en.wikipedia.org/wiki/Discrete_Fourier_transform_ove...
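To illustrate the gate/FFT connection described above: applying a 2x2 gate to qubit t of an n-qubit state vector updates amplitude pairs (i, i + 2^t) for every index i whose bit t is 0. A radix-2 FFT stage walks exactly the same pairs, applying a butterfly whose twiddle factor varies from pair to pair instead of one fixed matrix. A minimal dependency-free sketch, with a hypothetical `C` complex type and `apply_gate` function:

```rust
use std::f64::consts::FRAC_1_SQRT_2;

/// Minimal complex number type so the sketch needs no external crates.
#[derive(Clone, Copy, Debug, PartialEq)]
struct C {
    re: f64,
    im: f64,
}

impl C {
    fn new(re: f64, im: f64) -> Self {
        C { re, im }
    }
    fn mul(self, o: C) -> C {
        C::new(self.re * o.re - self.im * o.im, self.re * o.im + self.im * o.re)
    }
    fn add(self, o: C) -> C {
        C::new(self.re + o.re, self.im + o.im)
    }
}

/// Apply the gate [[g00, g01], [g10, g11]] to qubit `target` (qubit 0 = least
/// significant bit). Pairs (i, i + 2^target) are visited block by block, the
/// same stride pattern as one radix-2 FFT butterfly stage.
fn apply_gate(state: &mut [C], target: usize, g: [[C; 2]; 2]) {
    let half = 1usize << target;
    assert!(state.len().is_power_of_two() && half < state.len());
    for block in (0..state.len()).step_by(2 * half) {
        for i in block..block + half {
            let (a, b) = (state[i], state[i + half]);
            state[i] = g[0][0].mul(a).add(g[0][1].mul(b));
            state[i + half] = g[1][0].mul(a).add(g[1][1].mul(b));
        }
    }
}

fn main() {
    let h = C::new(FRAC_1_SQRT_2, 0.0);
    let minus_h = C::new(-FRAC_1_SQRT_2, 0.0);
    let hadamard = [[h, h], [h, minus_h]];
    let zero = C::new(0.0, 0.0);

    // Two qubits in |00>.
    let mut state = vec![C::new(1.0, 0.0), zero, zero, zero];
    apply_gate(&mut state, 0, hadamard);
    apply_gate(&mut state, 1, hadamard);
    // Every amplitude is now (approximately) 0.5: the uniform superposition.
    println!("{:?}", state);
}
```

Applying Hadamards to every qubit like this is the Walsh-Hadamard transform, the "all twiddles equal to 1" special case of the same butterfly structure.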