It's not parallel, a framework, or a GPU feature. It's single-instruction-multip...

LnxPrgr3 · on June 12, 2014

My understanding is that architectures are different enough that the fastest SIMD strategy is sometimes CPU-dependent.

The author of FFTS, for example, chose a different strategy on ARM than x86_64: http://anthonix.com/ffts/preprints/tsp2013.pdf

He ended up writing the NEON code entirely by hand in assembly, because the vector intrinsics didn't expose the CPU features he wanted to use. And that's in C, where vector intrinsics are already CPU-specific.
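To make the "CPU-specific" part concrete, here is a minimal sketch (not taken from FFTS) of the same four-float add written against SSE and NEON intrinsics. The helper name `add4` and the `ADD4_BACKEND` macro are hypothetical, made up for illustration; the intrinsics themselves come from `<xmmintrin.h>` and `<arm_neon.h>`.

```c
/* Hypothetical helper: add four floats element-wise.
   The body has to be written separately per architecture,
   because each instruction set has its own intrinsics header,
   vector types, and function names. */
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>              /* x86 SSE intrinsics */
#define ADD4_BACKEND "sse"
static void add4(const float *a, const float *b, float *out) {
    __m128 va = _mm_loadu_ps(a);    /* unaligned load of 4 floats */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}
#elif defined(__ARM_NEON)
#include <arm_neon.h>               /* ARM NEON intrinsics */
#define ADD4_BACKEND "neon"
static void add4(const float *a, const float *b, float *out) {
    float32x4_t va = vld1q_f32(a);  /* load 4 floats */
    float32x4_t vb = vld1q_f32(b);
    vst1q_f32(out, vaddq_f32(va, vb));
}
#else
#define ADD4_BACKEND "scalar"
/* Portable fallback when neither instruction set is available. */
static void add4(const float *a, const float *b, float *out) {
    for (int i = 0; i < 4; i++) {
        out[i] = a[i] + b[i];
    }
}
#endif
```

Even for something this trivial, nothing is shared between the two vector paths except the function signature. For anything less trivial, the instruction sets stop lining up one-to-one, which is presumably where hand-written assembly starts to look attractive.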

Having access to SIMD is definitely better than not having it, but it really should be paired with well-optimized libraries for things like BLAS and FFT.

cma · on June 11, 2014

>Single instruction, multiple data (SIMD), is a class of parallel computers in Flynn's taxonomy