
It's not parallel, a framework, or a GPU feature. It's single instruction, multiple data (SIMD), which is used to speed up single-threaded execution on a CPU when working with arrays of numbers.
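
To make that concrete, here is a minimal sketch in C of one instruction operating on several numbers at once. It assumes an x86 CPU with SSE and falls back to a plain loop elsewhere; the name `add_floats` and the `HAVE_SSE` macro are made up for illustration, and an ARM version would use NEON intrinsics instead.

```c
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#define HAVE_SSE 1
#else
#define HAVE_SSE 0      /* assumption: no SSE here, so only the scalar loop runs */
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n). */
void add_floats(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if HAVE_SSE
    /* Each _mm_add_ps adds four floats with a single instruction. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar loop: handles the leftover tail, or everything without SSE. */
    for (; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

Calling `add_floats(a, b, out, 6)` on six-element arrays would run one 4-wide SSE add and then two scalar adds. In practice an optimizing compiler may auto-vectorize the plain loop anyway, so treat this as an illustration rather than a tuning recipe.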



My understanding is architectures are different enough that the fastest SIMD strategy is sometimes CPU-dependent.

The author of FFTS, for example, chose a different strategy on ARM than x86_64: http://anthonix.com/ffts/preprints/tsp2013.pdf

He ended up writing the NEON code entirely by hand in assembly because the vector intrinsics didn't expose the CPU features he wanted to use, even in C, where vector intrinsics are CPU-specific.

Having access to SIMD is definitely better than not having it, but it really should be paired with well-optimized implementations of libraries such as BLAS and FFT.


>Single instruction, multiple data (SIMD), is a class of parallel computers in Flynn's taxonomy



