It's not multithreading, a framework, or a GPU feature. It's single instruction, multiple data (SIMD): one CPU instruction operates on several values at once, which speeds up single-threaded execution on a CPU when working with lists of numbers.
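To make that concrete, here is a minimal sketch in C of adding two float arrays four elements at a time. The function name `add_f32` is made up for illustration. It assumes a compiler that defines `__SSE__` on x86 or `__ARM_NEON` on Arm, and falls back to a plain scalar loop everywhere else.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>   /* x86 SSE intrinsics */
#elif defined(__ARM_NEON)
#include <arm_neon.h>    /* Arm NEON intrinsics */
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n). */
void add_f32(const float *a, const float *b, float *out, size_t n) {
    size_t i = 0;
#if defined(__SSE__)
    /* One add instruction handles four floats per iteration. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#elif defined(__ARM_NEON)
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vaddq_f32(va, vb));
    }
#endif
    /* Scalar tail: leftover elements, or everything if no SIMD path exists. */
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Note that it all runs on one thread. The only difference from the scalar loop is how many elements each instruction touches, and that the intrinsics differ per CPU family.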
He found himself writing the NEON code entirely by hand in assembly because the vector intrinsics didn't expose the CPU features he wanted to use, even in C, where vector intrinsics are CPU-specific.
Having access to SIMD is definitely better than not having it, but it really should be paired with well-optimized implementations of libraries such as BLAS and FFT.