
> Now that being said, I think the GPU/SIMT model of vector computing is just much smarter.

I'm not sure. For an argument in favor of vectors, see https://riscv.org/wp-content/uploads/2015/06/riscv-vector-wo...

> Why let me jump through all these hoops of masking and compiler optimizations if all I want is a branch and an early exit for a specific set of values? GPU schedulers and drivers make this easy to use and with somewhat predictable performance results.

If the underlying hw is SIMD (vectors) and not SIMT anyway, as Nvidia hw apparently is, why should I have to go through the effort of rewriting my code in CUDA and then hope that some opaque driver manages to turn it into efficient vector code?
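To make the masking point concrete, here's a rough sketch of mine (plain AVX-512 intrinsics, nothing NVIDIA-specific; the function names are hypothetical, the intrinsics are the real ones): the scalar version just branches past uninteresting elements, while the vector version has to build a mask and predicate the store.

    #include <immintrin.h>
    #include <stddef.h>

    /* Scalar: the "branch and early exit" style that is easy to write. */
    void scale_positive_scalar(float *y, const float *x, size_t n, float a) {
        for (size_t i = 0; i < n; i++) {
            if (x[i] <= 0.0f)
                continue;                /* just skip this element */
            y[i] = a * x[i];
        }
    }

    /* Vector: the same loop with explicit AVX-512 masking (n assumed to be a
       multiple of 16 to keep the sketch short). */
    void scale_positive_avx512(float *y, const float *x, size_t n, float a) {
        const __m512 va   = _mm512_set1_ps(a);
        const __m512 zero = _mm512_setzero_ps();
        for (size_t i = 0; i < n; i += 16) {
            __m512    vx = _mm512_loadu_ps(x + i);
            __mmask16 m  = _mm512_cmp_ps_mask(vx, zero, _CMP_GT_OQ); /* lanes with x > 0 */
            __m512    vy = _mm512_mul_ps(va, vx);
            _mm512_mask_storeu_ps(y + i, m, vy); /* store only the active lanes */
        }
    }

On a GPU you just write the scalar-looking version as a kernel and the hardware does the per-lane predication for you; the disagreement is really about who gets stuck writing the masks, you, the compiler, or the scheduler.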

I mean, ideally I'd just like to write C/C++/Fortran/Julia/Haskell/whatever code, and the compiler would autovectorize it.
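For what it's worth, a loop like the one above, written as plain C, is something current gcc/clang/icc can often vectorize on their own at -O3 once aliasing is ruled out (a sketch of mine, not anything from the slides):

    #include <stddef.h>

    /* The restrict qualifiers and the simple conditional store are enough for
       gcc/clang/icc at -O3 to emit masked vector code on AVX-capable targets,
       at least in simple cases like this; no intrinsics or CUDA required. */
    void scale_positive(float *restrict y, const float *restrict x,
                        size_t n, float a) {
        for (size_t i = 0; i < n; i++) {
            if (x[i] > 0.0f)
                y[i] = a * x[i];
        }
    }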

> Furthermore (and probably more importantly), why is Intel putting this amount of compute power on a CPU without significantly upgrading memory bandwidth?

Flops are cheap, bw is expensive. But yeah, there are certainly many applications that would benefit from a much better bw/flops ratio.
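As a back-of-the-envelope illustration (my numbers, nothing from the thread), a triad-style kernel shows how lopsided the ratio already is:

    #include <stddef.h>

    /* 2 flops per element vs. 24 bytes of traffic per element
       (two 8-byte loads + one 8-byte store), i.e. ~0.08 flops/byte.
       To keep a hypothetical 3 Tflop/s chip busy on this loop you'd need
       ~36 TB/s; a few hundred GB/s of DRAM feeds only a tiny fraction of
       the FPUs.  Illustrative numbers, not measurements. */
    void triad(double *restrict a, const double *restrict b,
               const double *restrict c, double s, size_t n) {
        for (size_t i = 0; i < n; i++)
            a[i] = b[i] + s * c[i];
    }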

Then again, with the latest Teslas you get 16 GB with awesome bw; beyond that you're trying to feed the firehose through the PCIe straw.
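Back-of-the-envelope again (my numbers, assuming P100-class parts): on-card HBM2 is on the order of 700 GB/s while a PCIe 3.0 x16 link is roughly 16 GB/s usable, so anything that doesn't fit in those 16 GB arrives around 40x slower than data already resident on the device.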



