The latency here is dominated by converting the 2D image into the 1D wavefront that gets fed into the device. That stage would involve some digital logic and relatively slow components with response times on the order of milliseconds.
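For concreteness, here's a minimal sketch of the digital side of that conversion, assuming a simple row-major raster scan and float32 samples (both assumptions; the actual ordering and bit depth would depend on the device's input modulator):

```python
import numpy as np

def image_to_wavefront(image: np.ndarray) -> np.ndarray:
    """Flatten a 2D frame into the 1D sample stream fed to the device.

    Row-major raster order and float32 samples are assumptions here; the
    real ordering and bit depth depend on the device's input hardware.
    """
    return np.ascontiguousarray(image, dtype=np.float32).ravel()

# A hypothetical 1024x1024 frame becomes a stream of ~1M samples.
frame = np.random.rand(1024, 1024)
wavefront = image_to_wavefront(frame)
assert wavefront.shape == (1024 * 1024,)
```

The flattening itself is just a memory copy; the milliseconds would presumably come from the slower components downstream of it.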
This is the usual pipeline problem, though: sometimes the bottleneck is the CPU, and sometimes it's memory bandwidth. This setup just puts the ball firmly in the memory-bandwidth court...
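A rough back-of-envelope makes the bandwidth framing concrete; the frame size and rate below are made-up, purely illustrative numbers:

```python
# Illustrative only: a 1024x1024 float32 frame streamed once per millisecond.
frame_bytes = 1024 * 1024 * 4      # ~4 MiB per frame (assumed size and dtype)
frames_per_second = 1_000          # one frame per millisecond (assumed rate)
required_bandwidth = frame_bytes * frames_per_second
print(f"~{required_bandwidth / 1e9:.1f} GB/s sustained")  # ~4.2 GB/s
```

Even a single stream at that (assumed) rate is a few GB/s, and every extra copy in the pipeline multiplies it.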
(You can have a hundred worker CPU cores doing the necessary conversions; you just need to manage the parallelization complexity. Then again, that's exactly what already happens when we feed data to hefty devices like GPUs and TPUs.)
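A sketch of that fan-out pattern, in the spirit of a data-loading pipeline sitting in front of a GPU or TPU; `upload_to_device` is a hypothetical stand-in for whatever the real driver call would be:

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np

def image_to_wavefront(image: np.ndarray) -> np.ndarray:
    # Stand-in for the per-frame digital preprocessing: here just a
    # row-major flatten to float32.
    return np.ascontiguousarray(image, dtype=np.float32).ravel()

def upload_to_device(wavefront: np.ndarray) -> None:
    # Hypothetical driver call; a real feeder would reuse pre-allocated
    # buffers to avoid extra copies that eat memory bandwidth.
    pass

def feed_device(frames: list, workers: int = 8) -> None:
    """Fan per-frame conversions out across CPU cores, then stream the
    results to the device in order, the same shape as a GPU/TPU input
    pipeline."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so frames reach the device in
        # sequence even though conversions run concurrently on many cores.
        for wavefront in pool.map(image_to_wavefront, frames, chunksize=4):
            upload_to_device(wavefront)

if __name__ == "__main__":
    frames = [np.random.rand(1024, 1024) for _ in range(32)]
    feed_device(frames)
```

One caveat with this particular sketch: a process pool pickles each frame across the process boundary, which itself costs memory bandwidth; shared-memory buffers (or threads, since NumPy releases the GIL for the copy) would avoid that extra traffic.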