
I'm basing that number on https://github.com/jcjohnson/cnn-benchmarks. I'm sure the exact figure depends on the CPU, GPU, and algorithm.



im2col + sgemm on the CPU, as in Caffe for instance, is really slow: you are heavily penalized for the extra memory traffic, and the sgemm tile sizes are probably not well tuned for the problem size at hand.
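To make the memory-traffic point concrete, here is a minimal pure-Python sketch of convolution via im2col followed by a matrix multiply. It is not Caffe's code; the function names (`im2col`, `conv_via_gemm`, `expansion_factor`) and the flat channel-major layout with stride 1 and no padding are assumptions chosen for illustration.

```python
def im2col(image, channels, height, width, k):
    """Unfold a flat CHW image into a (C*k*k) x (out_h*out_w) matrix.

    Assumes stride 1 and no padding (illustrative simplification).
    """
    out_h = height - k + 1
    out_w = width - k + 1
    cols = []
    for c in range(channels):
        for dy in range(k):
            for dx in range(k):
                row = []
                for y in range(out_h):
                    for x in range(out_w):
                        row.append(image[(c * height + y + dy) * width + x + dx])
                cols.append(row)
    return cols, out_h, out_w


def conv_via_gemm(image, weights, channels, height, width, k):
    """Convolve by multiplying the filter matrix with the unfolded image.

    weights: one flat list of length C*k*k per output filter.
    Returns one flat output map per filter.
    """
    cols, out_h, out_w = im2col(image, channels, height, width, k)
    n = out_h * out_w
    return [
        [sum(w[r] * cols[r][j] for r in range(len(cols))) for j in range(n)]
        for w in weights
    ]


def expansion_factor(channels, height, width, k):
    """Ratio of unfolded-matrix elements to original input elements."""
    out_h = height - k + 1
    out_w = width - k + 1
    return (channels * k * k * out_h * out_w) / (channels * height * width)


# A 3x3 single-channel image convolved with a 2x2 filter of ones.
image = [1, 2, 3, 4, 5, 6, 7, 8, 9]
result = conv_via_gemm(image, [[1, 1, 1, 1]], 1, 3, 3, 2)
```

Each input pixel is copied into up to k*k rows of the unfolded matrix, so for a 3x3 kernel the buffer written and then re-read by the GEMM approaches nine times the size of the input. That copy is the extra memory traffic in question.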

At the roofline of performance, the difference in both memory bandwidth and arithmetic throughput between CPU and GPU is only 5-10x (for fp32; Pascal fp16 is a different story, of course), and proper implementations on the CPU will get you there.

https://github.com/Maratyszcza/NNPACK
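The roofline argument above can be sketched in a few lines. The roofline model bounds attainable throughput by the smaller of peak compute and memory bandwidth times arithmetic intensity. The hardware numbers below are hypothetical placeholders, not measurements of any real CPU or GPU, and the function names are made up for this example.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Roofline bound: min(compute roof, bandwidth * arithmetic intensity)."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


def gpu_cpu_ratio(cpu, gpu, flops_per_byte):
    """Ratio of GPU to CPU roofline bounds at a given arithmetic intensity."""
    return (attainable_gflops(gpu["peak"], gpu["bw"], flops_per_byte)
            / attainable_gflops(cpu["peak"], cpu["bw"], flops_per_byte))


# Hypothetical fp32 figures, chosen only for illustration:
# peak in GFLOP/s, bandwidth in GB/s.
cpu = {"peak": 1000.0, "bw": 70.0}
gpu = {"peak": 8000.0, "bw": 500.0}

memory_bound_ratio = gpu_cpu_ratio(cpu, gpu, 1.0)     # both bandwidth-limited
compute_bound_ratio = gpu_cpu_ratio(cpu, gpu, 100.0)  # both compute-limited
```

With these placeholder numbers the ratio is set by the bandwidth gap at low arithmetic intensity and by the peak-throughput gap at high intensity. Larger measured gaps would then suggest the CPU implementation is running well below its roofline.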



