> Meanwhile in machine learning, people also made the switch to FP16, BF16 and INT8 largely because of the memory wall

FP16 isn't any faster than mixed precision on Nvidia or any other platform (I have benchmarked GPUs, CPUs, and TPUs). For matrix multiplication, compute is still the bottleneck: the work grows as O(N^3) while the memory traffic grows only as O(N^2).
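A rough roofline-style sketch of that point (the 100 TFLOP/s and 1 TB/s machine figures and the function name are made up for illustration, not measurements): an NxNxN matmul does ~2*N^3 FLOPs while the minimum data moved is ~3*N^2 elements, so arithmetic intensity rises with N and large matmuls end up compute-bound at any element width.

    # Rough arithmetic-intensity estimate for an N x N x N matmul.
    # FLOPs grow as ~2*N^3; the minimum data moved (A, B, C once each)
    # grows as ~3*N^2 elements, so intensity rises linearly with N.
    def matmul_arithmetic_intensity(n: int, bytes_per_element: int) -> float:
        flops = 2 * n ** 3                           # one multiply + one add per inner-loop step
        bytes_moved = 3 * n * n * bytes_per_element  # read A and B, write C (ideal reuse)
        return flops / bytes_moved

    # Hypothetical accelerator: 100 TFLOP/s compute, 1 TB/s memory bandwidth,
    # so anything above ~100 FLOPs/byte is compute-bound on the roofline.
    machine_balance = 100e12 / 1e12

    for n in (256, 1024, 4096):
        for name, width in (("fp32", 4), ("fp16", 2)):
            ai = matmul_arithmetic_intensity(n, width)
            bound = "compute-bound" if ai > machine_balance else "memory-bound"
            print(f"N={n:5d} {name}: {ai:8.1f} FLOPs/byte -> {bound}")

On these illustrative numbers, N=256 is memory-bound in either precision, while N=1024 and up is already compute-bound, which is consistent with the claim above for large matrices.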




With FP16 you can fit twice as many weights in cache, and also fetch twice as many weights per unit of memory bandwidth.

Also, this depends on the size of the matrix: small or skinny matrices (e.g. batch-1 matrix-vector products) stay memory-bound, as sketched below.
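A minimal sketch of that size dependence, assuming a batch-1 dense layer where every weight is streamed from memory once per token (the 4096x4096 shape and the helper name are illustrative):

    # Minimum weight traffic for one forward pass through a dense layer at
    # batch size 1 (a matrix-vector product): every weight is read exactly
    # once, so bytes moved scale directly with the element width.
    def weight_traffic_bytes(rows: int, cols: int, bytes_per_element: int) -> int:
        return rows * cols * bytes_per_element

    rows, cols = 4096, 4096
    for name, width in (("fp32", 4), ("fp16", 2), ("int8", 1)):
        gb = weight_traffic_bytes(rows, cols, width) / 1e9
        print(f"{name}: {gb:.3f} GB of weights streamed per token")
    # At ~2 FLOPs per weight this stays memory-bound, so halving the element
    # width roughly halves the time per token in this regime.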



