> Meanwhile in machine learning, people also made the switch to FP16, BF16 and INT8 largely because of the memory wall
FP16 doesn't run any faster than mixed precision on Nvidia or any other platform (I have benchmarked GPUs, CPUs, and TPUs). For matrix multiplication, computation is still the bottleneck: an N×N matmul costs O(N³) arithmetic operations but only O(N²) memory accesses, so shrinking the element size doesn't remove the limiting factor.
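A quick back-of-the-envelope check makes this concrete. The sketch below compares the arithmetic intensity (FLOPs per byte moved) of a square matmul against a machine-balance point; the hardware figures are illustrative assumptions (roughly A100-class peak FP16 throughput and HBM bandwidth), not measurements:

```python
# Arithmetic intensity of C = A @ B for square N x N matrices.
# Hardware numbers below are assumed for illustration, not measured.

def matmul_arithmetic_intensity(n: int, bytes_per_element: int) -> float:
    """FLOPs per byte moved, assuming ideal reuse (each matrix touched once)."""
    flops = 2 * n**3                            # one multiply + one add per inner-product term
    bytes_moved = 3 * n**2 * bytes_per_element  # read A, read B, write C
    return flops / bytes_moved

# Assumed machine balance: ~312 TFLOPS FP16 peak / ~2 TB/s memory bandwidth.
MACHINE_BALANCE = 312e12 / 2e12  # ~156 FLOPs per byte

for n in (1024, 4096, 16384):
    fp16 = matmul_arithmetic_intensity(n, bytes_per_element=2)
    fp32 = matmul_arithmetic_intensity(n, bytes_per_element=4)
    print(f"N={n:6d}: FP16={fp16:8.0f} FLOPs/B, FP32={fp32:8.0f} FLOPs/B "
          f"(compute-bound above ~{MACHINE_BALANCE:.0f})")
```

Even at N=1024, the intensity is already hundreds of FLOPs per byte, well above the assumed ~156 FLOPs/byte balance point, so large matmuls stay compute-bound whether the operands are 2 or 4 bytes each. That is why halving storage with pure FP16 buys no matmul speedup over mixed precision.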