
Exactly. Smaller bit widths from quantization might marginally reduce the compute needed for each operation, but they do not reduce the total number of operations. So quantization generally has a bigger effect on memory use than on compute.
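To put rough numbers on that, here is a back-of-the-envelope sketch. The function name and the 4096x4096 layer size are made up for illustration, and it assumes a plain dense layer doing one matrix-vector product with weight-only quantization:

```python
def linear_layer_cost(in_features: int, out_features: int, bits_per_weight: int):
    """Return (weight_bytes, macs) for one matrix-vector product.

    Hypothetical helper: it ignores biases, activations, and any
    per-group scale factors a real quantization scheme would store.
    """
    n_weights = in_features * out_features
    weight_bytes = n_weights * bits_per_weight // 8
    # One multiply-accumulate per weight, whatever the bit width is.
    macs = n_weights
    return weight_bytes, macs


fp16 = linear_layer_cost(4096, 4096, 16)
int4 = linear_layer_cost(4096, 4096, 4)

print("fp16:", fp16)
print("int4:", int4)
```

The weight storage drops by 4x going from 16 to 4 bits, while the multiply-accumulate count is identical in both cases.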



Except in this case they quantized both the parameters and the activations, which decreases compute time too.
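A minimal sketch of why quantizing both sides helps: once weights and activations are both integers, the inner loop is pure integer arithmetic and the floating-point scales are applied once at the end. This assumes symmetric per-tensor int8 quantization, which is just a common textbook scheme and not necessarily what was used in the work being discussed; the helper names are hypothetical.

```python
def quantize(values, bits=8):
    """Symmetric per-tensor quantization (an assumed scheme, for illustration)."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax or 1.0  # avoid a zero scale
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def int_dot(w_q, x_q):
    # Integer-only multiply-accumulate: this is the part that can run on
    # cheaper integer hardware paths when both operands are quantized.
    return sum(w * x for w, x in zip(w_q, x_q))


weights = [0.6, -1.0, 0.25]
acts = [3.0, 1.0, -4.0]

w_q, w_scale = quantize(weights)
x_q, x_scale = quantize(acts)

# A single floating-point rescale after the integer accumulation.
approx = int_dot(w_q, x_q) * w_scale * x_scale
exact = sum(w * x for w, x in zip(weights, acts))

print("exact:", exact, "approx:", approx)
```

With weight-only quantization the weights would have to be dequantized back to floating point before the multiply, so the arithmetic itself would not get cheaper.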



