
For completeness, there's also another paper demonstrating that you get more model accuracy per bit at 4-bit precision than at any other bit width (including 2-bit and 3-bit).

That's the paper I referenced. But newer research is already challenging it.

'Int-4 LLaMA is not enough - Int-3 and beyond' [0] suggests 3-bit quantization is best for models larger than ~10B parameters when binning is combined with GPTQ; there's a toy sketch of the bit-width/error trade-off below.

[0] https://nolanoorg.substack.com/p/int-4-llama-is-not-enough-i...
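To make the bit-width trade-off concrete, here's a minimal sketch of plain round-to-nearest (RTN) quantization in Python. This is not GPTQ (which minimizes layer-output error rather than rounding weights independently), and I'm reading "binning" as group-wise scaling, which is my assumption, not necessarily the post's scheme. The weights, group size, and bit widths are purely illustrative:

    import numpy as np

    def quantize_rtn(w, bits, group_size=64):
        # Uniform round-to-nearest quantization with a separate
        # min/scale per group of weights ("binning" read here as
        # group-wise scaling -- an assumption, and much cruder
        # than GPTQ's error-minimizing rounding).
        levels = 2 ** bits - 1
        out = np.empty_like(w)
        for start in range(0, w.size, group_size):
            g = w[start:start + group_size]
            g_min = g.min()
            scale = (g.max() - g_min) / levels
            if scale == 0:                        # constant group, nothing to round
                out[start:start + group_size] = g
                continue
            q = np.round((g - g_min) / scale)     # integer codes in [0, levels]
            out[start:start + group_size] = q * scale + g_min
        return out

    rng = np.random.default_rng(0)
    w = rng.normal(size=4096).astype(np.float32)  # stand-in for one weight row
    for bits in (2, 3, 4, 8):
        mse = np.mean((w - quantize_rtn(w, bits)) ** 2)
        print(f"{bits}-bit RTN MSE: {mse:.6f}")

For roughly Gaussian weights, uniform quantization error variance scales with the square of the step size, so MSE drops about 4x per extra bit. The question these papers ask isn't raw error, it's per-bit efficiency: whether those extra bits beat spending the same memory on a larger model at lower precision.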
