'Int-4 LLaMA is not enough - Int-3 and beyond' [0] suggests 3-bit quantization is the better trade-off for models larger than ~10B parameters when binning is combined with GPTQ.
[0] https://nolanoorg.substack.com/p/int-4-llama-is-not-enough-i...
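For intuition, here is a toy sketch of the binning half of that recipe: uniformly bin a group of weights into 2^3 = 8 levels and dequantize. This is plain round-to-nearest, not the article's actual method; GPTQ would additionally propagate each rounding error onto the not-yet-quantized weights. The function name and group layout are illustrative, not from the post.

```python
def quantize_3bit(group):
    """Round-to-nearest 3-bit binning of one weight group (toy sketch).

    Maps each float to an integer code in 0..7 over the group's
    [min, max] range, then reconstructs the approximate weights.
    """
    lo, hi = min(group), max(group)
    scale = (hi - lo) / 7 or 1.0          # 8 bins -> 7 steps; guard flat groups
    codes = [round((w - lo) / scale) for w in group]   # integer codes 0..7
    dequant = [c * scale + lo for c in codes]          # reconstruction
    return codes, dequant

weights = [0.31, -0.12, 0.55, -0.4, 0.05, 0.9, -0.75, 0.2]
codes, approx = quantize_3bit(weights)
```

Each weight costs 3 bits plus the per-group `lo`/`scale` overhead, and the maximum reconstruction error is bounded by half a bin width, which is why smaller groups (finer binning) help at low bit widths.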