
It seems the trick here is that they first quantize to 1- or 2-bit, and then fine-tune the quantization bias parameters (the parameters that dequantize from 1-2 bits back to 16-bit) via LoRA. They then have specialized kernels to do the matrix multiplication at the bit level.
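
Roughly, I'd picture it like the sketch below (my own guess in PyTorch; the names, grouping scheme, and LoRA placement are assumptions, not the project's actual code): the 2-bit codes stay frozen, and only the small per-group dequantization parameters plus a low-rank adapter get gradients.

    import torch

    LEVELS = torch.tensor([-1.0, 0.0, 1.0, 2.0])  # the 2-bit codebook from the post

    def quantize_2bit(w):
        # index of the nearest codebook level for every weight
        return (w.reshape(-1, 1) - LEVELS).abs().argmin(dim=1)

    class TwoBitLinear(torch.nn.Module):
        def __init__(self, w, group_size=64, rank=8):
            super().__init__()
            out_f, in_f = w.shape  # assumes (out_f * in_f) % group_size == 0
            scale0 = w.abs().mean()
            # frozen 2-bit codes; never updated after quantization
            self.register_buffer("codes", quantize_2bit(w / scale0))
            n_groups = w.numel() // group_size
            # trainable dequantization parameters: the only "bias" weights tuned
            self.scale = torch.nn.Parameter(torch.full((n_groups, 1), scale0.item()))
            self.zero = torch.nn.Parameter(torch.zeros(n_groups, 1))
            # small LoRA-style low-rank correction on top of the frozen codes
            self.lora_a = torch.nn.Parameter(torch.randn(in_f, rank) * 0.01)
            self.lora_b = torch.nn.Parameter(torch.zeros(rank, out_f))
            self.group_size, self.shape = group_size, (out_f, in_f)

        def forward(self, x):
            w = LEVELS[self.codes].reshape(-1, self.group_size)
            w = (w * self.scale + self.zero).reshape(self.shape)  # dequantize
            return x @ w.t() + (x @ self.lora_a) @ self.lora_b

The point being that the trainable tensors are tiny next to the frozen codes, so the post-quantization fine-tune is cheap.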

Also, the 2-bit model seems much better than the 1-bit one. They use the levels [-1, 0, 1, 2], and I wonder whether the '2' is needed in light of the 1.58-bit paper (which claims -1 is definitely needed).
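
For what it's worth, four levels pack cleanly into two bits (four codes per byte), which three levels don't, and that presumably matters for the bit-level kernels. A quick illustration (mine, not their kernel code):

    import numpy as np

    def pack_2bit(codes):
        """codes: uint8 array of values in 0..3, length divisible by 4."""
        c = codes.reshape(-1, 4)
        # four 2-bit codes per output byte
        return (c[:, 0] | (c[:, 1] << 2) | (c[:, 2] << 4) | (c[:, 3] << 6)).astype(np.uint8)

    def unpack_2bit(packed):
        shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
        return ((packed.reshape(-1, 1) >> shifts) & 0b11).reshape(-1)

    w_codes = np.array([0, 3, 1, 2, 2, 2, 0, 1], dtype=np.uint8)
    assert (unpack_2bit(pack_2bit(w_codes)) == w_codes).all()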




Interesting, and it kind of makes sense. You quantize, which invariably means you lose some precision, but then you can fine-tune post-quantization to recover at least some of it. Neat idea.
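
As a toy version of that recovery step (purely illustrative, not their method): quantize to signs, then learn a single scale against the original full-precision weights.

    import torch

    # 1-bit quantization: keep only the sign, then fit a scalar scale
    # to minimize reconstruction error against the original weights.
    w_full = torch.randn(1024)
    codes = w_full.sign()  # frozen 1-bit "weights"
    scale = torch.nn.Parameter(torch.tensor(1.0))
    opt = torch.optim.SGD([scale], lr=0.1)
    for _ in range(100):
        loss = ((codes * scale - w_full) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    print(scale.item())  # approaches E[|w|] (~0.80), the optimal scale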


Which is itself a little counterintuitive, as the arXiv papers they cite say models need to be pretrained from the ground up at 1- or 2-bit (or 1.58-bit) precision. It definitely adds some interesting data points for the open-source community, which is experimenting in every possible direction.



