Hacker News

How is it a 1-bit LLM if 2 bits are required for each weight (and one of the 4 possible states is wasted in order to represent 0)?



As someone else pointed out here, you can store 5 ternary values in 1 byte, since 3^5 = 243 <= 256.


That’s still not 1 bit, and that packing would basically destroy whatever perf advantage you might hope for if you keep the model in memory in that format rather than unpacking it on load.


Not fully: 8 bits gives you 256 values. It's easy to keep a lookup table in the L1 cache of any CPU or the constant cache of any GPU. For ASICs and FPGAs, it's a simple 256-entry LUT. It's not ideal, yes, but not a deal breaker, especially considering LLMs are memory bound. GGML dequantizes weights on-the-fly and still gets near-linear scaling on GPUs.
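A sketch of the 256-entry LUT idea (names are illustrative, not GGML's actual code): precompute the 5 decoded weights for every possible byte once, then dequantization is a single indexed load per byte. The 13 byte values 243–255 are simply unused codes.

```python
# Precomputed 256-entry lookup table: each possible byte maps to its 5
# decoded ternary weights, so unpacking is one table lookup per byte.

def _decode(b):
    """Decode one byte into 5 base-3 digits, remapped to {-1, 0, 1}."""
    out = []
    for _ in range(5):
        out.append(b % 3 - 1)
        b //= 3
    return tuple(out)

# 256 entries of 5 small ints each -- tiny, easily resident in L1 cache.
LUT = [_decode(b) for b in range(256)]

# Usage: dequantize a packed weight buffer (3 bytes -> 15 weights here).
packed = bytes([75, 0, 242])
weights = [w for b in packed for w in LUT[b]]
```

The same table works as a ROM block on an FPGA or as constant memory on a GPU, which is why the comment argues the decode cost is negligible next to the memory-bandwidth savings.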



