Hacker News

How is it a 1-bit LLM if 2 bits are required for each weight (and one of the 4 possible states is wasted in order to represent 0)?



As someone else pointed out here, you can store 5 ternary values in 1 byte, since 3^5 = 243 <= 256.


That’s still not 1 bit, and that packing would basically destroy whatever perf advantage you might hope for if you keep the model in memory in that format rather than unpacking it on load.


Not fully: 8 bits gives you 256 values. It's easy to keep a lookup table in the L1 cache of any CPU or the constant cache of any GPU. For ASICs and FPGAs, it's a simple 256-entry LUT. It's not ideal, yes, but not a deal breaker, especially considering LLMs are memory bound. GGML dequantizes weights on-the-fly and still gets near-linear scaling on GPUs.
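A sketch of the 256-entry LUT idea (names are illustrative, not GGML's actual code): precompute the 5 decoded weights for every possible byte once, then dequantization is a single indexed load per byte. The 13 byte values 243–255 are simply unused codes.

```python
# Precomputed 256-entry lookup table: each possible byte maps to its 5
# decoded ternary weights, so unpacking is one table lookup per byte.

def _decode(b):
    """Decode one byte into 5 base-3 digits, remapped to {-1, 0, 1}."""
    out = []
    for _ in range(5):
        out.append(b % 3 - 1)
        b //= 3
    return tuple(out)

# 256 entries of 5 small ints each -- tiny, easily resident in L1 cache.
LUT = [_decode(b) for b in range(256)]

# Usage: dequantize a packed weight buffer (3 bytes -> 15 weights here).
packed = bytes([75, 0, 242])
weights = [w for b in packed for w in LUT[b]]
```

The same table works as a ROM block on an FPGA or as constant memory on a GPU, which is why the comment argues the decode cost is negligible next to the memory-bandwidth savings.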



