
Because by making the model larger, you don't need 64-bit precision floats; you only need 64 discrete bits.



Do you mind pointing out where they make the model larger? The paper seems to suggest they maintain the same model size.

> Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single parameter (or weight) of the LLM is ternary {-1, 0, 1}. It matches the full-precision (i.e., FP16 or BF16) Transformer LLM with the same model size and training tokens in terms of both perplexity and end-task performance, while being significantly more cost-effective in terms of latency, memory, throughput, and energy consumption.
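
For reference, the ternary scheme the abstract describes boils down to scaling each weight tensor by its mean absolute value and rounding into {-1, 0, +1}. A rough sketch of that idea (the function name and epsilon constant are my own illustration, not from the paper):

    import torch

    def absmean_ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
        """Quantize a weight tensor to ternary values {-1, 0, +1}.

        Sketch of the absmean-style quantization described for BitNet b1.58:
        scale by the mean absolute value, round, then clip to [-1, 1].
        """
        gamma = w.abs().mean()                      # per-tensor scaling factor
        w_scaled = w / (gamma + eps)                # normalize by mean magnitude
        w_ternary = w_scaled.round().clamp_(-1, 1)  # snap to {-1, 0, +1}
        return w_ternary, gamma                     # gamma is kept for rescaling

    # Example: quantize a small random weight matrix
    w = torch.randn(4, 4)
    wq, gamma = absmean_ternary_quantize(w)
    print(wq)     # entries are only -1.0, 0.0, or 1.0
    print(gamma)  # the per-tensor scale

The point being: the stored weights carry log2(3) ≈ 1.58 bits of information each, while the model's parameter count stays the same.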




