No, 7B LLMs only need about 4GB of RAM.

There is extremely little quality loss from dropping to 4-bit for LLMs, and that “extremely little” becomes “virtually unmeasurable” loss when going to 8-bit. No one should be running these models on local devices at fp16 outside of research, since fp16 makes them half as fast as q8_0 and requires twice as much RAM for no benefit.

If a model is inadequate for a task at 4-bit, then there's virtually no chance it's going to be adequate at fp16.
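
To make that concrete, here's a minimal sketch of running a 4-bit 7B locally, using llama-cpp-python as just one way to do it; the .gguf filename is a placeholder for whatever 4-bit quant you actually download:

    # Minimal sketch: a 7B model at 4-bit via llama-cpp-python.
    # The model file below is a placeholder; use whatever 4-bit GGUF
    # quant (e.g. Q4_0 or Q4_K_M) you have downloaded.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./some-7b-model.Q4_0.gguf",  # placeholder path
        n_ctx=2048,  # modest context window keeps the KV cache small
    )

    out = llm("Q: What is the capital of France? A:", max_tokens=32)
    print(out["choices"][0]["text"])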

Microsoft has also been doing a lot of research into smaller models with the Phi series, and I would be surprised if Phi3 (or a hypothetical Phi4) doesn’t show up at some point under the hood.




The original commenter mentioned 70B, not 7B.


I had already read the comment I was responding to, and they actually mentioned both.

Here's the exact quote for the 7B:

"Even running a 7B will take 14GB if it's fp16."

Since they called out a specific amount of memory that is entirely irrelevant to anyone actually running 7B models, I was responding to that.

I'm certain that no one at Microsoft is talking about running 70B models on consumer devices. 7B models are actually a practical consideration for the hardware that exists today.


They said: "Even running a 7B will take 14GB if it's fp16."

Which is correct: fp16 takes two bytes per weight, so it's 7 billion * 2 bytes, which comes out to 14GB.

They are probably aware that you could run it with 4-bit quantization (which would use 1/4 of the RAM) but explicitly mentioned fp16.
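
To spell out the arithmetic, here's a back-of-the-envelope sketch (nominal bits per weight only; real quantized formats like GGUF q4_0/q8_0 add a small per-block overhead, and inference also needs room for the KV cache and activations):

    # Back-of-the-envelope memory for the weights of a 7B-parameter model.
    params = 7e9

    for name, bits_per_weight in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
        gb = params * bits_per_weight / 8 / 1e9
        print(f"{name:>5}: {gb:4.1f} GB")

    # fp16 : 14.0 GB  (the figure quoted above)
    # 8-bit:  7.0 GB
    # 4-bit:  3.5 GB  (roughly the "about 4GB" mentioned upthread)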


> > Since they called out a specific amount of memory that is entirely irrelevant to anyone actually running 7B models, I was responding to that.

> Which is correct: fp16 takes two bytes per weight, so it's 7 billion * 2 bytes, which comes out to 14GB.

As I said, it is "entirely irrelevant"; that was my exact wording. Nowhere did I say the fp16 calculation was wrong. The point is that irrelevant numbers like that can mislead people unfamiliar with the subject matter.

No one is deploying LLMs to end users at fp16. It would be a huge waste and provide a bad experience. This discussion is about Copilot+, which is all about managed AI experiences that "just work" for the end user. Professional-grade stuff, and I believe Microsoft has good enough engineers to know better than to deploy fp16 LLMs to end users.


Makes sense; your argument is clear now. My mistake.



