No, 7B LLMs only need about 4GB of RAM.

There is extremely little quality loss from dropping to 4-bit for LLMs, and that “extremely little” becomes “virtually unmeasurable” loss when going to 8-bit. No one should be running these models on local devices at fp16 outside of research, since fp16 makes them half as fast as q8_0 and requires twice as much RAM for no benefit.

If a model is inadequate for a task at 4-bit, then there's virtually no chance it's going to be adequate at fp16.
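
To make that concrete, here's a minimal sketch of running a 4-bit 7B locally, using llama-cpp-python as just one way to do it; the .gguf filename is a placeholder for whatever 4-bit quant you actually download:

    # Minimal sketch: a 7B model at 4-bit via llama-cpp-python.
    # The model file below is a placeholder; use whatever 4-bit GGUF
    # quant (e.g. Q4_0 or Q4_K_M) you have downloaded.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./some-7b-model.Q4_0.gguf",  # placeholder path
        n_ctx=2048,  # modest context window keeps the KV cache small
    )

    out = llm("Q: What is the capital of France? A:", max_tokens=32)
    print(out["choices"][0]["text"])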

Microsoft has also been doing a lot of research into smaller models with the Phi series, and I would be surprised if Phi3 (or a hypothetical Phi4) doesn’t show up at some point under the hood.




The original commenter mentioned 70B, not 7B.


I had already read the comment I was responding to, and they actually mentioned both.

Here's the exact quote for the 7B:

"Even running a 7B will take 14GB if it's fp16."

Since they called out a specific amount of memory that is entirely irrelevant to anyone actually running 7B models, I was responding to that.

I'm certain that no one at Microsoft is talking about running 70B models on consumer devices. 7B models are actually a practical consideration for the hardware that exists today.


They said: "Even running a 7B will take 14GB if it's fp16."

Which is correct: fp16 takes two bytes per weight, so it's 7 billion * 2 bytes, which comes out to 14GB.

They are probably aware that you could run it with 4-bit quantization (which would use 1/4 of the RAM) but explicitly mentioned fp16.
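
To spell out the arithmetic, here's a back-of-the-envelope sketch (nominal bits per weight only; real quantized formats like GGUF q4_0/q8_0 add a small per-block overhead, and inference also needs room for the KV cache and activations):

    # Back-of-the-envelope memory for the weights of a 7B-parameter model.
    params = 7e9

    for name, bits_per_weight in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
        gb = params * bits_per_weight / 8 / 1e9
        print(f"{name:>5}: {gb:4.1f} GB")

    # fp16 : 14.0 GB  (the figure quoted above)
    # 8-bit:  7.0 GB
    # 4-bit:  3.5 GB  (roughly the "about 4GB" mentioned upthread)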


> > Since they called out a specific amount of memory that is entirely irrelevant to anyone actually running 7B models, I was responding to that.

> Which is correct: fp16 takes two bytes per weight, so it's 7 billion * 2 bytes, which comes out to 14GB.

As I said, it is "entirely irrelevant"; that was my exact wording. Nowhere did I say the fp16 calculation was wrong. The point is that irrelevant numbers like that can mislead people unfamiliar with the subject matter.

No one is deploying LLMs to end users at fp16. It would be a huge waste and provide a bad experience. This discussion is about Copilot+, which is all about managed AI experiences that "just work" for the end user. Professional-grade stuff, and I believe Microsoft has good enough engineers to know better than to deploy fp16 LLMs to end users.


Makes sense; your argument is clear now. My mistake.



