
I had already read the comment I was responding to, and they actually mentioned both.

Here's the exact quote for the 7B:

"Even running a 7B will take 14GB if it's fp16."

Since they called out a specific amount of memory that is entirely irrelevant to anyone actually running 7B models, I was responding to that.

I'm certain that no one at Microsoft is talking about running 70B models on consumer devices. 7B models are actually a practical consideration for the hardware that exists today.




They said: "Even running a 7B will take 14GB if it's fp16."

Which is correct: fp16 takes two bytes per weight, so 7 billion weights * 2 bytes comes out to 14GB.

They are probably aware that you could run it with 4-bit quantization (which would use roughly a quarter of the RAM), but they explicitly mentioned fp16.
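
For reference, here is a quick back-of-the-envelope sketch of that arithmetic in Python (the weight_memory_gb helper is just illustrative, and it counts only the weights, not the KV cache or activations):

    # Rough estimate of the RAM needed just to hold the weights.
    # weight_memory_gb is a hypothetical helper, not from any library.
    def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
        return num_params * bits_per_weight / 8 / 1e9  # decimal GB

    params_7b = 7e9
    print(f"fp16  (16-bit): {weight_memory_gb(params_7b, 16):.1f} GB")  # ~14.0 GB
    print(f"4-bit quant:    {weight_memory_gb(params_7b, 4):.1f} GB")   # ~3.5 GB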


> > Since they called out a specific amount of memory that is entirely irrelevant to anyone actually running 7B models, I was responding to that.

> Which is correct, fp16 takes two bytes per weight, so it will be 7 billion * 2 bytes which is exactly 14GB.

As I said, it is "entirely irrelevant", which is the exact wording I used. Nowhere did I say that the calculation was wrong for fp16. Irrelevant numbers like that can be misleading to people unfamiliar with the subject matter.

No one is deploying LLMs to end users at fp16; it would be a huge waste and provide a bad experience. This discussion is about Copilot+, which is all about managed AI experiences that "just work" for the end user. That's professional-grade stuff, and I believe Microsoft has good enough engineers to know better than to deploy fp16 LLMs to end users.


Makes sense, your argument is clear now. My mistake.



