
> Neglected to include comparisons against GPT-4-Turbo or Claude Opus, so I guess it's far from being a frontier model

Yeah, almost like comparing a 70B model with a 1.8-trillion-parameter model doesn't make any sense when you have a 400B model pending release.




(You can't compare parameter counts directly with a mixture-of-experts model, which is what the 1.8T rumor says GPT-4 is. Only a fraction of those parameters is active for any given token.)
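
Since the thread hinges on what "parameter count" even means for MoE, here's a minimal back-of-envelope sketch in Python. Every number in it (16 experts, 2 routed per token, the shared fraction) is an assumed illustrative layout, not a confirmed configuration for GPT-4 or Llama 3.

  # Rough sketch: total vs. active parameters in a mixture-of-experts model.
  # All figures below are hypothetical, chosen only to illustrate the gap.

  def moe_active_params(total_params, n_experts, experts_per_token,
                        shared_fraction=0.2):
      """Estimate parameters actually used per token in a simple MoE layout.

      shared_fraction: portion of the model (attention, embeddings) used by
      every token; the remainder is split evenly across the experts.
      """
      shared = total_params * shared_fraction
      per_expert = (total_params - shared) / n_experts
      return shared + experts_per_token * per_expert

  dense_70b = 70e9
  # Hypothetical 1.8T-parameter MoE with 16 experts, 2 routed per token:
  moe_active = moe_active_params(1.8e12, n_experts=16, experts_per_token=2)
  print(f"Dense model, params active per token:  {dense_70b:.2e}")
  print(f"MoE model, params active per token:    {moe_active:.2e}")

Under those assumptions the MoE model touches only a few hundred billion parameters per token, which is why headline totals aren't directly comparable to a dense model's size.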


You absolutely can, since it has a size advantage either way. MoE means each expert performs better BECAUSE of the overall model size.


Fair enough, although it means we don't know whether a 1.8T MoE GPT-4 will have a "size advantage" over Llama 3 400B.




