
> Neglected to include comparisons against GPT-4-Turbo or Claude Opus, so I guess it's far from being a frontier model

Yeah, almost like comparing a 70B model with a 1.8-trillion-parameter model doesn't make any sense when you have a 400B model pending release.




(You can't compare parameter counts directly with a mixture-of-experts model, which is what the 1.8T rumor says GPT-4 is. Only a fraction of those parameters is active for any given token.)
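
Since the thread hinges on what "parameter count" even means for MoE, here's a minimal back-of-envelope sketch in Python. Every number in it (16 experts, 2 routed per token, the shared fraction) is an assumed illustrative layout, not a confirmed configuration for GPT-4 or Llama 3.

  # Rough sketch: total vs. active parameters in a mixture-of-experts model.
  # All figures below are hypothetical, chosen only to illustrate the gap.

  def moe_active_params(total_params, n_experts, experts_per_token,
                        shared_fraction=0.2):
      """Estimate parameters actually used per token in a simple MoE layout.

      shared_fraction: portion of the model (attention, embeddings) used by
      every token; the remainder is split evenly across the experts.
      """
      shared = total_params * shared_fraction
      per_expert = (total_params - shared) / n_experts
      return shared + experts_per_token * per_expert

  dense_70b = 70e9
  # Hypothetical 1.8T-parameter MoE with 16 experts, 2 routed per token:
  moe_active = moe_active_params(1.8e12, n_experts=16, experts_per_token=2)
  print(f"Dense model, params active per token:  {dense_70b:.2e}")
  print(f"MoE model, params active per token:    {moe_active:.2e}")

Under those assumptions the MoE model touches only a few hundred billion parameters per token, which is why headline totals aren't directly comparable to a dense model's size.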


You absolutely can, since it has a size advantage either way. MoE means each expert performs better BECAUSE of the overall model size.


Fair enough, although it means we don't know whether a 1.8T MoE GPT-4 will have a "size advantage" over Llama 3 400B.




