i would've been impressed by this 2 years ago. i think it's got to the point where real, valuable ai is in the hands of the everyday consumer, so we've started judging the models for ourselves. having seen google continually get crushed over the past year, a bunch of benchmarks just fails to impress. in this case in particular, they're comparing their latest model to GPT-4, which hasn't changed much in almost a year.



Not only that, in some cases they’re comparing apples to oranges as well, undermining their credibility further, e.g. chain-of-thought (CoT) vs. non-CoT results. I don’t even know why they’re doing that; it seems like their results would be impressive enough without it.



