I also suspect as much, but obviously can't know for sure. IMHO it's intellectually lazy, if not dishonest, to benchmark against 3.5 and not make that fact clearly known upfront.

A better benchmark would have had two entries for ChatGPT, showing both 3.5 and 4 results.



