
They came up with the SuperGLUE benchmark because they found that the GLUE benchmark was flawed and too easy to game. There were correlations in the dataset that made it possible to get questions right without real understanding, and so the results didn't generalize.

Could the same thing happen again with SuperGLUE, through subtler correlations? These things are tough to judge, so I'd say wait and see whether it turns out to be a real result.



