Uhhhh… It was trained on ARC data? So they targeted a specific benchmark and are surprised and blown away that the LLM performed well on it? What’s that law again? Goodhart’s: once a benchmark becomes the target, it stops measuring anything useful?
Yeah, seriously. The task format is public, so some engineers at OpenAI could easily have spent a few months generating millions of permutations of grid-based puzzles and folding them into the training data. Handshakes all around, publicity for everyone.
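And the mechanics of that wouldn’t even be hard. Here’s a rough Python sketch of how cheap that kind of augmentation is (purely hypothetical, just to illustrate the point; no claim that this is what any lab actually did, and all names here are made up): rotate, mirror, and recolor each grid, and a single puzzle fans out into dozens of “new” examples.

```python
# Hypothetical sketch: one ARC-style grid can be turned into many variants
# via rotations, mirroring, and color relabeling. Illustrative only.
import random

Grid = list[list[int]]

def rotate(grid: Grid) -> Grid:
    """Rotate a grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def mirror(grid: Grid) -> Grid:
    """Mirror a grid left-to-right."""
    return [row[::-1] for row in grid]

def remap_colors(grid: Grid, mapping: dict[int, int]) -> Grid:
    """Relabel every cell according to a permutation of the palette."""
    return [[mapping[c] for c in row] for row in grid]

def augment(grid: Grid, n_color_perms: int = 10) -> list[Grid]:
    """Generate geometric and color-permuted variants of one grid."""
    variants = []
    g = grid
    for _ in range(4):                      # four rotations
        for base in (g, mirror(g)):         # with and without mirroring
            colors = sorted({c for row in base for c in row})
            for _ in range(n_color_perms):  # a few random palette relabelings
                shuffled = random.sample(colors, len(colors))
                variants.append(remap_colors(base, dict(zip(colors, shuffled))))
        g = rotate(g)
    return variants

if __name__ == "__main__":
    puzzle = [[0, 1, 0],
              [1, 2, 1],
              [0, 1, 0]]
    print(len(augment(puzzle)), "variants from a single 3x3 grid")
```

Run that over a few hundred seed puzzles and you’re into the millions without much effort.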
They’re running a business selling access to these models to enterprises and consumers. People won’t pay for stuff that doesn’t solve real problems; nobody pays just because of a benchmark score. It’d be really weird to become obsessed with gaming metrics rather than racing to build something smarter than the other guys. There’s nothing wrong with curating whatever kind of training set actually produces something useful.