
It'd be more accurate to call it a logarithmic relationship, since compute time is our input variable. That's concerning in itself, since it implies that modest gains in accuracy require exponentially more compute time.
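
To make that concrete, here's a minimal sketch assuming a fit of the form accuracy = a + b * log10(compute); the coefficients are made up for illustration and aren't taken from the chart:

    # Assumed (hypothetical) scaling law: accuracy = a + b * log10(compute).
    # Inverting it: compute = 10 ** ((accuracy - a) / b), so every extra
    # point of accuracy multiplies the required compute by 10 ** (1 / b).
    a, b = 20.0, 15.0  # made-up fit coefficients, purely illustrative

    def compute_needed(accuracy):
        # Relative compute required to reach a given accuracy under the assumed fit.
        return 10 ** ((accuracy - a) / b)

    for acc in (70, 75, 80, 85):
        print(f"{acc}% accuracy -> {compute_needed(acc):,.0f}x baseline compute")

Under that form, every 5-point gain in accuracy costs a bit over 2x the compute, and the multiplier stays the same no matter where you are on the curve.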

In either case, that still doesn't excuse not labeling your axis. Taking 10 seconds vs. 10 days to reach 80% accuracy implies radically different things about how developed this technology is and how viable it is for real-world applications.

That isn't to say a model that takes 10 days to produce an 80% accurate result can't be useful. There are absolutely use cases where that could represent a significant improvement over what's currently available. But the fact that they're obfuscating this fairly basic statistic doesn't inspire confidence.




> That's concerning in itself, since it implies that modest gains in accuracy require exponentially more compute time

This is more of what I was getting at. I agree they should label the axis regardless, but I think the scaling relationship is interesting (or rather, concerning) on its own.


The absolute time depends on hardware, optimizations, the exact model, etc.; it's not a very meaningful number for quantifying the reinforcement technique they've developed, but it would be very useful for estimating their training hardware and other proprietary information.


It's not about the literal quantity/value; it's about the order of growth of output vs. input. Hardware and optimizations only contribute constant factors, so they don't really change that.


Exactly. That's why the absolute computation time doesn't matter; only the relative growth does, and that's exactly what they show.



