
I'm not sure how well Codeforces percentiles correlate with software engineering ability. Looking at all the data, it still isn't a game changer. Key notes:

1. AlphaCode 2 was already at 1650 last year.

2. SWE-bench Verified under an agent has jumped from 33.2% to 35.8% with this model (which doesn't really matter). The full model is at 41.4%, which still isn't a game changer either.

3. It's not handling open-ended questions much better than GPT-4o.




I think you're right, actually. Initially I got excited, but now I think OpenAI pulled the hype card again to seem relevant as they struggle to be profitable.

Claude, on the other hand, has been fantastic and seems to do similar reasoning behind the scenes with RL.


The model is really impressive, to be fair. The question is just how economically relevant it is.



