Hacker News new | past | comments | ask | show | jobs | submit login

> the current bottleneck is figuring out how to get a model to train itself against its own output without human intervention

Isn't that what AlphaZero was known for? After all, the 'zero' part of the name was a play on 'zero-shot learning' which is the concept you're talking about. So we have a proof of existence already, and there's been recent papers showing progress in some mathematical domains (though simplified ones so far, such as Euclidean geometry - which does however feature complex proofs).




Yes, but AlphaZero is based on reinforcement learning, where there is a simple cost function to optimize. There hasn't been much progress in applying reinforcement learning to LLMs to get them to self improve. I agree with the quote that this will be necessary to get superhuman performance in mathematics, and Lean may very well play a role there since it can help provide a cost function by checking correctness objectively.



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: