If the system took 3 days to solve a problem, how different is this approach than a bruteforce attempt at the problem with educated guesses? Thats not reasoning in my mind.
Because with AlphaGeometry it literally was just a feedback loop brute forcing over a known database of geometry axioms with an LLM to guide the guesses.
Here, from what I understand, it's instead a theorem prover + LLM backing it. General proofs have a much larger search space than the 2d geometry problems you see on IMO; many former competitors disparage geometry for that reason.