Good call on the categories.

> The short version is that the Brier score is much better than .25 for AI questions, and the weighted Metaculus Prediction is more accurate still.

Added more categories. The Brier score at 1 year out is 0.217. I agree that's better than chance (0.25), but is it "much better"?
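For anyone who wants to sanity-check the 0.25 baseline, here is a minimal sketch of the Brier score for binary questions. The function names and the outcome list are made up for illustration and are not taken from Metaculus data or code.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def expected_brier_if_calibrated(p):
    """Expected Brier score of a perfectly calibrated forecaster who always says p."""
    # With probability p the outcome is 1 (error (1-p)^2), otherwise 0 (error p^2);
    # p*(1-p)^2 + (1-p)*p^2 simplifies to p*(1-p).
    return p * (1 - p)


# Hypothetical resolutions, purely illustrative.
outcomes = [1, 0, 0, 1]

# Always answering 50% scores 0.25 regardless of how the questions resolve.
chance = brier_score([0.5] * len(outcomes), outcomes)
```

Under the calibration assumption, always saying 68% gives an expected score of 0.68 × 0.32 ≈ 0.218, which gives a rough feel for how far 0.217 is from the 0.25 baseline.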

That said, this is dominated by bad community predictions from before 2020, and there isn't much recent data for binary questions. I agree that CRPS is better, but it's not clear to me from that link how early in a question's lifetime they are scoring. Accuracy improves as the resolution date approaches, and my claim is that longer-term predictions are shakier.