> The point is that Othello GPT is not in fact playing Othello by 'memorizing a long list of correlations' but by modelling the rules and states of Othello and using that model to make a good prediction of the next move.

I don't know how you rule out "memorizing a long list of correlations" from the results. The big discrepancy in performance between their synthetic/random-data training and their human-data training suggests the opposite to me: random board states are statistically nicer/more uniform, which points to these being correlations rather than state computations.

> This is evaluated in the actual paper with the error rates using the linear and non linear probes. It's not a red flag that a precursor blog wouldn't have such things.

It's the main claim/result! Presumably the reason it is omitted from the blog is that the results are not good: nearly 10% error per tile. An Othello board has 64 tiles, so the board-level error rate (the probability of getting at least one tile wrong, assuming independent errors) is 1 - 0.9^64 ≈ 99.88%.
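
As a back-of-the-envelope check (my own sketch, not from the paper, and the independence of per-tile errors is an assumption):

  per_tile_error = 0.10
  tiles = 64
  p_board_correct = (1 - per_tile_error) ** tiles
  print(f"P(whole board correct) = {p_board_correct:.4f}")          # ~0.0012
  print(f"P(at least one tile wrong) = {1 - p_board_correct:.2%}")  # ~99.88%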

> The multiple comparison problem is only a problem when you're trying to run multiple tests on the same sample. Obviously don't test your probe on states you fed it during training and you're good.

In practice, what is done is that you keep re-running your test/validation loop with different hyperparameters until the validation result looks good. That is exactly "running multiple tests on the same sample".
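
A toy illustration of why this matters (my own sketch, not anything from the paper): even probes with zero real signal will look better than chance if you report the best of many runs evaluated against the same fixed validation set.

  import random

  random.seed(0)
  n_val = 200    # fixed validation set
  n_runs = 100   # hyperparameter settings tried against that same set
  labels = [random.randint(0, 1) for _ in range(n_val)]

  best_acc = 0.0
  for _ in range(n_runs):
      # a "probe" that is a pure coin flip, i.e. carries no information
      preds = [random.randint(0, 1) for _ in range(n_val)]
      acc = sum(p == y for p, y in zip(preds, labels)) / n_val
      best_acc = max(best_acc, acc)

  # typically prints something noticeably above 50%, which is the whole problem
  print(f"best validation accuracy over {n_runs} useless probes: {best_acc:.2%}")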
