AlphaZero plays games of chess against itself over and over, feeding the output ...

_dain_ · on Feb 22, 2023

> AlphaZero plays games of chess against itself over and over, feeding the output of the neural network back into the input, and now it's vastly more powerful than any chess engine that's ever existed.

not a good comparison. alphazero's loss function never changed as it was playing itself. it was always just "win this game given these rules". but LLM loss function rewards it for predicting the next token. and now "the next token" might be something that an AI wrote previously.
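For readers following along, here is a minimal sketch of the next-token objective being described. It is not any particular model's training code; `next_token_loss`, `logits`, and `target_index` are names made up for illustration. The point it shows is that the formula only sees which token came next, not who wrote it.

```python
import math

def next_token_loss(logits, target_index):
    """Cross-entropy for a single next-token prediction.

    logits: unnormalised scores, one per vocabulary entry (hypothetical input).
    target_index: index of the token that actually came next in the corpus.
    Returns -log(softmax(logits)[target_index]).
    """
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target_index]
```

The target token is simply whatever appears in the training text, so a human-written and a model-written continuation are scored the same way.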

SideQuark · on Feb 22, 2023

>alphazero's loss function never changed

Yes, it did.

AlphaZero's loss function changes with every update. The loss function works on the actual outcome versus the predicted outcome, and the predicted outcome is a result of previous learning. So with every step it takes, each of which is random (and the initial weights are also random), it modifies its own loss function for future walks in the space of weights. Rerun the entire training with different initial random weights, and look at the sequence of loss functions, and you will get a different sequence.

The paper: https://arxiv.org/pdf/1712.01815.pdf
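For reference, the per-position loss given in the linked paper is l = (z - v)^2 - π^T log p + c‖θ‖^2, where z is the game outcome, v the predicted value, π the search probabilities, p the network's move probabilities, and θ the weights. Below is a rough sketch of that formula in plain Python; the function name, the list-based inputs, and the default for `c` are illustrative assumptions, not DeepMind's code.

```python
import math

def alphazero_loss(z, v, pi, p, theta, c=1e-4):
    """Sketch of l = (z - v)^2 - pi^T log p + c * ||theta||^2.

    z: game outcome from self-play (-1, 0, or +1).
    v: value predicted by the network for this position.
    pi: search (MCTS) move probabilities, used as the policy target.
    p: move probabilities predicted by the network.
    theta: flat list of network weights (hypothetical representation).
    c: L2 regularisation strength (placeholder default, an assumption).
    """
    value_term = (z - v) ** 2
    # Cross-entropy between search probabilities and network policy;
    # moves with zero search probability contribute nothing.
    policy_term = -sum(t * math.log(q) for t, q in zip(pi, p) if t > 0)
    l2_term = c * sum(w * w for w in theta)
    return value_term + policy_term + l2_term
```

The formula itself is fixed, while the targets z and π are produced by self-play with the current network, so the numbers it is evaluated on shift as training proceeds.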

_dain_ · on Feb 23, 2023

i mean, there's always an objective ground truth because the rules of chess never change. did it win or lose?

but the "ground truth" of the english corpus changes all the time. and is changing right now as LLMs emit words into the noosphere. so I don't see how this counters my point.

SideQuark · on Feb 23, 2023

>the rules of chess never change

Actually, they do under FIDE rules. For example, the 50 move rule has changed many times, even in my lifetime, to different numbers, and there is pressure to change it yet again. They also added a 75 move rule (50 requires player intervention, 75 is automatic).

They recently abolished the automatic draw for insufficient material rule.

They added a new "dead position" rule that forces a draw.

They recently removed a perpetual check draw rule.

They added an automatic fivefold repetition draw rule, to go along with the threefold repetition rule, which requires a claim.

If you don't like FIDE rules, then each national federation has rules that also change.

So claiming the rules never change is simply not true. The rules have changed many, many times, some in pretty big ways (see all the changes in promotion rules since 1800) in the past few hundred years, as well as in even the last decade. Google and read.

>there's always an objective ground truth

That "objective ground truth" is not computable. If it were, chess would be weakly solved (in the game theoretic sense), and it is not, and is expected to never be so. It's too complex. Since no AI can access the "objective truth" of a position, it's no different than what LLMs do - they are measuring next move under some fuzzy AI generated probability distribution over the next move (or token, if you prefer).

>so I don't see how this counters my point

You had a belief, and made a claim to rationalize it, and the claim was false. Usually that should cause you to rethink the belief, not double down on it.

That a game of chess ends with a ternary outcome is irrelevant since AlphaZero is not training on that uncomputable function - it's training on its own predicted move versus a statistical sampling of the move quality. It never ever knows the "truth" of the outcome of a given position because that cannot be computed - it is far too big.

Your claim:

>but LLM loss function rewards it for predicting the next token. and now "the next token" might be something that an AI wrote previously

is no different than:

"but AlphaZero loss function rewards it for predicting the next move quality. and certainly the next move quality might be something AlphaZero estimated previously".

>is changing right now as LLMs emit words into the noosphere

Chess knowledge is also changing right now as AlphaZero emits new games and even new chess ideas into the noosphere (plenty of GMs have written on how they are rethinking certain positions, and this "knowledge" can be fed into newer engines/AIs as desired). Not a lot of difference, is there?

_dain_ · on Feb 23, 2023

you are bringing up irrelevant nitpicks and are seemingly intent on misunderstanding/misrepresenting my point. I'm not continuing this discussion.

SideQuark · on Feb 24, 2023

>intent on misunderstanding/misrepresenting my point

I completely addressed your point - you claim somehow there is a fundamental difference between LLMs and AlphaZero, and you made many claims about why. They were all demonstrably wrong, which is why you fail to see that there is no fundamental difference, and certainly not the one you claim. Both learn using a fuzzy metric, both can reuse previous things from their own learning, and this is the opposite of what you claimed.