Hacker News new | past | comments | ask | show | jobs | submit login

In almost all real-time demos of networks playing games, there's very high jitter in the inputs. Even when the paddle is moving in a straight line, the network is always very keen on doing some wiggling with the other keys.

My question is: is it possible to eliminate that with further training? Naively you could just drop the 'stupid' inputs, but I assume that might also mess with the network's learning.




Notice how the output of the network is stochastic, not a single solid value. You could certainly tweak the output sampling function to reduce jitter.
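As a sketch of what tweaking the sampling might look like (assuming the network emits a single P(UP) per frame, as in the article; the temperature knob here is my addition, not something the article uses):

```python
import numpy as np

def sample_action(p_up, rng, temperature=1.0):
    """Sample UP (1) or DOWN (0) from the network's probability p_up.

    temperature < 1 sharpens the distribution, so near-50/50 outputs
    are pushed toward a near-deterministic choice and jitter shrinks.
    """
    # Convert probability to a logit, rescale it, and convert back.
    logit = np.log(p_up) - np.log(1.0 - p_up)
    p_sharp = 1.0 / (1.0 + np.exp(-logit / temperature))
    return 1 if rng.random() < p_sharp else 0

rng = np.random.default_rng(0)
# At p_up = 0.55 the stock policy flips keys constantly; at a low
# temperature the same network output mostly yields UP.
actions = [sample_action(0.55, rng, temperature=0.1) for _ in range(100)]
```

Note this only changes what you execute at play time; the training signal still sees the unsharpened probabilities.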

Further training may also reduce this, but technically it might not, since the jitter is not causing any reduction in reward. The best approach would likely be to alter the reward system to discourage jittery play... but then again, in this setting there's arguably no point, because the jitter does not reduce fitness.

I suppose where this is important is in robotics where jittery movement might actually be dangerous, or wear down hardware. In that case, you could certainly use an output smoothing function and tweak the reward.
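A minimal sketch of such an output smoothing function (my own illustration, not from the article): an exponential moving average over the raw network outputs before they reach the actuators.

```python
import numpy as np

class SmoothedOutput:
    """Exponential moving average over raw network outputs.

    alpha close to 1 tracks the network quickly; alpha close to 0
    smooths aggressively, trading responsiveness for less hardware wear.
    """
    def __init__(self, alpha=0.2):
        self.alpha = alpha
        self.state = None

    def __call__(self, raw):
        raw = np.asarray(raw, dtype=float)
        if self.state is None:
            self.state = raw
        else:
            self.state = self.alpha * raw + (1 - self.alpha) * self.state
        return self.state

smooth = SmoothedOutput(alpha=0.2)
# A maximally jittery command stream settles instead of flapping.
commands = [smooth(x) for x in [1.0, -1.0, 1.0, -1.0]]
```

If the smoothing lives between the policy and the environment during training too, the agent can learn to compensate for the lag it introduces.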


Maybe you could introduce a (small) penalty for every keystroke. This might select against unnecessary movement and thus reduce jitter.
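That kind of reward shaping is only a few lines; a sketch (the three-way action encoding with a stand-still option is hypothetical, not the article's two-action setup):

```python
def shaped_reward(env_reward, action, move_penalty=0.001):
    """Subtract a small cost whenever a key is pressed.

    action: 0 = stand still, 1 = UP, 2 = DOWN (hypothetical encoding).
    The penalty is tiny relative to the +/-1 game reward, so it mainly
    breaks ties between equally scoring policies in favour of stillness.
    """
    cost = move_penalty if action != 0 else 0.0
    return env_reward - cost
```

The coefficient matters: set it too high and the agent may prefer sitting still to chasing the ball.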


Note that in this Pong example in particular, every frame he gives the network the option to either go up or down, but no option to stand still. So it _has_ to be jittery.

But as others have commented, adding a small penalty on every move and giving it the option to stand still (along with some normalization?) might give a much smoother result.


I went back to that part of the article several times, because like you, I found it a bit odd that there was no option to stand still.

The first part of the article definitely seems to give the impression that the choice is between UP or DOWN every frame.

But a bit further on it was more ambiguous, and I could also interpret it as giving a probability for UP and a separate one for DOWN. Then it could choose neither. But it could also choose both, and you would need a conflict resolution procedure (do neither, pick the one with the highest probability, maybe just roll again?). Unless the actual game also has two buttons, in which case you can just do whatever the game engine does when you press both.
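Under that second reading, the sampling plus one of those conflict rules could look something like this (a sketch; the "keep the higher-probability button" rule is one of the options above, not anything the article specifies):

```python
import numpy as np

def resolve_buttons(p_up, p_down, rng):
    """Sample each button independently, then resolve conflicts.

    Returns "UP", "DOWN", or "STILL". If both buttons fire, keep the
    one with the higher probability, mimicking a game engine that
    cannot honour both presses at once.
    """
    up = rng.random() < p_up
    down = rng.random() < p_down
    if up and down:
        return "UP" if p_up >= p_down else "DOWN"
    if up:
        return "UP"
    if down:
        return "DOWN"
    return "STILL"

rng = np.random.default_rng(0)
moves = [resolve_buttons(0.9, 0.1, rng) for _ in range(20)]
```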

Another possibility might be to model the output a bit more like a human player would do it. First I'd change it into a series of timings + note-on/note-off commands (like MIDI), then perhaps add jitter to the timings (making sure the note-off doesn't jitter before the corresponding note-on). I've read that adding this kind of noise to a NN tends to improve its robustness, so that might help?

Most of those changes would happen as transformation step between the output layer and the simulation input, so I presume the learning algo itself can mostly stay the same. But there's probably a few snags to that as well.


With further training I don't think it's possible: since those movements are neither useful nor harmful, they will appear in winning and losing matches alike. A possible solution might be to include some distance-traveled metric in the reward function...
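A sketch of such a metric (my own illustration; the coefficient is a guess and would need tuning so the agent still chases the ball):

```python
def episode_return(rewards, positions, travel_coeff=0.0005):
    """Episode return minus a penalty on total paddle travel.

    positions: the paddle's y-coordinate on each frame. Two policies
    with the same score now differ in fitness if one wiggles more,
    giving training a gradient toward smooth play.
    """
    travel = sum(abs(b - a) for a, b in zip(positions, positions[1:]))
    return sum(rewards) - travel_coeff * travel
```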


You could probably do that by introducing a regularization term on the activity. Possibly L1 because that will tend toward sparse outputs.
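In loss terms that's just one extra term; a sketch (assuming the "activity" being regularized is the network's action outputs, which is my reading, not something stated above):

```python
import numpy as np

def loss_with_l1(policy_loss, action_outputs, l1_coeff=1e-3):
    """Add an L1 penalty on the action outputs to the policy loss.

    Unlike L2, the L1 penalty drives many outputs to exactly zero, so
    with a stand-still default for zero activity the agent only emits
    a keypress when doing so actually pays for its cost.
    """
    return policy_loss + l1_coeff * np.abs(action_outputs).sum()
```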



