Maybe I'm missing it (it's late), but nowhere does the article explain why 10% of the time it picks a choice at random ("explores"). In fact, the article basically argues that exploration isn't needed, since the system self-rights if the wrong choice becomes temporarily dominant. It also doesn't explain why the randomization should be 10% specifically.



A random choice is needed so that options other than the dominant one can still earn rewards. I'm sure it doesn't have to be random (and I'd be curious to see the logic behind choosing 10%), but you have to have something that gives the other options a chance.
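
For concreteness, a minimal epsilon-greedy sketch (the arm names and the optimistic default for untried arms are illustrative, not from the article); without the exploration branch, the current best arm wins every draw and the others never get a chance to earn rewards:

    import random

    # arm -> [successes, trials]; the arm names are hypothetical
    stats = {"A": [0, 0], "B": [0, 0], "C": [0, 0]}

    def estimate(arm):
        successes, trials = stats[arm]
        # untried arms get an optimistic 1.0 so each is tried at least once
        return successes / trials if trials else 1.0

    def choose(epsilon=0.1):
        if random.random() < epsilon:          # explore: 10% of the time,
            return random.choice(list(stats))  # any arm can be picked
        return max(stats, key=estimate)        # exploit: best estimate so far

    def record(arm, reward):                   # reward is 0 or 1
        stats[arm][0] += reward
        stats[arm][1] += 1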

Makes me wonder if the 10% number couldn't be changed to a function of the total number of rewards; the longer it runs, the less variation there is and the more confident you can be in the choice made.


That would be an epsilon-decreasing strategy, or VDBE:

http://en.wikipedia.org/wiki/Multi-armed_bandit#Semi-uniform...
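
Roughly, as a sketch of the epsilon-decreasing idea (the 1/(1+n) schedule and the floor are arbitrary choices of mine; VDBE instead adapts epsilon based on how much the value estimates are still changing):

    import random

    def epsilon(total_trials, floor=0.01):
        # explore heavily early on, less as evidence accumulates
        return max(floor, 1.0 / (1.0 + total_trials))

    def choose(stats):  # stats: arm -> [successes, trials]
        total = sum(trials for _, trials in stats.values())
        if random.random() < epsilon(total):
            return random.choice(list(stats))
        return max(stats, key=lambda a: stats[a][0] / stats[a][1] if stats[a][1] else 1.0)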


Like many things in machine learning, this is a simplified version of Metropolis-Hastings.


The problem is that, depending on your initial settings and early results, certain choices can get such low payoff estimates that they're never tried again (say one drops to 50% after a single trial while the rest stay above 75%). You want to make sure that your solver adequately explores all choices.
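
For instance, with no exploration at all (epsilon = 0), a single unlucky trial can freeze an arm out permanently; a sketch, with hypothetical counts:

    # arm -> [successes, trials]: A looks like 0.75, B like 0.50
    stats = {"A": [3, 4], "B": [1, 2]}

    def greedy_choice():
        return max(stats, key=lambda a: stats[a][0] / stats[a][1])

    # greedy_choice() returns "A" on every future call, so B's
    # estimate stays frozen at 0.50 even if its true payoff is higher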


Presumably you would want the randomness to give choices that fell to 0% another chance. A better way might be to use the observed success ratios as a probability distribution and sample choices from it.
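
A sketch of that idea, sampling each choice with probability proportional to its success ratio (the Laplace smoothing is my addition, so a choice that fell to 0% keeps a small nonzero chance):

    import random

    def choose(stats, smoothing=1):  # stats: arm -> [successes, trials]
        arms = list(stats)
        # smoothed ratio: (s + 1) / (t + 2) instead of raw s / t,
        # which keeps 0%-so-far arms in play
        weights = [(stats[a][0] + smoothing) / (stats[a][1] + 2 * smoothing)
                   for a in arms]
        return random.choices(arms, weights=weights)[0]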



