Yeah, I think the problem here is that trying to be a little bit smart kind of g...

conductrics · on May 30, 2012

Actually, you are kind of already in the RL space when you use A/B testing to make online decisions; you just may not be thinking of it that way. From Sutton & Barto: "Reinforcement learning is learning what to do--how to map situations to actions--so as to maximize a numerical reward signal." That is exactly what you are doing when you apply A/B-style hypothesis testing to inform decisions in an online application. Plus, personally, I think A/B testing is, in a way, much harder to interpret. At least, most folks interpret it wrong, which isn't a knock, since it provides a non-intuitive - at least to me ;) - result.
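
To make the "map situations to actions so as to maximize a numerical reward signal" framing concrete, here is a minimal sketch of an epsilon-greedy bandit choosing between two page variants. Everything in it is an assumption for illustration: the function name `epsilon_greedy`, the simulated conversion rates, and the choice of epsilon-greedy itself, which is just one simple RL-style policy and not something the comment above prescribes.

```python
import random

def epsilon_greedy(true_rates, steps=10000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy policy over variants with hypothetical
    conversion rates. Returns (pull counts, estimated rates, total reward)."""
    rng = random.Random(seed)
    counts = [0] * len(true_rates)    # times each variant was shown
    values = [0.0] * len(true_rates)  # running mean reward per variant
    total = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: show a variant chosen uniformly at random.
            arm = rng.randrange(len(true_rates))
        else:
            # Exploit: show the variant with the best estimate so far.
            arm = max(range(len(true_rates)), key=lambda i: values[i])
        # Simulated visitor: converts with the variant's (made-up) true rate.
        reward = 1 if rng.random() < true_rates[arm] else 0
        counts[arm] += 1
        # Incremental update of the sample mean.
        values[arm] += (reward - values[arm]) / counts[arm]
        total += reward
    return counts, values, total

counts, values, total = epsilon_greedy([0.10, 0.13])
print(counts, values, total)
```

A fixed 50/50 A/B split is the special case where the policy explores on every visit until the test ends and only then exploits; the sketch simply interleaves the two.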

wpietri · on May 30, 2012

Maximizing a numerical reward signal is definitely not what we're doing when we do an A/B test.

We collect a variety of metrics. When we do an A/B test, we look at all of them as a way of understanding what effect our change has on user behavior and long-term outcomes.

A particular change may be intended to affect just one metric, but that's in an all-else-equal way. It's not often the case that our changes affect only one metric. And that's great, because that gives us hints as to what our next test should be.

conductrics · on May 30, 2012

Well, I guess you could be running a MANOVA or something to test over joint outcomes, but the A/B test is over some sort of metric. I mean, when you set up an experiment, you need to have defined the dependent variable first. Now, after you have randomly split your treatment groups, you can do post hoc analysis, which I think is what you are referring to. But if you are optimizing, there needs to be some metric to optimize over. Of course, at the end of the day the hypothesis test just tells you prob(data at least this extreme | null = true), which I am not sure provides a direct path to decision making.
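
As a rough illustration of what that probability is, here is a minimal sketch of a two-sided, pooled two-proportion z-test on conversion counts. The function name `two_proportion_p_value` and the counts are made up for the example, and the normal approximation is an assumption that only holds reasonably well for large samples.

```python
import math

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Return (z, p) for a two-sided pooled two-proportion z-test.

    p is the probability, under the null of equal conversion rates, of a
    difference at least as extreme as the one observed."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_b - p_a) / se
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return z, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical counts: 100/1000 conversions for A, 130/1000 for B.
z, p = two_proportion_p_value(100, 1000, 130, 1000)
print(f"z = {z:.3f}, p = {p:.4f}")
```

Note that the result is a statement about the data given the null, not the probability that B is better than A, which is why it does not translate directly into a decision.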