There are better approaches to this problem, ones whose average regret goes to zero asymptotically. Take a look at the UCB (Upper Confidence Bound) algorithm. You can do even better if you assume some continuity: a common choice is to assume that the whole reward function is drawn from a Gaussian process. Many interesting ideas in the literature indeed :)
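To make the UCB idea concrete, here is a minimal sketch of the classic UCB1 variant. It assumes rewards lie in [0, 1]; the names `ucb1`, `make_bernoulli_bandit`, and `pull`, as well as the arm probabilities, are made up for illustration and are not taken from the question.

```python
import math
import random


def ucb1(pull, n_arms, horizon):
    """Run UCB1 for `horizon` rounds; `pull(arm)` must return a reward in [0, 1]."""
    counts = [0] * n_arms   # number of times each arm was played
    sums = [0.0] * n_arms   # total reward collected from each arm
    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Play every arm once so that each empirical mean is defined.
            arm = t - 1
        else:
            # Pick the arm with the largest optimistic estimate:
            # empirical mean plus an exploration bonus that shrinks
            # as the arm is played more often.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
    return counts, sums


def make_bernoulli_bandit(probs, seed=0):
    """Hypothetical test environment: arm i pays 1 with probability probs[i]."""
    rng = random.Random(seed)

    def pull(arm):
        return 1.0 if rng.random() < probs[arm] else 0.0

    return pull


counts, sums = ucb1(make_bernoulli_bandit([0.2, 0.5, 0.8]), n_arms=3, horizon=5000)
print(counts)  # the last arm (p = 0.8) should receive most of the pulls
```

The exploration bonus is what drives the regret guarantee: a rarely played arm keeps a large bonus, so it gets retried until the data rule it out, and the number of pulls wasted on suboptimal arms grows only logarithmically with the horizon.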