If you combine

>In the more general event that you have rewards for all objects (if not, the reward-producing function can output zero), you would perform this weight update on all objects

with the justification of its properties that follows it, then it seems like the guarantee of getting close to the maximum reward only holds if:

1. You know the rewards for all options and update the weights accordingly, or

2. You treat the rewards for all options not taken as 0 (in which case you trivially got the maximum possible reward).

This kind of seems to limit its applicability somewhat, unless I'm missing something?
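
To make the two cases concrete, here is a minimal sketch. It assumes the update in question is the usual multiplicative rule `w_i <- w_i * (1 + epsilon * r_i)` with rewards in [0, 1]; the function names, `epsilon`, and the reward values are made up for illustration and are not taken from the article.

```python
def mwu_update(weights, rewards, epsilon=0.1):
    """One update step applied to every option (assumed rule: w * (1 + epsilon * r))."""
    return [w * (1 + epsilon * r) for w, r in zip(weights, rewards)]


def zero_fill(rewards, chosen):
    """Case 2: keep the chosen option's reward, treat every other reward as 0."""
    return [r if i == chosen else 0.0 for i, r in enumerate(rewards)]


weights = [1.0, 1.0, 1.0]
rewards = [0.2, 0.9, 0.5]  # hypothetical rewards, one per option

# Case 1: all rewards are known, so every weight moves.
full = mwu_update(weights, rewards)

# Case 2: only option 0 was taken; the others are scored as 0 and stay put.
partial = mwu_update(weights, zero_fill(rewards, chosen=0))
```

In case 2 only the weight of the option actually taken changes, so the "best option" the guarantee compares against is measured on the zero-filled rewards rather than on the rewards the other options would really have given.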