>In the more general event that you have rewards for all objects (if not, the reward-producing function can output zero), you would perform this weight update on all objects

Taking that together with the justification of its properties that follows it, it seems like the guarantee that it gets close to the maximum reward only holds if either:

1. You know the rewards for all options and update all the weights accordingly, or
2. You treat the rewards for all options not taken as 0 (in which case you trivially got the maximum possible reward).

This kind of seems to limit its applicability somewhat, unless I'm missing something?
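To make the two cases concrete, here is a minimal Python sketch. It assumes the update in the quote is the usual exponential multiplicative-weights step w_i ← w_i · exp(η · r_i); the quoted passage doesn't pin down the exact form, so `hedge_update`, `run`, `eta`, and the reward table are hypothetical. With `observe_all=True` every option's weight is updated from its true reward (case 1). With `observe_all=False` only the chosen option's reward is used and the rest are zero-filled (case 2). In that case the "best fixed option in hindsight" measured on the zero-filled rewards can never beat what the learner already earned.

```python
import math
import random


def hedge_update(weights, rewards, eta):
    """Multiplicative weights step on every option: w_i <- w_i * exp(eta * r_i), then normalize."""
    updated = [w * math.exp(eta * r) for w, r in zip(weights, rewards)]
    total = sum(updated)
    # Normalizing doesn't change the sampling probabilities; it just avoids overflow on long runs.
    return [w / total for w in updated]


def run(reward_table, eta, observe_all, seed=0):
    """Play one option per round.

    Returns (earned, best fixed option on the rewards the learner saw,
    best fixed option on the true rewards).
    """
    rng = random.Random(seed)
    n = len(reward_table[0])
    weights = [1.0 / n] * n
    earned = 0.0
    seen_table = []
    for rewards in reward_table:
        # Sample an option with probability proportional to its weight.
        choice = rng.choices(range(n), weights=weights)[0]
        earned += rewards[choice]
        if observe_all:
            # Case 1: the reward of every option is known after the round.
            seen = list(rewards)
        else:
            # Case 2: only the chosen option's reward is observed; the others are treated as 0.
            seen = [r if i == choice else 0.0 for i, r in enumerate(rewards)]
        seen_table.append(seen)
        weights = hedge_update(weights, seen, eta)
    best_seen = max(sum(row[i] for row in seen_table) for i in range(n))
    best_true = max(sum(row[i] for row in reward_table) for i in range(n))
    return earned, best_seen, best_true


# Hypothetical example: option 1 always pays 1.0, option 0 always pays 0.5.
table = [[0.5, 1.0]] * 200
print(run(table, eta=0.5, observe_all=True))
print(run(table, eta=0.5, observe_all=False))
```

With non-negative rewards, the zero-filled run always satisfies `best_seen <= earned`, because each option's zero-filled total only counts rounds in which the learner itself picked that option. So "close to the best option in hindsight" holds trivially there, while the comparison against `best_true` is the one the guarantee no longer speaks to.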