
If you combine

>In the more general event that you have rewards for all objects (if not, the reward-producing function can output zero), you would perform this weight update on all objects

with the justification of its properties that follows it, then it seems like the guarantee of getting close to the maximum reward only holds if:

1. You know the rewards for all options and update the weights accordingly

or

2. You treat the reward of every option not taken as 0 (in which case you trivially got the maximum reward possible).

This kind of seems to limit its applicability somewhat, unless I'm missing something?
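
To make the two cases concrete, here is a rough sketch of what I mean. I'm assuming the update in question is a multiplicative one of the form w_i <- w_i * (1 + epsilon * r_i) with rewards in [0, 1]; the function name `run_weights`, the `full_information` flag and the parameter values are all made up for illustration and are not from the article.

```python
import random


def run_weights(rewards_per_round, epsilon=0.1, full_information=True, seed=0):
    """Hypothetical sketch of a multiplicative weight update.

    rewards_per_round: list of rounds, each a list with one reward in [0, 1]
    per option. Returns (total reward collected, final weights).
    """
    rng = random.Random(seed)
    n = len(rewards_per_round[0])
    weights = [1.0] * n
    total = 0.0
    for rewards in rewards_per_round:
        # Pick one option with probability proportional to its weight.
        choice = rng.choices(range(n), weights=weights)[0]
        total += rewards[choice]
        if full_information:
            # Case 1: rewards of all options are known after the round.
            observed = rewards
        else:
            # Case 2: rewards of options not taken are treated as 0.
            observed = [r if i == choice else 0.0 for i, r in enumerate(rewards)]
        # Assumed update rule: w_i <- w_i * (1 + epsilon * r_i), for all i.
        weights = [w * (1.0 + epsilon * r) for w, r in zip(weights, observed)]
    return total, weights


if __name__ == "__main__":
    # Option 1 always pays 1, option 0 always pays 0.
    rounds = [[0.0, 1.0]] * 200
    print(run_weights(rounds, full_information=True))
    print(run_weights(rounds, full_information=False))
```

In case 1 the weight of the good option grows every round whether or not it was picked. In case 2 it only grows in rounds where it happened to be picked, which is the situation I'm asking about.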



