Does the approach bear fruit at small scale, though? Or would this project only pay off when run at large scale?
Getting budgetary approval isn't easy for everyone, especially for an unproven process. And even then, a mistake in the pipeline could send all that money down the drain.
Good question! RL (and ML generally) definitely works better as you add more scale, but I still feel that this particular work is roughly "grand challenge" level. You shouldn't expect to just try this out as your first foray :).
I will note this paragraph from the post:
> RL researchers (including ourselves) have generally believed that long time horizons would require fundamentally new advances, such as hierarchical reinforcement learning. Our results suggest that we haven’t been giving today’s algorithms enough credit — at least when they’re run at sufficient scale and with a reasonable way of exploring.
which is mostly about the challenge of longer time horizons (and therefore relates to the LSTM). If your problem is different or has a smaller problem space, I think this is soon going to be very approachable. For context, we recently demonstrated training ResNet-50 for $7.50.
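To make the "LSTM related" point concrete, here's a minimal sketch of the recurrence that lets a policy carry information across a long horizon. All names, sizes, and the random observations are illustrative assumptions, not details from the post:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step: gates are computed from input x and previous hidden h."""
    z = W @ x + U @ h + b            # (4 * hidden,) pre-activations
    i, f, g, o = np.split(z, 4)      # input, forget, candidate, output gates
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    c = f * c + i * np.tanh(g)       # cell state: the long-horizon memory
    h = o * np.tanh(c)               # hidden state: what a policy head would read
    return h, c

rng = np.random.default_rng(0)
obs_dim, hidden = 8, 16             # illustrative sizes only
W = rng.normal(0, 0.1, (4 * hidden, obs_dim))
U = rng.normal(0, 0.1, (4 * hidden, hidden))
b = np.zeros(4 * hidden)

h, c = np.zeros(hidden), np.zeros(hidden)
for t in range(1000):                # a "long" episode
    obs = rng.normal(size=obs_dim)   # stand-in for real environment observations
    h, c = lstm_step(obs, h, c, W, U, b)

print(h.shape)
```

The point is just that the same fixed-size state `(h, c)` is threaded through every timestep, which is what lets today's algorithms handle long horizons at all — the open question in the quoted paragraph is how far that carries without hierarchical methods.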
There is certainly a set of problems for which RL shouldn't cost you more than the value you get out of it, and for which you can demonstrate a sufficient likelihood of success. RL itself, though, is still at the bleeding edge of ML research, so I don't consider it unusual that it's unproven.