Specification gaming: the flip side of AI ingenuity (deepmind.com)
91 points by EvgeniyZh on April 22, 2020 | 31 comments



Don't miss the linked examples list of machines optimizing for specification instead of the intended goal, it's great reading for any software engineer:

https://docs.google.com/spreadsheets/d/e/2PACX-1vRPiprOaC3Hs...


Happy to see one of my very favorite papers in this list, in which a genetic algorithm running on an FPGA finds an exploit in the physical world in order to "cheat" at success: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.50....


I wrote a thesis on reinforcement learning in Warcraft II. Our system started with zero knowledge and learned to beat the existing AIs about half the time.

It did this by sending its single starting unit to attack the opponent's single unit, before either side could use its unit to build more.

Two identical, weak units fighting was just a coin flip.


Ah yes, the worker rush. I've seen this cheese in StarCraft as well.


It is very difficult to safely define a loss function that adequately represents not what you believe you want right now, but rather the outcomes that you would be happy with in hindsight once they occur, especially if the rules of the game are complex enough to allow for "creativity".

https://www.lesswrong.com/posts/4ARaTpNX62uaL86j6/the-hidden...


"Within the scope of developing reinforcement learning (RL) algorithms, the goal is to build agents that learn to achieve the given objective."

This is the fundamental problem. That is not the goal. As long as people believe that is the goal we will not advance.

Put another way, "the given objective" to date has always been wrong. All of the desired behavior that we are trying to get machines to emulate is a tiny subset of the learned behaviors of the capable agents (us).

The primary objective we have is to survive. On a moment-to-moment basis our objective is to fulfill a set of competing needs (hunger, tiredness, boredom, etc.). It is the interplay of these multiple objectives with a complex environment that necessitates the kind of intelligence we have.

And to be absolutely clear, the answer to "How do we faithfully capture the human concept of a given task in a reward function?" is "You don't", and trying to do so is harmful to the research effort.

The correct path is to evolve a reward system whose function is to increase the frequency of behaviors that meet the needs of agents in the moment and fulfill the general goal of survival. Then, through a curriculum of environmental challenges*, allow those needs to be met only by accomplishing certain tasks that have value, or are of interest, to us.

*Simulator fidelity / hacking will continue to be a problem, but it is orthogonal to the problem of RL. RL finds the bugs, you fix them.


"The correct path is to evolve a reward system"

Only if you like grey goo.

Or, in a similar vein, I remember reading about how some tree drops vast numbers of acidic leaves, because its strategy for getting ahead in life is to bury competing seedlings in toxic waste.


TLDR: A faulty loss function leads to faulty results.

This article has wonderful real-world examples of AIs gaming the mathematical problem specification created by the AI researcher in order to both produce a world-class score on a given dataset and be utterly unusable in the real world.

While it doesn't appear to be mentioned in the article, this is exactly why I believe that AI will not have a democratizing effect.

Either you have the mathematical and stochastic skills to describe your problem criteria as a differentiable function, or not. In the first case, you're already a good programmer, but AI might save you a bit of time over trying to find the optimal parameters manually. In the second case, you will not be able to ensure that the AI follows the implicit rules of the real world, which means that you will produce something that scores extremely well on its training data, but does not generalize to the real world.

And that's also why, if someone claims a new "state-of-the-art result on benchmark XY", you should mentally append "... but it works nowhere else."
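To make that concrete, here's a toy sketch I made up for illustration (not from the article; plain numpy, with an invented "dosage" framing): the differentiable proxy is training MSE, the implicit real-world rule is "predictions must never be negative", and the optimizer happily satisfies the former while violating the latter.

    # Hypothetical example: the proxy objective is mean squared error on the
    # training set; the unstated real-world rule is "a dosage can't be negative".
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.uniform(1.0, 5.0, size=100)        # training inputs
    y = 2.0 * X - 1.0                          # training targets

    # Ordinary least squares minimizes the proxy loss perfectly.
    w, b = np.polyfit(X, y, deg=1)
    train_mse = np.mean((w * X + b - y) ** 2)
    print(f"proxy loss (train MSE): {train_mse:.2e}")    # ~0, a "world-class score"

    # Deployed on an input outside the training range, the model violates the
    # constraint the loss never encoded.
    x_new = 0.1
    print(f"prediction at x={x_new}: {w * x_new + b:.2f}")   # negative "dosage"

Any implicit rule you can't (or forget to) write into that function simply doesn't exist as far as the optimizer is concerned.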


>AI might be useless because it finds loopholes

AI might be best put to use for finding loopholes. In formal specifications, in legal documents, in laws... you name it.

The required process change is reasonably straightforward: instead of presuming an answer given by an AI is useful as a solution to the given problem, take it as an indicator of a loophole in the problem specification and see whether that loophole can be closed.


I'm no legal expert by any means, but it is my understanding that laws are supposed to have loopholes almost by design. A legal system with no loopholes, which probably also means one with no room for interpretation, would be as close to dystopia as we could ever get.


Legislative intent shouldn't be gameable, just as constitutional intent shouldn't be either.

Courts of course have a certain degree of interpretive freedom, but the process should result in a machinery that implements intent faithfully, and legislators change laws to change intent (and to patch holes, though in common law that usually happens through the courts too).


No law has ever been perfect (and none ever will be), and as such "gaming" them is the only recourse for lots of people negatively affected by said imperfect laws until those legislators you mention happen to change them (for the better, it is hoped). Otherwise we'd be back to living in one of Kafka's stories.


But that doesn't make sense. Laws don't have loopholes in them on the off chance that a law turns out to be bad.

Legislation, laws, and law enforcement as technologies have always had these trade-offs, and they have progressed tremendously over the centuries (or millennia).

From the Roman veto of the tribune of the plebs to current constitutional review, there are many checks and balances. But legislation as a process doesn't have inherent loopholes.

Currently, for every piece of legislation proposed, many official and civil groups warn of its potential unintended side effects. But then the majority usually disregards those concerns.


"Might save you a bit of time" is too underwhelmingly stated though. With Alpha Go it was a game changer, literally. Of course, in that case the specification was given by the rules of the game, so not a problem.


> TLDR: A faulty loss function leads to faulty results.

This. The use of "gaming" to describe the observed behaviour seems strange. To "game" suggests some malfeasance in the algorithm; a deliberate effort to subvert the intended goal, which in turn implies knowing what the "real" goal was - and somehow deciding to shortcut it.

That isn't the case. Rather, the problem lies in appropriately specifying the desired outcome, and any relevant constraints that must be obeyed. Indeed, DeepMind has separately written about the "specification problem" elsewhere [0].

That's in no way to downplay the difficulty in specification. It's one we've wrestled with in software for decades.

> Either you have the mathematical and stochastic skills to describe your problem criteria as a differentiable function, or not.

Indeed. That statement could equally have been written ~30 years ago, when Z, VDM and the like were active areas of research. More recently, TLA+ has gotten some traction, but it's still very much in the margins.

Perhaps ML will spur the next wave of specification formalisms. But it will face the same challenge as its predecessors: how to make it accessible to more than a (very) small set of sufficiently capable people.

[0] https://medium.com/@deepmindsafetyresearch/building-safe-art...


It seems to me that saying the problem with AI is specifying the problem to be solved is admitting that you haven't even started to develop AI.

We could be just around the corner, but the fact that people are taking for granted that there's no intelligence in AI isn't promising.

If you respond defensively about how difficult it is to give people what they want, well, that's the whole point.


I see myself more as an observer here. There are plenty of publications from people claiming to exceed the "state of the art", as well as plenty of people claiming to have found the holy grail. But if you use their resulting AIs on real-world data, they usually fail.

For example, people were willing to declare optical flow "solved" already 5 years ago. Yet, autonomous drones still crash into wires, glass, trees, snow, water, etc., because their optical flow algorithms cannot handle these aspects of reality (which were not present in the AI training data).


The article addresses your point exactly.


Some more discussion on the difficulty of creating reward functions: https://www.alexirpan.com/2018/02/14/rl-hard.html#reinforcem...


Lol, that link sent me on a nearly one hour detour and then sent me back to HN (specifically, to [1])!

[1] https://news.ycombinator.com/item?id=6269114


Isn't the right way to look at this problem called Mechanism Design?

https://en.wikipedia.org/wiki/Mechanism_design

The objective is designed as part of the game, but it's very difficult to get the incentives right.


Working on evolutionary robotics in the '80s and '90s, we used to refer to these things as 'scaffolding problems'. For non-trivial systems, getting the constraint specifications right, raising the right scaffolding, can become as hard as handcrafting the solution would have been.

It is a pain point in every system that tries to solve a problem through generative meta-heuristic search. Translating intention into a set of explicit constraints, ones that separate desired outcomes from undesired outcomes that satisfy the letter of the constraints but not their intent, gets exponentially harder as the dimensionality of the solution space grows.


Why can't you just train one model to predict the future and another model to play the game, then have the game-playing model take the course that leads to the least certain outcome until the model can comfortably identify game states, at which point you just say the reward function is proximity to a desired game state? It seems to me the problem with trying to define a reward function up front is that the AI has no clue how to describe the game state, so even if you do a good job you're just sidestepping the actual problem.


> leads to the least certain outcome

What's the loss function for this? How do you cross-validate this loss? The only generally applicable method I can think of (essentially some Bayesian-type updating, which then uses Monte-Carlo-type methods for evaluating this loss and "uncertainty") is just so much more expensive than anything we do at the moment that, at the current level of computing, we wouldn't even be able to do it in most cases.

The problem is, an AI model will tell you an "uncertainty", but these "uncertainties" will often not reflect the true underlying probability that it is wrong. This is, of course, all a bit silly, since it's hard knowing what you don't know, and a proper update to this number is also not well-defined unless you already have a bunch of samples of potential future outcomes (via, as I mentioned before, a Monte-Carlo-type approach, which would multiply the cost of something as simple as a single evaluation many times over).
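As a toy illustration of that calibration point (everything here is simulated; in practice the confidences and correctness flags would come from your model on held-out data), you can bucket the reported confidences and compare them with how often the model is actually right:

    # Reliability check: does a claimed confidence of 0.9 mean "right 90% of
    # the time"? Here we simulate an overconfident model to show the mismatch.
    import numpy as np

    rng = np.random.default_rng(1)
    confidences = rng.uniform(0.5, 1.0, size=10_000)      # what the model reports
    correct = rng.random(10_000) < (confidences - 0.15)    # what actually happens

    bins = np.linspace(0.5, 1.0, 6)
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences >= lo) & (confidences < hi)
        print(f"claimed {lo:.2f}-{hi:.2f}  ->  actual accuracy {correct[mask].mean():.2f}")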


In my mental model you don't need a loss function for this. It's just: given these potential short-term goals, predict the future state of the game one minute from now, then choose the short-term goal that maximizes uncertainty about that future. The idea is that you have a game-playing bot that learns to do "things" in the game world, and a game-state-encoding bot that learns to describe the game world. I'm not suggesting you train the bot TO pursue the least certain idea; I'm saying you train the bot BY pursuing the least certain idea, and then update your understanding of the world to be more certain about that choice in the future.

The measure of uncertainty could be as simple as a multi-armed bandit where the reward is the percentage of accurate predictions following that path.
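Something like this minimal sketch, maybe (all the specifics here, the goal names, the accuracy bookkeeping, the fake predictability numbers, are invented just to show the mechanism):

    # Bandit-style goal selection: each arm is a candidate short-term goal, and
    # its statistic is the fraction of times the world model predicted the
    # outcome of pursuing it correctly. Always pursue the goal we are currently
    # worst at predicting.
    import random

    goals = ["scout", "gather", "build", "attack"]          # invented examples
    hits = {g: 0 for g in goals}      # correct predictions so far
    tries = {g: 1 for g in goals}     # attempts (start at 1 to avoid div by zero)

    # Hidden, made-up "predictability" of each goal's outcome.
    true_predictability = {"scout": 0.9, "gather": 0.8, "build": 0.5, "attack": 0.3}

    def accuracy(g):
        return hits[g] / tries[g]

    random.seed(0)
    for step in range(500):
        goal = min(goals, key=accuracy)                     # most uncertain goal
        prediction_correct = random.random() < true_predictability[goal]
        tries[goal] += 1
        hits[goal] += prediction_correct

    # Most of the time ends up spent on the goals whose outcomes remain hard to
    # predict ("build", "attack"), which is the exploration behaviour described above.
    print({g: tries[g] for g in goals})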

The pseudo-adversarial aspect of this is that both models will learn to agree on how choices translate into gameplay changes. The bot will not necessarily learn to beat the game, but a StyleGAN-like approach would probably be fine for saying "please get to a state that looks like this" with a few manual samples. Perhaps all of this is to say: it's better to learn how to play the game in general, and then specify that, in this particular run, you'd like the bot to try to reach the game's end credits.


Your last paragraph reminds me of Hindsight Experience Replay[1] where they use goal-conditioned policies that learn how to reach essentially any state. The idea is that, even if you don't reach your true goal, you've learned how to reach some other goal, so you can successively learn to navigate the environment until the end goal is easy. It's a really nice paper, highly recommended!

[1] https://arxiv.org/abs/1707.01495
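If it helps, the core relabeling trick is simple enough to sketch in a few lines (this is a simplification of the idea, not the paper's code; the transition format and the sparse reward are just placeholders):

    # Hindsight relabeling: store each transition once with the goal you wanted,
    # and once pretending the state you actually reached was the goal all along,
    # so even a "failed" episode yields useful successes.
    def relabel_episode(episode, desired_goal, reward_fn):
        """episode: list of (state, action, next_state) tuples."""
        replay = []
        achieved_goal = episode[-1][2]            # final state of the episode
        for state, action, next_state in episode:
            replay.append((state, action, next_state, desired_goal,
                           reward_fn(next_state, desired_goal)))
            replay.append((state, action, next_state, achieved_goal,
                           reward_fn(next_state, achieved_goal)))
        return replay

    # Sparse reward: 1 only when the reached state matches the goal exactly.
    reward_fn = lambda state, goal: float(state == goal)

    episode = [((0, 0), "right", (1, 0)), ((1, 0), "up", (1, 1))]
    print(relabel_episode(episode, desired_goal=(5, 5), reward_fn=reward_fn))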


You might like [1], where they use an ensemble of models and essentially optimize for novelty as measured by the entropy of model predictions across the ensemble.

[1] Model-Based Active Exploration, Shyam et al. 2019, https://arxiv.org/abs/1810.12162
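Roughly in that spirit (this is a sketch of the ensemble-disagreement idea, not the paper's implementation; variance across the ensemble stands in for the entropy-based utility they use, and the "models" here are just placeholder functions):

    # Exploration by ensemble disagreement: train several dynamics models and
    # prefer actions whose predicted next states the models disagree on most.
    import numpy as np

    def disagreement(ensemble, state, action):
        """Variance of next-state predictions across the ensemble."""
        predictions = np.array([model(state, action) for model in ensemble])
        return predictions.var(axis=0).sum()

    # Placeholder "models"; in practice these would be neural networks trained
    # on different bootstraps of the agent's experience.
    ensemble = [lambda s, a, k=k: s + (1.0 + 0.1 * k) * a for k in range(5)]

    state = np.array([0.0, 0.0])
    candidates = [np.array([1.0, 0.0]), np.array([0.0, 0.2])]

    # Pick the action the ensemble is least sure about; the agent explores where
    # its models still disagree instead of where a fixed reward is maximized.
    chosen = max(candidates, key=lambda a: disagreement(ensemble, state, a))
    print(chosen)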


The paperclip problem is the scenario in which you give an AI a small task (like making paper clips) and the AI goes too far to achieve that goal (enslaving all of humanity to make paper clips).

In the extreme case, specification gaming is the exact opposite: an AI will do anything in its power to satisfy the reward function without having to do any work, including tricking its creator into making the reward function easier.


Oddly, that behavior looks very human. If the loopholes found by AI are related to energy conservation, that's a healthy result.

Human students and workers need a fair amount of "work specification" (years of teaching, decades of workplace laws and safety measures, etc.) to actually complete the requested work that earns them their revenue/salary. I don't see specification gaming as an AI-specific issue.


For forty years AI focused on enabling humans to tell machines what to do in a more compact, natural and fluid form. Then we had 20 years of ML where we let machines develop capabilities from data. Now it turns out that we are back to enabling compact specifications of behavior.

Who knew?


Many journeys end where they began, but with a new perspective.



