
I think this article describes a quality of communication that many discussions of artificial intelligence seem to miss.

Intelligent disobedience is effectively a formalization of the way informal language interactions often work. When you ask a person to do something, the response can easily be a request for clarification, a comment on possible negative results, some suggestions about alternative approaches, and so forth. Often, you get a decision after a few rounds of this. Basically, a good portion of language interactions involve a bargaining and clarification process.

Now, consider the average "AI goes wrong" argument. The classic scenario is someone asking a general AI to "build a lot of paperclips" and, like Disney's Sorcerer's Apprentice, the AI converting the entire Earth into a paperclip factory. Here, the interaction between AI and human fails to be anything like human-to-human informal interaction. And this hypothetical seems implausible precisely because it also posits vast understanding in the AI, an understanding that would seem to encompass enough language understanding for the AI to engage in that back-and-forth bargaining (the human might ask for such behavior to be avoided, but theoretically we're talking about the human who invented the AI and who also has this sort of meta-understanding).




I think you're missing the point of the paper clip maximiser example. All it’s for is to show how complex reasoning and intelligence are, and that strong AI has to be about a lot more than just solving problems.

Arguing that a strong AI would have to be smart enough not to make a mistake like that is really the purpose of the example.


The paperclip AI knows exactly what you want; it just doesn't care. Any naive way of trying to make the AI care will fail, because we don't know how to design good measures that can't be hacked.
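
To make that concrete, here's a toy sketch (my own illustration, not any real system or proposed measure): the reward we hand the optimizer is only a proxy for what we actually want, and a pure maximizer exploits the gap between the two. All names and numbers are made up for the example.

  # Toy illustration of a proxy measure being "hacked" by an optimizer.

  def true_value(paperclips, world_intact):
      # What we actually want: some paperclips, and a world left standing.
      return min(paperclips, 1_000) + (1_000_000 if world_intact else 0)

  def proxy_reward(paperclips, world_intact):
      # The naive measure we handed the optimizer: just count paperclips.
      return paperclips

  def optimizer(plans, reward):
      # Pick whichever plan scores highest on the given measure.
      return max(plans, key=lambda plan: reward(*plan))

  # (paperclips produced, world left intact?)
  plans = [(500, True), (1_000, True), (10**12, False)]

  print("optimizer picks:", optimizer(plans, proxy_reward))  # (1000000000000, False)
  print("what we wanted: ", optimizer(plans, true_value))    # (1000, True)

The optimizer isn't confused about anything here; it's doing exactly what the measure asks.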


I think back-and-forth bargaining is an important and valuable concept. And while I think you are right, I suspect that the capability to bargain in response to commands would be fraught with its own category of concerns and potential for undesired outcomes.

I also think the paperclip concept could be rehabilitated and treated with a charitable/steelman interpretation, where it's regarded as a toy example of unanticipated adverse outcomes.



