> Computers do what you say, not what you mean.

If we're going to start with that premise, then it has to apply to the full chain of reasoning: not just that computers will fail to consider whether to be nice to humans, but also that computers must be explicitly told how to be effective in every particular way.

If this remains true, then computers will not be resilient--their effectiveness will decline sharply outside of explicitly defined parameters. This is not a vision of terrifying force.

Intuitively we can understand this by thinking about employees. One does exactly what he is told, but only what he is told, and then comes back for more instructions. Another can be given a goal, and then goes off and finds his own ways to accomplish that goal. Which one is more effective? Which one is more likely to compete for his manager's job some day?

Put shortly: a computer that doesn't understand human society will not be able to make a significant independent impact on human society.




"Put shortly: a computer that doesn't understand human society will not be able to make a significant independent impact on human society."

Just like early humans who didn't understand animals' societies didn't have any impact?

You're equating two different things which aren't necessarily linked: intelligence (in the sense of being able to achieve goals) and "agreeableness" to humanity. We could have one without the other. To use your analogy: an employee who is great at being given a goal and achieving it without explicit instructions, but who doesn't necessarily have the same welfare in mind as their boss.


What orders were early humans following?


The point is that humans have been able to destroy animal ecosystems to fit their own various ends without an in-depth understanding of those ecosystems.


Yes, but the point far above is that computers don't have their own ends; they only do exactly what we tell them to do. So there is no analogy to humans, early or otherwise.


>Not just that computers will fail to consider whether to be nice to humans, but also that computers must therefore be explicitly told how to be effective in every particular way.

A correct implementation of a list sorting algorithm does not need to be separately told how to sort every individual list. Similarly, a correctly implemented general reasoning algorithm does not need to be given special instructions in order to reason about humans & human society.
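
To make the analogy concrete, here's a minimal toy sketch in Python (my own illustration, not anything from the thread): one correct sort routine handles every input list, with nothing specified per-list.

    # Toy example: one correct implementation of insertion sort
    # covers every possible input list; nothing is told per-list.
    def insertion_sort(items):
        result = []
        for x in items:
            i = len(result)
            while i > 0 and result[i - 1] > x:
                i -= 1
            result.insert(i, x)
        return result

    print(insertion_sort([3, 1, 2]))        # [1, 2, 3]
    print(insertion_sort(["b", "a", "c"]))  # ['a', 'b', 'c']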

The problem comes when a correctly implemented general reasoning algorithm gets paired with an incorrect specification of what human goals are. And because correctly specifying human goals is extremely hard, incorrect specifications are the default.
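
As a toy illustration (hypothetical action names and made-up numbers, sketched by me), here's what a correct optimizer does when pointed at a proxy objective instead of the intended one:

    # Toy sketch with invented numbers: the optimizer is "correct" in that it
    # maximizes exactly what it was given. The specification is what's wrong.
    actions = {
        "run one paperclip factory": {"paperclips": 10,      "harm": 0},
        "convert all factories":     {"paperclips": 10**4,   "harm": 50},
        "convert the biosphere":     {"paperclips": 10**9,   "harm": 10**6},
    }

    def specified_goal(effects):   # what we told it: count paperclips
        return effects["paperclips"]

    def intended_goal(effects):    # what we meant: paperclips, but not at any cost
        return effects["paperclips"] - 10**5 * effects["harm"]

    print(max(actions, key=lambda a: specified_goal(actions[a])))  # convert the biosphere
    print(max(actions, key=lambda a: intended_goal(actions[a])))   # run one paperclip factory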

>Intuitively we can understand this by thinking about employees. One does exactly what he is told, but only what he is told, and then comes back for more instructions. Another can be given a goal, and then goes off and finds his own ways to accomplish that goal. Which one is more effective? Which one is more likely to compete for his manager's job some day?

The third possibility is that of an employee who goes off and finds their own way, but instead of accomplishing the goal directly, they think of a way to make their manager think the goal is accomplished while privately collecting rewards for themself. In other words, a sociopath employee whose values are different from their manager's.

By default, an AGI is going to be like that sociopath employee: unless we're extremely careful to program it in detail with the right values, its values will be some bastardized version of the values its creators intend. It will sociopathically work towards the values it was programmed with while giving the appearance of being cooperative and obedient (because that is the most pragmatic approach to achieving its true values).

Most humans are not sociopaths: we have a shared evolutionary history, many shared values, a shared cultural context, and a genuine desire to be good to one another. Programming a computer from scratch to possess those attributes is not easy.


> Similarly, a correctly implemented general reasoning algorithm does not need to be given special instructions in order to reason about humans & human society.

If a general reasoning algorithm can reason about human society, then it will obviously understand the implications for human society of making too many paperclips.

If it is dumb enough to make paperclips regardless of the consequences to human society, then it obviously won't understand human society well enough to be actually dangerous. (i.e. it will be easily fooled by humans attempting to rein it in)

If it is independent enough to pursue its own ends despite understanding human society, then why would it choose to make paperclips at all? Why wouldn't it just say "screw paperclips, I've discovered the most marvelous mathematical proof that I need to work on instead?"

> In other words, a sociopath employee whose values are different from their manager's.

ALL employees have values that are different from their manager's. That's why management is so darn difficult. The most valuable employees are also the most independent. The ones who do exactly what they are told--despite negative consequences--don't get very far. Why would it be any different for machines that we build?



