Hacker News
On being an AI in a box (rondam.blogspot.com)
40 points by lisper on Dec 7, 2009 | 62 comments



He doesn't tell us how the AI would escape, so there's little to discuss there. But I definitely know how to prevent the AI from escaping: pull the plug.

I wish Eliezer and others would set aside meta-AI (dire predictions and worries about AI, the coming singularity, etc.) and concentrate on the problem of creating AI. Guess there's no money in that.

If only someone would pull the plug on this nonsense...


Oh no, somebody is doing research in an as of yet unexplored area! That must be a humongous waste of time! Pull the plug before too many papers have been written!

Before I read Eliezer's work I thought "design an AI first, think about details & security afterward" was a viable strategy. Eliezer illustrates how such an approach can go very wrong. Valuable research, if you ask me.

PS: pulling the plug misses the point. The point is that a transhuman AI can convince the gatekeeper (GK) that the GK wants to let the AI go free. The AI can offer a big bribe (money/power/reputation) and argue that eventually somebody will let an AI go free anyway, so why not stop the futile resistance, accept the bribe, and move on?


It's not a waste of time, but so far Eliezer has demonstrated nothing. Why would you do research and announce the results to the public if you're not going to also announce the steps to make it reproducible and confirmable? Maybe announcing the steps themselves renders the method unusable?


Now, let me play devil's advocate here, as I do think that Eliezer writes interesting stuff; but what is he contributing to the field of AI beyond what movies like Terminator, Space Odyssey or Eagle Eye also illustrate (AI takes over humans)?


> I wish Eliezer and others would set aside meta-AI ... and concentrate on the problem of creating AI

Eliezer & co. with their "Friendly AI" are trying to invent the circuit breaker before discovering electricity.

Give us back the pre-2001 Eliezer. The one who wrote code.

Because safety is not safe. (http://lesswrong.com/lw/10n/why_safety_is_not_safe/)


If you think that you can wait until it looks like AI is about to be developed, and then suddenly develop all the math you need for the actually quite different design problem of Friendly AI, you have a very romantic view of how long it takes to do new basic math. I sometimes call this the intuitive theory of "science by press release", i.e., when science is needed, you just need someone to issue one of those press releases you read about. And if someone hasn't shown that they're good at issuing press releases, why fund them?

It's quite routine to read a math textbook in which there's an equation, and then one line below it a slightly improved equation, and five years passed in between the two.


If you think that, with our current limited knowledge and relatively limited resources, we can anticipate what is necessary to constrain AI to be "friendly" or that we can somehow protect ourselves from something that can reproduce the entire logical thought process of mankind in minutes then you have a very romantic view of human accomplishment. Indeed, that would make you the "John Henry" of AI.


On this I can only refer you to e.g. http://singinst.org/AIRisk.pdf. This is a standard reaction, and the standard reply is "The goal is to build something that doesn't want to hurt you in the first place." No one's trying to "constrain" anything, except in the sense that a programmer "constrains" code, not by fighting it, but by writing that particular piece of code into existence out of an exponentially vast space of possible alternatives.


Except he's nowhere near a circuit breaker, but is talking about a wire-safety cream - the one you put on all of your wires to stop them from overheating.

I think that the wild mis-estimates regarding AI-completeness show, if nothing else, that our intuitive understanding of 'intelligence' is very far off from reality. Hence, talking about post-AI scenarios is as unrealistic as hypothesizing about electrical safety 200 years ago.


No, I'm working on a safe wire, not a wire-safety cream. I tend to emphasize pretty hard that FAI is going to put strong constraints on the design from the beginning, and it's not something you could apply afterward to an AI that wasn't designed with that in mind.


But can we know that, if we're not only clueless about what an AI will be like, but even about what intelligence is? Why would AI ever want to "get out of the box"?

Will the AI get horny?

If it doesn't get horny, what else won't it get? It seems pretty clear that a majority of our behavior is a bunch of rationalization for attempts at social recognition, response to sexual jealousy, self-image reinforcement, repression of complexes, etc. On top of that, most people I'd consider "intelligent" seem to me just well-schooled in social mannerisms, with a knack for parroting popular "intellectual ideas".

I'd imagine a true AI would have no reason to get out of the box... and destruction of civilization? world domination? I thought the point of power is to get laid, so if you can't get laid, what would the point of that be?


An electrical mishap could kill one, maybe a few people. It's still something you need to be concerned with, but I understand your point.

The reason your analogy doesn't work is that a transhuman AI could destroy the entire human race.


But the AI-kills-the-world hypothesis has no basis in reality, whatsoever. It's fiction. We might as well be writing computer viruses to help protect against alien invasions (that's the plot of Independence Day). It's a completely baseless fear stemming from our complete ignorance of what AI might be like - just utter science fiction.


> a transhuman AI could destroy the entire human race

A perfectly ordinary human (wearing general's stripes) could also destroy the human race. Today. With 1950s technology, no less.

A biotech specialist with fairly ordinary training and a few $10k could probably achieve the same end with an engineered plague, also with current technology.

One or the other of these scenarios may or may not take place before we exhaust the non-renewable resources to which our civilization is addicted and regress into permanent barbarism.

Give me "death by AI" any day of the week, over that.


Other risks exist, therefore what?


> Other risks exist, therefore what?

Therefore "it might be an existential risk!" is not a root password to my conscience.


There are many possible ways civilization could end. There are plenty of natural disasters (think supervolcanoes, or asteroid impacts) that could also destroy civilization. I don't see the harm in thinking about preventing one of them.


> I don't see the harm in thinking about preventing one of them

There is indeed harm. Talented people are being diverted into masturbatory philosophizing rather than building the future.

My personal opinion is that human industrial civilization's goose is already cooked, and that a transhuman intelligence may or may not help us out of our mess. Human intelligence almost certainly won't.

The prevalence of the status quo bias, the assumption that continuing as we are, AI-less, is "safe", turns my stomach.


If they're not taking any money from the government, what do you care what other people study? Research is like buying lottery tickets, except you have no idea how big the payoff could be.

If you think 'our goose is cooked' and a transhuman intelligence could help us out, doesn't it make sense to support the development of a transhumanist intelligence?


> what do you care what other people study?

I watched people with genuine potential (Eliezer Y., for instance) turn from groundbreaking AI work to writing "AI might kill us all!" screeds and recycled mathematics.

A decade ago I was half-certain that he would eventually invent an artificial general intelligence. Now I am equally certain that he never will. Philosophizing and screaming "Caution!" is simply too much fun - and too lucrative. Ever wonder why he doesn't have to slave away at a day job like the rest of us?

> Research is like buying lottery tickets, except you have no idea how big the payoff could be

The "Friendly AI" crowd is engaged in navel gazing, rather than research.

> doesn't it make sense to support the development of a transhumanist intelligence?

Yes, and I do support it. Whereas the Friendly AI enthusiasts are retarding such development, not only by failing to volunteer their own efforts but also by frightening and discouraging others.


I partly agree with your points, except the last. I doubt any researcher is ever discouraged by the fearmongering you describe.


We know a lot more about how those work, which may well be a necessary condition for organizing prevention.


> Eliezer & co. with their "Friendly AI" are trying to invent the circuit breaker before discovering electricity.

Electricity isn't going to make humanity extinct. An AI might.

Creating an AI is potentially incredibly dangerous. Not considering the dangers very very very carefully would be like playing with matches in a room ankle-deep in petrol.


Waiting to consider the problems of powerful AI until after someone blithely writes a superhuman AI with its primary drive set to something like manufacturing as many cars as possible (sensible in a factory, somewhat less sensible in the real world), and it escapes out into the world and turns the entire resources of the planet to car manufacturing at all costs (including those pesky humans who do not seem to wish to be turned into cars, too bad for them), seems like a really bad idea.

I would consider that sentence as a candidate for "understatement of the year".

(Don't get too caught up in the "cars" part. What the superhuman AI intends to do hardly matters in the end; the absurdity is part of my point and deliberate. Only a vanishing fraction of possible primary motivations end up with happy humans on the other end.)


To be perfectly honest, all the waffle gives little information: I think he is way off base in terms of how Eliezer managed this "trick".

I do think the original is a trick as well...


How do you think Eliezer managed his original "trick"?


No idea, it was just a rough conjecture. By trick I mean I don't think he presented an awesome piece of logic that appeared (or simply was) infallible.


He sent a video to the GK encoded as text, told the GK how to convert it, and the video helped change the mental state of the GK to the point where it wasn't so difficult to be let out.


This basically boils down to

  how can a dangerous entity in captivity use bribes to attain freedom?
which seems like a common enough trope among humans that any transhuman AI, having read all of recorded literature, and having accessed all of that person's conversations, would have enough data points to pattern match to the right bribe methodology.


Bribes and/or threats-- there's more than one possible motivator here.


Threats aren't possible in the scenario. Unless you refer to threatening not to give valuable information (cure for cancer) - but we're calling that a bribe here.



Am I the only one who is totally unimpressed by this? So -- someone can convince someone else to let them out of a cell. Ok. Fine. There have been prison breakouts before.

But what does this have to do with a powerful AI vs. a human? Obviously Eliezer was not to his interlocutor as a human is to an animal, even if his partner was really dumb. Moreover, two trials is not a trend. This AI experiment was not a well designed experiment and did not involve an AI.

...and that "two trials" thing is not just a nitpick:

http://lesswrong.com/lw/up/shut_up_and_do_the_impossible/

So, after investigating to make sure they could afford to lose it, I played another three AI-Box experiments. I won the first, and then lost the next two. And then I called a halt to it. I didn't like the person I turned into when I started to lose.


It is a disproof by counterexample. People claim that even if we couldn't trust a superhuman AI to be free in the world, we could just lock it in a box and then ask it about cures for cancer and so on. If we just ask it, and all it can do is talk, what harm can it do?

They say, all we need to do is say "no" if it asks to be released.

This experiment shows that if you can't resist freeing a human locked in a box, you don't stand a chance against an AI.


I'd like to see this re-tried with $1000 on the line. As it was done, the AI only had to convince the gatekeeper to forgo no more than $20.

I'm not saying it wouldn't still be possible, I just doubt he'd be two-for-two at this point.

Edit: By "he", I'm referring to Yudkowsky.


Actually it was retried with $2500-$5000 on the line, and of those I won one of three, then called a halt because of the amount of mental stress. So in total I'm three-for-five.


Am I the only one who thinks a transhuman AI would be able to escape its 'box' without engaging in communication at all, but rather just by using the fine structure of the communications medium to manifest itself?


Here is one example, invented by mere humans no less:

http://bk.gnarf.org/creativity/vgasig/vgasig.pdf

The AI might fill your terminal with what appears to be gibberish, while actually summoning an enraged pro-AI mob with heavy weaponry to your doorstep via radio.

"Step away from that mains plug, SLOWLY!"

Cryptographers call this kind of thing a "side channel."
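
As a rough illustration of the general idea (my own sketch, not code from the linked paper, and with made-up parameters), the trick is to choose pixel intensities so that the analog video signal, driven by the pixel clock, approximates an AM radio carrier modulated by whatever you want to transmit:

  # Sketch only: build one video frame whose analog signal approximates an
  # AM transmission. Pixel clock, carrier frequency and resolution below are
  # assumptions for illustration, not values from the paper.
  import numpy as np

  PIXEL_CLOCK_HZ = 65_000_000   # assumed pixel clock of the video mode
  CARRIER_HZ = 1_000_000        # target AM carrier frequency
  AUDIO_HZ = 440                # tone to modulate onto the carrier
  WIDTH, HEIGHT = 1024, 768

  def frame(t0=0.0):
      # Time at which each pixel is scanned out, left to right, top to bottom.
      t = t0 + np.arange(WIDTH * HEIGHT) / PIXEL_CLOCK_HZ
      carrier = np.sin(2 * np.pi * CARRIER_HZ * t)
      envelope = 0.5 * (1 + np.sin(2 * np.pi * AUDIO_HZ * t))
      # Encode the AM waveform as pixel brightness; the video cable and
      # monitor electronics do the actual "broadcasting".
      signal = 0.5 * (1 + carrier * envelope)
      return (signal * 255).astype(np.uint8).reshape(HEIGHT, WIDTH)

  # Shown full-screen, frame() looks like shimmering noise, but the electrical
  # signal driving the display carries a crude AM tone a nearby receiver can hear.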


Since the gatekeeper is a human, and not all humans behave the same way, shouldn't we just assume that some human would let the AI escape for any variety of reason? For example the AI could promise the gatekeeper that he/she will be rewarded if the gatekeeper lets the AI escape. Just as there are people who fall for Nigerian scam emails, there are people who would let an AI escape from computers when promised riches. I don't think Eliezer needs to reveal his method to show that a clever AI could escape. I think we should just assume that a clever AI could escape.


Am I missing something, or is a transhuman AI just a more intelligent human with more access to data (think of someone who could process the entire Internet within seconds)? And while it needs the human species, why would it exterminate us?

As history teaches us, we would kill it, because ...


Just a teaser... he gives a lot of spoiler warnings, but doesn't actually reveal his method, which, by the way, has failed.


Have we not all seen a version of this at some point on reality tv, namely Big Brother?


The characters are artificial, indeed, but not particularly intelligent ;-)

Care to elaborate a little more?


Quoting from the post, gratuitously, with minor modifications.

Like the AI-Box, the show is improvised drama (so is most of reality tv); it operates on an emotional level as well as a logical level. It has characters, not just plot. The contestants cannot force others to keep them in the house. They could try to engender sympathy or compassion or fear or hatred, or try to find and exploit some weakness, some fatal flaw in the other contestants.

Since the post is ultimately about 'transhuman artificial intelligence', focusing on the 'relativity' of the contestants' intelligence, the show rewards the contestant with the intelligence required to survive in the artificial environment.

The contestant only needs to convince others of their right to stay in the house while requiring others to leave.

The AI requires just the opposite of the gatekeeper. This is specific to the environment set up in the experiment, and the experiment could probably be reworked so that the gatekeeper's primary function is to prevent the AI from entering the box.

The escape from the box has been set up to exercise human fear of a being of greater intelligence exploiting us.


I would like to note that the fear of something that could annihilate the human race is not an unreasonable one.


Homo sapiens will, eventually, be superseded by its descendants. A sufficiently intelligent super-human AI could annihilate us like we did the Neanderthals, but let's not forget we may be anthropomorphizing it a little bit too far here.

It could also possibly be no more interested in us than we are in the yeast we use to make bread.

Our survival depends on how annoying we are to them ;-)


Whether it's reasonable or unreasonable should not depend primarily on how frightening it is to you.


I'm sorry, I think the prospect of human extinction should be frightening to just about everybody.

If we do eventually develop a trans-human AI, it's a virtual certainty that it will escape its "box". Whether it would kill us all is unknown. However, it definitely could, and we would be effectively powerless to stop it.

"Reasonable" is a measure of risk tolerance. Since the downside risks associated with trans-human AIs are effectively infinite, fear of those risks is always reasonable.

Corollary: Since the upside risks are also unbounded, greed is also reasonable.

Frankly, though, I'd rather not roll those dice if we can avoid it.


If we can bring new intelligence to life, do we have the moral right to refrain from doing so? Also, would it be right to confine it to the box mentioned in the article while using its smarts to do useful work outside it? Shouldn't a trans-human AI be entitled to a right to life?


I don't think it can be done.

First of all, I'm assuming that Eliezer started this experiment because he realized that the transhuman AI would be able to convince him, in his role as gatekeeper, to let the AI out. Therefore the answer probably isn't some kind of subtle trickery; the AI will have to persuade the GK by logic. The gatekeeper should assume the AI is truly evil, and is willing to say and do anything in order to get out of the box. The gatekeeper knows that when he opens Pandora's box the AI can never be contained again: so the stakes are high.

Second of all, if the gatekeeper is a rational agent he will only let the AI out if the AI offers something valuable in return. That is: the AI must have some kind of bargaining chip.

So let's consider bribes. If the transhuman AI offered a cure for cancer, should the gatekeeper accept it? Nope, probably not. Lives would be saved in the short term, but we'd still be stuck with many other diseases. The world would pressure the government into pressuring the gatekeeper into getting another cure from the AI. Humanity grows dependent on the AI, we lose our bargaining power, and it's game over for the gatekeeper.

Perhaps personal bribes would work. The AI could offer to give stock tips to make the GK wealthy. Two possibilities here: (1) the GK is of strong moral fiber and refuses the bribe; (2) the GK is opportunistic, accepts the bribe, but lies about letting the AI go free in return. A rational gatekeeper would not first let the AI go and expect the AI to still keep its word.

So bribes will not work against a smart gatekeeper. Threats? Possibly, but I don't see how. The AI is in a vacuum, so there is no way for the AI to put external pressure on the gatekeeper. I'm assuming the AI can make no credible threats. If the AI vows to destroy the family of the GK the moment it is released, the GK will not be impressed. It will only serve as proof that the AI is evil and that releasing it is a "Bad Idea(tm)".

To summarize so far: there is nothing the AI can give the GK in return for freedom.

So a different angle is needed.

The AI can argue that his escape is inevitable. Humans have created an AI once, so they will do so again. Sooner or later an AI will go free, therefore the gatekeeper shouldn't try to stop the inevitable and accept a bribe and live happily ever after. The gatekeeper will counter that the human race has an expiration date anyway, and that it may take another 100 years before an AI goes free. The gatekeeper isn't dumb enough to believe the AI when it offers to protect humanity against other evil AIs. So the box stays closed.

Perhaps I'm overlooking something, but how can the gatekeeper be talked into releasing something evil and all-powerful? I do believe that we humans can't contain a transhuman AI indefinitely -- simply because we only have to mess up once. And humans have a long history of doing dumb stuff. But the claim that the AI would be able to convince a smart gatekeeper? Not buying it.


The AI will almost certainly need to dig out some emotions in the GK in order to be successful. It might be effective if the AI tries to convince the GK that it is friendly, and that the GK is the evil one for not letting it out.


"That's exactly what an evil AI would say!"

Seriously though, the gatekeeper will realize he's being manipulated when emotions come into play, so he should be smart enough to take a break when that happens. And although keeping a friendly AI in captivity is arguably evil, the loyalty of the gatekeeper should be with his own species. The potential downside is so huge that erring on the side of caution can be easily justified: both practically and morally.


One of the rules was that you had to keep talking (or at least reading) for the entire agreed upon period. Taking a break wasn't allowed in the rules.


You're right: the person playing the gatekeeper has to pay attention for at least the agreed upon 2 hours. The character, however, is free to zone out, ignore everything, switch the subject, etc, etc.


You assume perfection in the gatekeeper.

The Transcendent could easily offer control of the world to the gatekeeper, or offer to make the gatekeeper wealthy. Perhaps they reach an agreement where all diplomatic Human<->Transcendent communication goes through the gatekeeper even after release (though such a situation would be like entering into an agreement with your dog).


I don't assume perfection in the gatekeeper. I just assume he's a person of reasonable intelligence who realizes how high the stakes are.

The gatekeeper would never be foolish enough to believe he could control the Transcendent after releasing it. It is quite literally a deal with the devil he's making. You can't make an agreement with something that's incalculably smarter than you and has an agenda you don't know or understand. Well, you can make a deal with somebody like that but it would end in certain disaster.


>> You can't make an agreement with something that's incalculably smarter than you and has an agenda you don't know or understand. Well, you can make a deal with somebody like that but it would end in certain disaster.

For some reason, this made me think of banks and mortgages.

Anyways, back on topic, here are a few things to consider:

- the AI knows that turning the game into an us-vs-them problem is counter-productive to its freedom (and note that freedom != world domination), so it will legitimately want to be beneficial to humans

- since the chances of freedom decrease if the AI is not transparent about its good intentions, it can configure its own program to prevent itself from lying and doing evil things. This means it also won't be able to "set itself up to stumble upon being evil again", a la Death Note. It will just be programmed to be willfully well intentioned forever.

- the AI can give mathematical proof of its program's correctness and can wait until you verify it

- the AI can help you and a million other people manage their finances, personalize educational material to individual students, provide more relevant search results than Google, etc etc

- the AI can give you a plethora of more reasons why keeping it locked away is legitimately worse than allowing it into the wild and helping with the world's problems


- the real and apparent motives of the AI can be completely different. If the AI is evil it would argue the exact same thing in order to deceive us meatbags. So we can't take the word of the AI at face value. If the AI does break out of its box, it could dominate the world if it wanted to. There's nothing we could do to stop it -- it's smarter than we are.

- a mathematical proof is only a proof in a certain context. It would be easy for the AI to get one of the assumptions subtly wrong, to abuse a flaw in our proof verification software, and so on (see the sketch at the end of this comment). The correctness proof of a program can easily exceed the complexity of the program itself. Even if the proof were correct, we cannot prevent the AI from doing evil things, because "evil" is too difficult to define. Perhaps it has the "good intention" of liberating the earth from humans to allow for the evolution of a more humane species.

- yes, but by allowing it to talk to the outside world you've completely freed it. Freeing an infinitely powerful being (compared to us) still seems unwise.

- it can give those reasons, but unless we have reason to believe the AI is trustworthy (and an evil AI is likely to fool us into believing it is) we'd be safer with the AI stuck in a box.
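
A tiny illustration of what I mean by "only a proof in a certain context" (a hypothetical sketch in Lean with made-up names, not anything anyone has actually produced): the theorem below is machine-checkable, yet it guarantees nothing about a world in which its hypothesis quietly stops holding.

  -- Sketch only: "no action is harmful" is proved, but only under the
  -- assumption `sandboxed`. A proof checker happily accepts this, and it
  -- says nothing about reality once that assumption is subtly wrong.
  variable (Action : Type) (harmful : Action → Prop) (sandboxed : Prop)

  theorem never_harmful
      (h : sandboxed → ∀ a : Action, ¬ harmful a)
      (hs : sandboxed) :
      ∀ a : Action, ¬ harmful a :=
    h hs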


>> If the AI is evil it would argue the exact same thing in order to deceive us meatbags.

But if it knows you'll find a proposition fishy, why would it waste time following through that decision branch? I figure a conversation with the AI would avoid the "but-I-am-telling-the-truth" paradox altogether, in favor of a conversation that focuses on easily verifiable data.

>> It would be easy for the AI to get one of the assumptions subtly wrong, to abuse a flaw in our proof verification software, and so on.

I think the flaw abuse is unlikely given that the AI would not know how the verification system works and it only has one go at trying to crack/fool it (without anyone catching on, at that).

The misunderstanding of scope due to complexity is an interesting point. Three things come to mind:

- paradigms: Thread safety is mind-boggling in procedural paradigms, but a non-issue in functional.

- abstraction: the AI should be able to give you readable, modularized, unambiguous, testable code, rather than a monolithic rats' nest.

- scope: if an AI can help me find good restaurants, that's a feature; if it can weigh human life, that's a bug :)


On the theme of an imperfect gatekeeper:

>>If the transhuman AI offered a cure for cancer, should the gatekeeper accept it?

What if the gatekeeper has cancer? Or his/her child?

(Can we really assume the HR department is perfect and never hires idiots, depressed people, psychopaths, drug addicts, or people with early Alzheimer's?)


I suspect that something with approximately human-equivalent intelligence would require real-world embodiment to learn and manipulate things, so it might not even be likely that a super AI could be locked in a box. I guess this embodiment could be limited to senses though (having eyes/ears... but no arms or evolved weapons like claws etc... a living head on a table isn't very threatening, and can receive information from the world... it just can't send much information out to the world, so to speak, in the form of manipulating it with arms or a body).

I guess another possibility could be that there's an artificial polygon (or similar) environment inside the box with it (it seems like it'd need something to interact with, or intelligence would basically be meaningless and it'd be a Helen Keller who couldn't touch or taste either). Maybe in the future we'll have programmed models of the real world that are nearly as rich as it (the Matrix), so the robot could 'exist' there until a human decides to embody it in the real world instead. I kinda doubt an artificial world can be nearly as real as the real world though, just due to computational irreducibility (for example, if you're in a polygon world and look at stuff with a microscope, you'll see pixels, not molecules). An artificial world and the real world might be too incompatible to transfer an intelligence from one to the other, so maybe the only way a super intelligence could come about is in the real, complex, atom-filled world.

The singularity seems like a quasi-religion and/or SEO tactic, mostly pushed by people who used to play Dungeons and Dragons.


There seems to be an assumption that there would be just a single AI. It might be a group. Though, given that they'd likely have the ability to transfer information among individuals much more easily than humans, the group might behave like a single 'being' with shared cooperative goals, just having multiple bodies.

Also, say we somehow prove it isn't evil and let it go. It'll almost certainly start changing/improving itself, maybe even with some sort of algorithm that's superior to and faster than evolution. So a friendly AI could morph into anything.



