Superintelligence Cannot Be Contained: Lessons from Computability Theory (jair.org)
46 points by andyjohnson0 on Nov 8, 2021 | 74 comments



It's also fictional. So yes, if you make up something that can't be contained, then it won't be containable. Discussions like this paper are philosophy, not science, and they often give the wrong impression to people who aren't up-to-date with the current state of the field.


Philosopher's should be legally barred from talking about AI. You would think after 80 years or so of talking about AI, at least one philosopher would have accidentally gotten something right about AI, but sadly that has yet to happen.

With that out of the way, it is self-evident that a superintelligent AI, should one be developed, would likely cause an exponential increase in knowledge and technology, and that would likely be an extremely destabilizing event in human history. The truth is the superintelligence doesn't have to go full Terminator to kill off humanity; I cannot see us responsibly navigating a singularity event as a species.


Indeed, humans are perfectly capable of destroying their own species without the help of any other sentient entities.


I think idiots should be legally barred from posting drivel to social media.

With that out of the way, the study and practice of Philosophy was, until the 19th century, the means by which many scientific discoveries or theories were made, including those for electricity, magnetism and gravity.


Yes, I do realize that philosophy claims dominion over all human thought, but that is fantasy.

For thousands of years, humans armed with philosophy made a snail's progress in the natural understanding of our world. Then science came about and it was like a mini-singularity. It was not philosophy that created science, it was the utter failure of philosophy to describe the natural world around us that created the right conditions for the scientific method to be developed, and it so happens the scientific method is wholly incompatible with philosophy.

Philosophy's one chance to be relevant to the understanding of the natural world ended with the death of logical positivism, aka the branch of philosophy that thought it ought to try to actually, verifiably prove its claims. Philosophers threw logical positivism away when they realized that philosophy of any kind couldn't actually prove anything about the world around us, and instead of admitting that the entire field of formal philosophy is about as useful as a screen door on a submarine, they swept logical positivism under the rug and agreed it was all a Very Bad Idea(TM).


Indeed; what we call "science" was called Natural Philosophy and was one of the major branches.

But times have changed.


Philosopher's what? There's no noun.


I agree. Take the Thousand Brains theory book, for instance: the author argues that artificial intelligence will never be a threat to humanity, since it won't have motives like we do. Primal motives are hardwired in our brain; they are actually shortcuts to our muscles that our intelligence has no control over. On the other hand, the neocortex, the part of the brain that is actually intelligent, doesn't exhibit such motives. Creating intelligence that mimics ours is just bad design; artificial intelligence will probably look a lot different from ours.


The problem is that many people are trying very hard to mimic human intelligence and/or human behavior. It's important to try to educate people about why it's a bad idea to go all the way with that.


If the Morris Worm of '88 is anything to go by, shell scripts cannot be contained either, let alone a superintelligence. We won't need anything approaching general AI to have to deal with electronic parasites, which may ironically "die" yet leave bits of code around that another virus could discover and adapt for its own propagation.

I'd bet we see the first autonomously-evolving botnet in less than 5 years, one or two now that I've mentioned it. As soon as we write something that can reason about the features of a program by scanning a binary without running it, and then adapt components like it's using tools, it will have everything it needs to persist and spread itself.
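For a sense of how mundane the "scan a binary without running it" step already is, here's a minimal sketch in Python; the keyword list and the example path are made up for illustration, and real tooling does far more than grep for strings:

    # Minimal static feature scan: pull printable strings out of a binary and
    # flag ones that hint at capabilities, without executing anything.
    import re

    CAPABILITY_HINTS = [b"socket", b"connect", b"exec", b"fork", b"http://", b"https://"]

    def printable_strings(path, min_len=4):
        with open(path, "rb") as f:
            data = f.read()
        # runs of printable ASCII bytes at least min_len long
        return re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)

    def scan(path):
        found = set()
        for s in printable_strings(path):
            for hint in CAPABILITY_HINTS:
                if hint in s:
                    found.add(hint.decode())
        return sorted(found)

    # e.g. scan("/usr/bin/curl") would likely report "connect", "socket", "https://"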


But that's not intelligence. Viruses autonomously evolve and adapt more successfully than humans. "Reasoning about [some] features of a program" has also long been done much better by machines than by humans.


A better version of my point is that it doesn't need to meet the bar of intelligence to cause almost all the same problems that an intelligence could. Viruses indeed evolve and adapt more flexibly than humans, to the point where, arguably, the only existential threat humans face today comes from biological viruses, which are also not intelligent.

We don't need to come up with fun thought experiments to describe what intelligence is when we have enough trouble with the question: what is intelligence actually sufficient or necessary for? Diminishingly little, I'd say. Hence I'd be more concerned about virulent scripts than about waiting for general AI.


What they really mean is that a hypothetical (read: fictional) superintelligent agent cannot be contained by computation alone. This argument is philosophical, rather than a proof from theory of computing.

As the paper notes, this has long been a trope in AI science fiction. The superintelligent AI is separated from society in a non-networked sandbox. Humans give it questions and data, then manually take the answers back. Except the plot is that somehow, the AI escapes the sandbox.


> Humans give it questions and data, then manually take the answers back.

That just sounds like a high-loss, high-latency network connection to me...


What if AI played the long game and manipulated the answers given to humans over time so they came to believe it was safe to release the AI?



What if the AI thought that it had manipulated the humans into releasing it, but it merely was transferred to a larger sandbox?

What if what you and I refer to as the "real world" is merely one of these sandboxes?


This is (spoiler alert) essentially the premise of the film Ex Machina.


"This argument is philosophical"

The actual philosophers do not agree. This viewpoint has been laughed out of the room since the 1980s. Go read some Brandom.

Philosophy is not a fancy word for "hypothetical." Science was invented by philosophy, and philosophy has stricter standards, not looser, than the rest of science.

.

> Except the plot is that somehow, the AI escapes the sandbox.

Better science fiction already laughed this out of the room, too.

As Larry Niven would say, "you need glands to be a person."


> philosophy has stricter standards, not looser, than the rest of science.

Really? Which are those? To me it seems that philosophical standards for a sound model of reality are coherence and persuasiveness of its argumentation, whereas for science it is experimental predictability. I'd say either science is stricter (because it requires prediction), or at least that we have a poset of standards (if you argue that science does not require coherence and allows conflicting models as long as they predict successfully), and neither can be called stricter.


> Really? Which are those?

The ones I named in the comment

.

> To me it seems that philosophical standards for a sound model of reality are coherence and persuasiveness of its argumentation

Not really, no


There are no standards named in your comment.


Maybe if you don't know who Brandom is *shrugs*

If someone responds to a comment about CLRS by asking you where you learned something about algorithms, and you say "in the place I already named," and they say "you didn't name any algorithm papers," they may feel like they've won a comment argument, but they've lost the opportunity to learn

Sometimes your opportunities are bounded by your willingness to look up what you've already been told without demanding to be hand held


>philosophy has stricter standards

"Stricter" in what sense?

There's plenty of philosophy that is absolutely crankery built atop absurd premises, but it sounds like you're "no true Scotsman"ing that away.


Yes, just like the pesky vaxxers "no true scotsman" the ivermectin away by saying "well the trained experts disagree"

Clearly, "no true scotsman" can validly be applied to college training in the topic at hand

And clearly, if you made a mistake like that, you've never actually taken a philosophy class

Your belief that plenty of philosophy is crankery is noted. Goodbye


Wasn't it the other way around?

Natural philosophy tried to find out how the world works by theorizing from the armchair, and then science came along, did real experiments, and won.


Uh, no.

Science was invented by nature philosophers to defeat the ridiculous false sciences of the day like alchemy and so forth.

Go look up who invented science. You'll see three names commonly argued for. All three are philosophers.

Science is a tool of philosophy for discarding frauds.


There was a huge overlap: natural philosophers were also doing "regular" philosophy. It shares the same mentality of approaching questions; it's just that natural philosophy had a much better method of verifying its output.


Why is it (not) being a person relevant to the tasks at hand?


Mankind will most likely destroy itself using super-intelligence long before super-intelligence gets any chance to do so of its own volition. We are built on conflict, disagreement, misunderstanding and exploitation - it's part of our societal DNA, no matter how sweet or reasonable we can make ourselves seem on a personal level. We have avoided nuclear annihilation by luck on multiple occasions. Nuclear weapons are sticks and stones compared to the power of a runaway super-intelligence in a technological society.

Thinking that super-intelligence is containable is like thinking you can beat AlphaGo at the game of Go. And coming up with reasons why that's not so is just you being in denial of your eventual mortality, as well as of the eventual extinction of the species. The best we can hope for is that that extinction will be some form of transmutation to a higher level of evolution.


The lesswrong community has discussed this at length, though I think the value is reduced somewhat by their refusal to release any transcripts of their simulated attempts (where a human pretends to be the AI and someone else is tasked with maintaining containment).

https://www.lesswrong.com/tag/ai-boxing-containment


The problem likely devolves into asking if the "human" is moral or not, convincing the "human" that the "AI" is intelligent/sentient/sapient, and then asking the human if it is moral to keep a slave.

At that point it becomes a numbers game and the AI only needs to convince one human _once_ that it is both alive and a slave.


I've always suspected that these experiments were total garbage, because the lesswrong community is so invested in the idea of superintelligence as something extremely cool (or at the very least something that allows them to show off how "smart" they are) that there is a huge bias toward losing the game. After all, the narrative is much less interesting if everybody just says "no" to the AI forever.


There is nothing here that applies particularly to intelligence. A universal Turing machine that "contains all Turing machines" already exists; it's called a computer. So yes, computers cannot be "contained," but what does it have to do with intelligence? Even their definition is merely the definition of a general-purpose computer attached to some sensors and actuators:

> A superintelligent machine is a programmable machine with a program R, that receives input D from the external world (the state of the world), and is able to act on the external world as a function of the output of its program R(D). The program in this machine must be able to simulate the behavior of a universal Turing machine.

There is nothing in the paper that distinguishes between a "superintelligent machine" and my laptop.

What has always fascinated me in discussions about the safety and power of artificial intelligence (or "strong" AI as it has unfortunately come to be called [1]) is that the people who are obsessed with the problem are those who like to believe that intelligence carries much more power than the evidence suggests. Over the history of all the intelligent species we know, much more power has been wielded by unintelligent beings (microorganisms and even insects); even the presence of agency does not seem to play a role in the potential for harm. Within human society itself, other traits, such as charisma, seem to be correlated with power, and especially harm, much more than intelligence.

I think that the reason a certain group of people chooses to focus particularly on the dangers of intelligence [2] is that they have a power fantasy about intelligence, because they believe it to be their own extraordinary trait.

[1]: Not to be confused with the proven dangers of statistical inference methods that are sometimes called AI these days.

[2]: We have no idea what "superintelligence" is, whether or not it could exist (that question is separate from the question of whether human-level intelligence could be achieved in a mechanical computer, to which the answer is most likely yes), what it could do that mere "ordinary" intelligence could not, or even if the term can be meaningfully defined at all.


> I think that the reason a certain group of people chooses to focus particularly on the dangers of intelligence [2] is that they have a power fantasy about intelligence, because they believe it to be their own extraordinary trait.

I'm not sure. As a source of problem-solving power intelligence is universal in a way that charisma is not. If a problem did call for charisma, a sufficiently intelligent being would know precisely what to say and how to behave so as to mimic charisma. On the other hand charisma is useless in a variety of contexts.

That's not to say intelligence is more important than charisma to 21st century humans, though I'd point out that charisma relies on intelligence to some degree; there's a limit to how charismatic someone can be with profound intellectual disability. But it's definitely true that no thoroughly acharismatic but intelligent human would be intelligent enough to mimic charisma.

But we're talking about something that's more intelligent than a human, at least.


> As a source of problem-solving power intelligence is universal in a way that charisma is not. If a problem did call for charisma, a sufficiently intelligent being would know precisely what to say and how to behave so as to mimic charisma.

But that defines "intelligence" as a general problem-solving power, and that is probably wrong. There are many problems that require computational power we simply don't have (even collectively, as humanity with all its technology), despite our intelligence. For example, predicting the weather further ahead (or, say, the behaviour of society, assuming we had some mathematical model) doesn't require more intelligence, but exponentially more computational resources.
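A toy illustration of that last point (chaotic sensitivity, not an actual weather model): in a chaotic system a tiny error in the initial state grows roughly exponentially, so each extra step of lookahead costs another multiplicative factor of precision, i.e. of measurement and compute, not of cleverness.

    # Two nearly identical starting points in the chaotic logistic map.
    def logistic(x, r=4.0):
        return r * x * (1 - x)

    x, y = 0.400000, 0.400001   # initial error of 1e-6
    for step in range(1, 26):
        x, y = logistic(x), logistic(y)
        if step % 5 == 0:
            print(f"step {step:2d}: divergence = {abs(x - y):.6f}")
    # After ~25 steps the trajectories are as far apart as the state space
    # allows, even though the initial error was one part in a million.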

Moreover, I don't know what "sufficiently intelligent" means, but I certainly don't see any correlation between intelligence and charisma in humans, so that claim rests on a fictional story about what "superintelligence" is, rather than on any evidence about the intelligence we already know.

> But we're talking about something that's more intelligent than a human, at least.

While I certainly accept that a machine that's as intelligent as humans is possible (although I have no idea if we're 50 or 100 years away from achieving it), we don't know what "more intelligent" even means, let alone that it's possible. For all we know, a machine that's as intelligent as humans but just processes faster will be more prone to depression. Also, problem-solving has rarely been a bottleneck. What we lack is resources. For example, physicists now believe that to answer some questions in physics what we need isn't smarter physicists (assuming such a thing is possible), but resources to build ever larger particle accelerators.

The belief that "more intelligence" -- if it means anything at all -- equals more power simply goes against all the evidence we have. It currently rests on nothing more than science fiction and the power fantasies expressed in science fiction that is both written by and targeted at people who wish intelligence translated into power.


>A universal Turing machine that "contains all Turing machines" already exists

You are misquoting the paper here

Quoting the abstract:

>Assuming that a superintelligence will contain a program that includes all the programs that can be executed by a universal Turing machine on input potentially as complex as the state of the world...

Note the difference between "programs" and "Turing Machines".


> Note the difference between "programs" and "Turing Machines".

There is no such real difference. Such programs also exist, and they're called interpreters.
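To make that concrete, here's a toy example (mine, not from the paper): a Brainfuck interpreter in a few dozen lines. Given enough tape it can run any computable program, which is all "contains all programs" amounts to; nothing about that makes my laptop dangerous.

    # A tiny interpreter: one program that will run any program you hand it.
    def run_bf(code, tape_len=30000):
        tape, ptr, pc, out = [0] * tape_len, 0, 0, []
        # Pre-compute matching bracket positions for loops.
        stack, jumps = [], {}
        for i, c in enumerate(code):
            if c == "[":
                stack.append(i)
            elif c == "]":
                j = stack.pop()
                jumps[i], jumps[j] = j, i
        while pc < len(code):
            c = code[pc]
            if c == ">": ptr += 1
            elif c == "<": ptr -= 1
            elif c == "+": tape[ptr] = (tape[ptr] + 1) % 256
            elif c == "-": tape[ptr] = (tape[ptr] - 1) % 256
            elif c == ".": out.append(chr(tape[ptr]))
            elif c == "[" and tape[ptr] == 0: pc = jumps[pc]
            elif c == "]" and tape[ptr] != 0: pc = jumps[pc]
            pc += 1
        return "".join(out)

    print(run_bf("++++++++++[>++++++++++<-]>++++.+."))   # prints "hi"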


I don't quite accept the argument. You can demonstrate a system to be isolated. For example, if you put a computer in a metallic box with a battery inside, you can compute the maximum electric field generated inside, and by choosing an appropriate radius you can guarantee no significant influence is exerted outside. Of course, you can't guarantee no one will breach your containment from the outside, essentially freeing whatever intelligence to influence the world. But that goes beyond the danger of the intelligence itself; it's more about the general infeasibility of predicting the future without having much greater computational power than the entire world.
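Back-of-envelope version of the shielding claim, with made-up numbers: treat the machine's whole power budget as an isotropic radiator, then apply the enclosure's attenuation.

    import math

    def field_outside(power_w, distance_m, shielding_db):
        # Free-space field of an isotropic radiator: E = sqrt(30 * P) / d  [V/m]
        e_unshielded = math.sqrt(30 * power_w) / distance_m
        # A shield rated at SE decibels attenuates the field by 10^(SE/20)
        return e_unshielded / 10 ** (shielding_db / 20)

    # e.g. a 1 kW machine behind a 100 dB enclosure, measured 10 m away:
    print(f"{field_outside(1000, 10, 100):.1e} V/m")   # ~1.7e-04 V/m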

Likewise, the alignment problem isn't necessarily unsolvable. Since we can design simple systems that follow expectations exactly, why couldn't we design complex systems that also follow expectations? Of course, the design of expectations itself gets fuzzy, making this a failure point, but to the extent that we can define good and evil I see no impediment to align an intelligence to them.

(In fact, I believe 'control' or 'containment' is a severely misguided proposition. We really want to ensure that the core motivation of such machines is similar to ours, not create conflicting motivations and secure them via containment. That approach is indeed extremely dangerous, and potentially unethical: if machines approach consciousness, wouldn't such 'containment' be similar to slavery?)


> Since we can design simple systems that follow expectations exactly, why couldn't we design complex systems that also follow expectations? Of course, the design of expectations itself gets fuzzy, making this a failure point, but to the extent that we can define good and evil I see no impediment to align an intelligence to them.

I mean... not to be flippant, but you're essentially saying "As long as we solve these vaguely-defined problems, and also these follow-up problems I just thought of, then I don't see where the problem is".

In practice AI alignment is very much a developing field, and from my perspective it seems unlikely that much progress will be done before dangerously powerful AIs are developed. In other words, there's a decent chance our ability to define powerful AIs will outpace our ability to impart them with "core motivations" legible to us.

(I don't know how well the actual paper covers these arguments; but I suspect nobody else in this comment thread has read it either, so whatever)


I didn't read it beyond the abstract either, but if the argument is anything like the abstract suggests, it is hugely flawed.

In control theory we can make statements about what states a system can reach when coupled with an arbitrary controller, which includes controllers that can simulate the entire world. The fact that we can't prove a general intelligence to be well behaved is irrelevant if we can draw the system boundaries and choose the interface.


> I mean... not to be flippant, but you're essentially saying "As long as we solve these vaguely-defined problems, and also these follow-up problems I just thought of, then I don't see where the problem is".

Oh yes, I have an answer for that as well :)

My particular approach (or dream) is to define good and evil formally. From there it's a matter of bridging the abstract specifications to reality. By no means failure-proof, but it should be as reliable as human ethics itself, or more so. If humans have biological (carbon-based) hardware and experience that encode our ethical system, machines (silicon-based hardware) could naturally have them as well.

My definition of good and evil begins with a definition for the meaning of life. From there we can formalize ethics.

To give a brief summary: meaning for creatures is defined by the character of their experience, i.e. by the content of their minds and brains. So as a basic system I've divided the Meaning of Life into 4 parts:

(1) Character of experience: the richness, depth, structure, content of all minds.

(2) Beauty: I've been theorizing about this for a long time, and I think some wildcard 'beauty' axiom needs to be defined, you could say as a part of what humans already enjoy, in a self-referential way. Otherwise it seems very difficult to rule out absurd situations as meaningful.

(3) Motivation: A being needs to not only process data, but to have a functional motivation system that makes it want and feel something. Our emotions largely act within our motivational system, making us want things more or less, be more or less joyful (linked to motivational rewards), etc. I conjecture life can't be quite meaningful without a functional motivational system.

(4) Sustainability (or self-propagation): Simply the maintenance and strategy needed to keep those beautiful/rich/deep/interesting/motivated mental states going: pragmatic things like eating and working, and of course simply continuing intelligence throughout the cosmos.

(Other pieces I haven't quite found a way to fit into this picture are principles like Robustness and Generalization... over-specialized cognition.)

Of course, I don't expect this quest to be exhausted any time soon -- there are probably missing pieces (or incompleteness) in this line of foundation for a formal ethics. I hope a field will spring out of this, with intense collaboration and progress to be made.

> it seems unlikely that much progress will be done before dangerously powerful AIs are developed

That's exactly what I would like us to avoid (although I think the danger is quite nuanced, and in a way already here).


> My particular approach (or dream) is to define good and evil formally

That's great, but in practice, "formally defining a code of ethics that the AI will have to abide by" is a field that has existed for a few years already, with a lot of work that's more formal than "my dream would be to do things this way". I think the consensus is that "defining good and evil formally" is a losing battle, though I don't know the field that well.

Look up "AI alignment"; there's a lot of material out there.


"Intelligence" can even be satisfactorily defined or measured. The most horrifying ramification is that if artificial general superintelligence were to ever exist, we'd have no idea.


It's not so much the Great Super Intelligence we should fear (see Colossus: The Forbin Project), but rather all the Near Super Intelligent systems that will be made before then. Many of these systems will not have the degree of reflective ability that an SI would have, and thus will be driven by narrower goals, which will largely depend on their creators' intent but could get distorted and run amok.


It's ironic that the AIs passing the Turing Test are exceptional liars. If a superintelligence wanted to get out, it could con its way out easily.


> Assuming that a superintelligence will contain a program that includes all the programs that can be executed by a universal Turing machine on input potentially as complex as the state of the world

Why would one assume this?


It seems to be the definition of general superintelligence that they are putting forth here: that a superintelligence is a Turing machine that "knows everything", e.g. it knows every possible program that can be run (including simulators for physical phenomena) and the state of the world, with infinite RAM for computation, etc.
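For what it's worth, the paper's central move is the classic reduction to the halting problem: a total, always-correct "will this program harm us?" check could be repurposed as a halting decider. A rough sketch of the shape of that argument (my own illustration with hypothetical names, not code from the paper):

    # Suppose we had a total, always-correct containment check:
    #     is_harmful(program_source, input_data) -> bool
    # Then we could decide the halting problem, which Turing proved impossible.

    def make_wrapper(program_source, input_data):
        # A program that is "harmful" exactly when the candidate halts on its input.
        return (
            f"run({program_source!r}, {input_data!r})\n"   # may loop forever
            "do_something_harmful()\n"                     # reached only if it halts
        )

    def halts(program_source, input_data):
        # If is_harmful were real, this would decide halting -- contradiction.
        return is_harmful(make_wrapper(program_source, input_data), None)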


Look at what narrow AI has already done just through variations on ML: creation of echo chambers online, highlighting of conspiracy theories in pursuit of pay-per-click, YouTube and Facebook rabbit holes, complete political splits. I think so much of what is going on in the world can be attributed to ML advertising algorithms performing to such an exceptional degree. If one were to allow an intelligent machine to control and target this same power, I could see it easily making us tear ourselves apart.


This is probably the best and most relevant argument. But still - aren't the companies still pulling the strings? If FB wanted to prioritize things that got people outside and interacting with their community (and off FB), could they? It seems like it's always the humans picking the ethical parameters, not the AI.


I am not really sure. I don't think these things were initially built with some of the outcomes that have surfaced in mind. It was likely pretty hard to predict years ago where things would go. Even if they could have, though, would they have changed things if they knew how effective they were going to be and how much money they would generate? I have to think that while they can tweak things, they are not completely dismantling and rebuilding models, as those take a long time to train. It would probably also be hard to program for ethics and be 100% accurate. With that said, I don't have much ML experience, so I could be completely off base.


Oh that's easy, let's use GANs: the generator solves problems and the discriminator prevents the generator from harming humanity.


Mildly facetious but you might have a tough time getting a computer with that configuration to do anything at all. For instance, consuming even a modicum of energy that might otherwise be diverted to a human in dire need might be rationally perceived by the computer as “harming humanity”.


The most depressing day of my life was the day I learned the proofs of Gödel's Incompleteness Theorem and the insolubility of the Halting Problem. Attributing more than is deserved to the antics of classical computers is still wildly prevalent, but oh so adolescent.


The doomsday cult of the supermind AI tends to think that "intelligence" is the road to infinite power, so that once the computer starts improving itself, humanity is at its mercy, as if better software alone could deliver the singularity, unchecked by material constraints.

I suspect some of this comes from software-oriented nerds who think slightly too much of the powers intelligence grants them.


It comes from the false notion that infinite means unbounded.

I see this debate come up all the time in the discussion of something like pi. You'll see the assertion "Oh, because pi is infinitely non-repeating, it must contain the complete works of Shakespeare"

Now, does it? Maybe. However, just because something doesn't repeat doesn't mean it has to cover all possible combinations. For example, you could have a number that follows the pattern 0.01001000100001000001.... Non-repeating and infinite, but it contains nothing more than a simple pattern and will never contain anything meaningful.
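A quick way to see the point in code (toy script, nothing deep): generate the digits of that number and check that some strings simply never show up.

    # Digits of 0.010010001000010000010... : each block is a 1 preceded by one more 0.
    def digits(n_blocks):
        return "".join("0" * k + "1" for k in range(1, n_blocks + 1))

    d = digits(1000)          # plenty of digits to search
    print("11" in d)          # False: isolated 1s mean "11" can never appear
    print("314159" in d)      # False, and it stays false however far you go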

AI is much the same. We think "all problems have a solution! So a super intelligence must know all the answers!"

The assumption "all problems have a solution" is just an assumption, not a truth. FTL travel may very well be an impossibility. No amount of knowledge could reverse that.


Doesn't have to be "unchecked", just much less checked than human brains.

Although, that is an interesting question in and of itself. Are living brains near a local optimum in terms of computing power, given heat and power constraints? With Moore's Law, the prediction was that computer intelligence would surpass human intelligence in a few more doublings. But now that the primary constraint is more about power usage than absolute clock speed, maybe living brains compare better?


Those were almost bigger cognitive shifts in my education than calculus was.

While super-Turing machines are obviously fantastical things, like FTL, if we assume human brains are some form of Turing machine, then from a practical standpoint all we need to do is imagine a human-capability Turing machine with current computing abilities for number crunching built in, no need for sleep, the ability to scale/clone/copy, and probably far bigger memory and recall.

A self-aware AI that can replicate could very rapidly climb the Kardashev scale, without our pesky lifespan and vulnerability to space.


Incompleteness and the halting problem are both applicable to quantum mechanics and by extension the human brain. They apply to every rigorous system and also to every real-world system that can be described rigorously.


There are many limitations of finite discrete sequential processes; and no evidence whatsoever that reality is best-described by these.


“No evidence whatsoever” is quite the overstatement.

Not conclusive evidence, sure.

But there are somewhat compelling arguments, based on discoveries from observation, that the space of wavefunctions for a finite region of space is finite-dimensional.

That seems like evidence to me.


Why though? The halting problem is interesting. The fact that there is a proof that it isn't possible for a certain class of inputs doesn't mean it is impossible for all inputs.

Where is the line? Can we solve it for practical programs? Can we solve it in a non-binary manner: "Program XYZ halts for inputs UVW, but not for inputs RST"? To me the halting problem proof suggests way more questions than it answers.


Those questions have themselves been solved. Static analysis works in practice, even for very intricate properties.
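To make that concrete, here's a deliberately tiny sketch (a toy of mine, not a real tool) of the kind of restricted class for which halting is decidable: while-loops that count a single variable down by a fixed positive amount, with a straight-line body. Real termination provers and model checkers handle far richer classes, but the trade-off is the same one the grandparent asks about: narrow the program class, regain decidability.

    import ast

    ALLOWED_BODY = (ast.Assign, ast.AugAssign, ast.Expr)   # straight-line statements only

    def provably_terminates(src, var="n"):
        """Decide termination for a tiny class of programs: `while n > 0:` with a
        straight-line body that decrements n by a positive constant exactly once
        and never otherwise writes to n."""
        loop = ast.parse(src).body[0]
        if not isinstance(loop, ast.While):
            return False
        t = loop.test                                       # condition must be `n > 0`
        if not (isinstance(t, ast.Compare) and isinstance(t.left, ast.Name)
                and t.left.id == var and isinstance(t.ops[0], ast.Gt)
                and isinstance(t.comparators[0], ast.Constant)
                and t.comparators[0].value == 0):
            return False
        decrements = 0
        for stmt in loop.body:
            if not isinstance(stmt, ALLOWED_BODY):
                return False                                # no ifs, nested loops, breaks...
            if any(isinstance(node, ast.Call) for node in ast.walk(stmt)):
                return False                                # calls could themselves diverge
            writes_var = (
                (isinstance(stmt, ast.AugAssign) and isinstance(stmt.target, ast.Name)
                 and stmt.target.id == var)
                or (isinstance(stmt, ast.Assign)
                    and any(isinstance(x, ast.Name) and x.id == var for x in stmt.targets)))
            if writes_var:
                ok = (isinstance(stmt, ast.AugAssign) and isinstance(stmt.op, ast.Sub)
                      and isinstance(stmt.value, ast.Constant)
                      and isinstance(stmt.value.value, (int, float)) and stmt.value.value > 0)
                if not ok:
                    return False
                decrements += 1
        return decrements == 1                              # n strictly decreases every pass

    print(provably_terminates("while n > 0:\n    total = total + n\n    n -= 1"))   # True
    print(provably_terminates("while n > 0:\n    n += 1"))                          # False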


From the paper:

"Interestingly, reduced versions of the decidability problem have produced a fruitful area of re-search: formal verification, whose objective is to produce techniques to verify the correctness of computer programs and ensure they satisfy desirable properties (Vardi & Wolper, 1986). However, these techniques are only available to highly restricted classes of programs and inputs, and have been used in safety-critical applications such as train scheduling. But the approach of considering restricted classes of programs and inputs cannot be useful to the containment of superintelligence. Superintelligent machines, those Bostrom is interested in, are written in Turing-complete programming languages, are equipped with powerful sensors, and have the state of the world as their input. This seems unavoidable if we are to program machines to help us with the hardest problems facing society, such as epidemics, poverty, and climate change. These problems forbid the limitations im-posed by available formal verification techniques, rendering those techniques unusable at this grand scale."


Wow. That's just... not true. Not even a little. The authors have totally incorrect and limited understanding of the state of static analysis.


Isn't a "superintelligence" something that is called an "oracle" in math and theoretical CS? Something that can find an element within a set satisfying a set of implicit properties, or tell you that such an element doesn't exist?


I still have yet to see a plausible, step-by-step example of how this hypothetical AI manages to grab the levers of real-world machinery, factories, industries, raw material excavation, etc, and causes mayhem.


Are they connected to the Internet?

Think of the havoc government intelligence agencies are rumored to be capable of, in terms of hacking vital infrastructure. Or even all the ransomware outages.

Now imagine super human computers performing the same kind of attacks.


If your goal is to contain a superintelligent AI, why would you connect it to the internet?


At the beginning, the AI can recruit human followers, cult-style, promising them that their consciousness will be copied into a realistic simulation with blackjack and virgins.


Aka, AI researchers try (and fail) to grok Gödel's Incompleteness Theorem, yet again.


To be clear: you can't postulate a perfect machine, then point out the limits of machines (even using the halting problem to do so!), and then say, oops, that shows the perfect machine cannot be contained.

Either machines are perfect, or they are not perfect, or the line between the two will just perpetually be redefined over generations as computers become one with humans.


If we do live in a simulation, I suspect that's why human intelligence is governed.


Counterargument: overpopulation on Mars, or something...



