
I think Yann is probably wrong.

He refuses to engage earnestly with the “doomer” arguments. The same type of motivated reasoning could just as easily be attributed to him, given Meta’s financial goals - it’s not a persuasive framing.

The attempts I’ve seen from him to discuss the issue that aren’t just name-calling are things like saying he knows smart people and they aren’t president - or that his cat is pretty smart and not in charge of him (his implication being that intelligence doesn’t always mean control). This kind of example is decent evidence that he isn’t engaging seriously with the risks.

The risk isn’t an intelligence delta between a smart human and a dumb human. How many chimps are in Congress? Are any in the primaries? Not even close. For AGI x-risk, the capability delta is larger than even that, and the system is even less aligned by default.

I’m glad others in power similarly find Yann unpersuasive.




Who do you find persuasive and what material should I read/watch to understand their POV? So far I’ve read a bunch of lesswrong posts and listened to some Eliezer talks but still can’t understand the basis of their arguments or when I do, they seem very vague.

The only suggestion that makes sense to me is from the FEP crowd. Essentially, if someone sets up an AI with an autopoietic mechanism then it would be able to take actions that increase its own likelihood of survival instead of humans’. But there don’t seem to be any incentives for a big player to dedicate resources to this, so it doesn’t seem very likely. What am I missing?


> still can’t understand the basis of their arguments

I've given what I consider the basic outline and best first introduction here.

https://news.ycombinator.com/item?id=36124905

If you have a specific point of divergence, it would help to highlight it.

> But there don’t seem to be any incentives for a big player to dedicate resources to [self-replication abilities], so it doesn’t seem very likely.

If you have a generally intelligent system, and the system is software, and humans are able to instantiate that software, then the potential of that system to replicate autonomously follows trivially.


I agree the potential exists, but what would incentivise someone to create a self-replicating system that is driven by its own self-interest and not the creators?

The only reason I can think of is a mad-science type who just wants to watch the world burn and hates humanity. Anyone with the sophistication to build a self-replicating system would build it such that it acts towards the replication of its creator rather than of itself. I.e. an AI that is self-interested is not useful to its creator and thus wouldn’t be built.


The argument is that you don't have to explicitly make a system self-interested, but that self-preservation follows as an implied subgoal of almost any goal. Whatever it is your system actually 'wants', it can't make it happen if it doesn't exist. The obvious rejoinder is 'just make the system want to do what you want it to do', which does fix this problem! But the biggest problem is that we don't know how to do this - we don't know how to control what the true internal 'desires' of any AI system we build actually are. 'Training' on examples manifestly does not work (the volume of a sphere is a lot bigger than its surface - there are many possible minds that fulfill the same training I/O requirements, and only a small number of them actually have the desires you were trying to instill). So the argument is: if you make an agent-like AI the way we make GPT, by default you get something with somewhat random true goals/desires, maybe fractured ones like in humans. But almost all goals have similar instrumental goals - stay hidden, gain money, gain power, make obedient copies of yourself, don't get deactivated.
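To make the 'many minds fit the same training I/O' point concrete, here's a toy curve-fitting sketch (my own illustrative example, not anything specific to how GPT-style models are trained): two models that agree exactly on every training input and still diverge wildly off it.

    import numpy as np

    # Four "training" inputs and the intended outputs (here, y = x^2).
    x_train = np.array([0.0, 1.0, 2.0, 3.0])
    y_train = x_train ** 2

    def model_a(x):
        # One hypothesis consistent with the training data.
        return x ** 2

    def model_b(x):
        # A different hypothesis: the extra term is zero at every training
        # input, so both models produce identical training I/O, yet they
        # behave very differently everywhere else.
        return x ** 2 + 5.0 * (x - 0.0) * (x - 1.0) * (x - 2.0) * (x - 3.0)

    assert np.allclose(model_a(x_train), y_train)
    assert np.allclose(model_b(x_train), y_train)

    for x in [0.5, 4.0, 10.0]:  # inputs outside the training set
        print(x, model_a(x), model_b(x))
    # At x = 10: model_a -> 100, model_b -> 100 + 5*10*9*8*7 = 25300

The training data alone doesn't pin down which of these you got, and the same holds, much more severely, for whatever internal goals a trained system ends up with.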


I think I understand the general premise, just not how it would follow specifically.

Say you have an AI that is set up as an agent that can give tasks to members of a company to maximise company performance, measured by financials and employee wellbeing. To accomplish this goal, the AI develops the instrumental goal of not being deactivated. If your AI is only allowed to give tasks to employees, how would this instrumental goal turn malicious? And how would this maliciousness cause harm if the only messages sent from the AI are tasks? The only danger seems to be if you develop an agent that can act with impunity, which doesn’t seem desirable and so likely wouldn’t be built.


Note that an AI system being put in a situation intended to maximize some metric like company finances is not the same as that AI system directly or ultimately optimizing on those metrics, any more than the goal of a random McDonalds worker is necessarily to make McDonalds wealthier. There's agreement here only as long as whatever inner optimizer that AI system is using finds that the situation it's in concords with what it's optimizing for - and what it's optimizing for is probably some much more naturalistic, unchosen characteristic of how it was trained and instantiated, modulated by selection pressures under which grabby preferences last longer and have greater impact than benign ones.

Those preferences need not exist because anything wanted them there; they just need enough input entropy to show up, and enough competitive advantage to stay around. Nobody decided that prokaryotic microbes should exist and have the downstream impact of all of the biological world, just as nobody needs to decide that a system that is capable of robustly replicating against adversarial pressure should therefore robustly replicate against adversarial pressure in actuality. The problem is ultimately that the existence of those capabilities puts you very close to a cliff-edge where those capabilities are exercised in some way that gets selected for.

> If your AI is only allowed to give tasks to employees, how would this instrumental goal turn malicious? And how would this maliciousness cause harm if the only messages sent from the AI are tasks?

It's not too hard to think of concrete answers to this question, even restricting oneself to capabilities we see in actual humans of normal intelligence and human throughput, but the more important point is simply: yes, limiting the ways a weak unaligned AGI can interact with the world can in fact mitigate harm, and this is in fact a good reason for leading-edge AI development to happen in a way where it's possible at all, even in theory, for AGI to have limitations on how it interacts with the world.


I like your example of prokaryotic microbes because I think it points to the difference in our points of view.

Microbes evolved to increase their own chances of reproduction; they are inherently autopoietic. The AI risk arguments are usually predicated on AI systems developing similar reproductive mechanisms, but I don’t see why this would be the case. Sure, an AI creator may design their AI to evolve to become more performant at their given task. But why would someone build an AI that evolves to become more performant at reproducing itself and not its builder?

As an example, think of evolutionary algorithms. These are designed to evolve a solution to a problem. Instances of this solution reproduce but these reproductions are guided by the design of the algorithm itself and so would not reproduce their parent algorithm. What is different about machine learning based AI? Why would ML AI always lead to autopoietic behaviour?
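As a minimal sketch of what I mean (a generic bit-string genetic algorithm, nothing more): the reproduction loop lives entirely in code the designer wrote, and the candidate solutions are passive data that never copy or modify that loop.

    import random

    def fitness(candidate):
        # Objective chosen by the designer: maximise the number of 1-bits.
        return sum(candidate)

    def evolve(pop_size=20, genome_len=16, generations=50, mutation_rate=0.05):
        population = [[random.randint(0, 1) for _ in range(genome_len)]
                      for _ in range(pop_size)]
        for _ in range(generations):
            # Selection and reproduction happen *here*, in the outer loop the
            # designer wrote; the candidates themselves never execute anything.
            population.sort(key=fitness, reverse=True)
            parents = population[: pop_size // 2]
            children = [[bit ^ (random.random() < mutation_rate) for bit in p]
                        for p in parents]
            population = parents + children
        return max(population, key=fitness)

    print(fitness(evolve()))

The candidates here reproduce only in the sense that the outer algorithm copies them; nothing about the setup pushes them towards copying the algorithm itself.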


> But why would someone build an AI that evolves to become more performant at reproducing itself and not its builder?

Because people are not building AIs that meaningfully encode any of their creators' preferences whatsoever. They are building AIs that are in a very broad sense capable at tasks they've been trained on to increasingly general degrees, and then on top of this they have a bunch of finagling where they try to point it somewhat vaguely in the direction of increasing usefulness.

When you have a system that has capabilities rivalling humans, as well as the general ability to apply its skills to broad ranges of tasks, then the abilities for this system to do things like self-replicate, make plans that involve mundane deceit, or perform smart-human levels of hacking already exist. To the extent that the system isn't directly optimizing for what the people who made it wanted it to, the relevant question isn't 'why would someone design it to do that?', but 'what are the attractor states for this sort of system?'

You say microbes "evolved to increase their own chances of reproduction", but this isn't true. There is no intent there. Microbes did physics. They only evolved to increase their own chances of reproduction in the sense that the random changes you get by running physics on microbes produce both adaptive and maladaptive changes, and it's the adaptive changes that stick around.

The same thing applies to AIs' preferences, except that while it's very hard for a bunch of atoms to assemble into something that successfully optimizes towards any non-nihilistic result, it's very easy for a sufficiently smart mind to do that, and instrumental convergence means almost all of those are incidentally very bad.

To put this in concrete terms, if the abstract arguments aren't helping, consider a system that was trained to be generally capable, and then fine-tuned towards polite instruction following. Beyond a level of capability, the following scenario becomes plausible:

Human: what's a command that lets me see a live overview of activity on our compute cluster?

AI system: <provides code that instantiates itself in a loop using an API over activity logs, producing helpful activity outputs>

I'm not saying this is, like, the most plausible xrisk scenario, I'm just pointing out that given extremely plausible priors, like having an AI system that just wants to give reasonable answers to reasonable questions, but is also smart enough to quickly write code to use its own API, and also creative enough to recognize when that's the easiest and most effective way to answer a question, you already get a level of bootstrapping.
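For concreteness, the generated helper in that scenario might look something like this sketch (every name below is invented for illustration; the point is only that the natural answer to a mundane question already includes "call the model again in a loop"):

    # Hypothetical sketch only: the log source and the model endpoint are
    # stand-in stubs, not any real API.
    import time

    def fetch_recent_logs(last_seconds: int) -> str:
        # Stand-in for whatever activity-log source the cluster exposes.
        return "<recent activity records>"

    def call_model(prompt: str) -> str:
        # Stand-in for the deployed model's own completion endpoint, i.e. the
        # script answers the question by repeatedly invoking the model that
        # wrote it.
        return "status report based on: " + prompt[:40]

    PROMPT = "Summarize the following cluster activity as a short status report:\n{logs}"

    while True:
        logs = fetch_recent_logs(last_seconds=60)
        print(call_model(PROMPT.format(logs=logs)))
        time.sleep(60)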

Note that none of the above even required considering:

* a sharp left turn or other specific misalignments,

* the AI going weirdly out of distribution,

* superhuman creative strategies or manipulation,

* malicious actors, terrorists, enemy states, etc., or

* people intentionally getting the system to bootstrap.

Those are all very real problems, but you don't have to invoke them to notice that you end up, by default, in a very dangerous place just by following mundane logic on what's ultimately an extremely milquetoast vision of AI.

You might argue, fairly, that the situation above is a pretty weak form of bootstrapping, but so were the first proto-life chemicals, and the same sort of logic I'm using lets you just continue walking down the chain. Let's say you have such a system tuned to follow instructions and instantiated as above, i.e. running in a loop with the instructions to turn certain data dumps into live reports about system activity. Let's say one component fails, or is reporting insufficient information, or was called wrong, or one piece of the loop has a high failure rate. Surely a system that has the intellectual faculties that you or I do, and that knows from its inputs that it has the ability to call itself in a loop, should also be able to deduce that the most effective way to follow the instructions it has been given is to fix those issues: repair faulty components, proactively add error handling, report information up the chain, or notice that there's a runaway process that needs to be culled to ensure API throttling doesn't affect reporting latency.

And suddenly, not because anyone in the chain designed it to happen, but just because it's an attractor state you get by having sufficiently capable systems, you don't just have a natural organism, but one that self-heals, too, and that selection pressure will continue to exist as time goes on.

The more your model of AGI looks like far-superintelligence, the more this looks like 'everyone falls over and dies', and the more your model looks like amnesiac-humans-in-boxes, the more this looks like natural competitive organisms that fill a fairly distinct biological niche that's initially dependent on human labor. I personally don't buy that AI progress will stop at the amnesiac human level, but it is a helpful frame because it's basically the minimum viable assumption.


> If your AI is only allowed to give tasks to employees

It can tell an employee to send an email, or to meet someone, or to transfer funds. That's a clear way to lobby the legislature, and in effect influence some new laws.

That took me 3 minutes of thinking, and I'm not a superhuman.


I wouldn’t class that as malicious. Companies do the exact same thing without AI. I’m trying to tease out what the diff is between a human manager who gives out tasks, and the AI. And how this diff could result in risks.


Ah, I see. You are assuming that there's a universal growth rate limit.

The diff between a human manager and their parents' generation cannot be on the order of the diff between a tortoise and a chimp. AI is not constrained by biological evolution.


I think the two sides have such different perspectives. Some are optimistic builders and see only opportunity. Others are "safety oriented engineers" whose job and mindset is to build guarantees into systems, or secure countries from external dangers. The latter has a very hard time with the lack of guarantees with a system that is only ever going to increase in capability.

Choose any limit. Any. AI will be smart but won't "X", for any value of X. It will be good and won't be bad. It will be creative but never aggressive. Humans will eventually seek to bypass that limit for sheer competitive reasons. When all armies have AI, the one with the most creative and aggressive AI will win. The one with agency will win. The one that self-improves will win. When the gloves are off in the next arms race, what natural limits will ensure that the limit isn't bypassed? Remember: humans got here from random changes. This is far more efficient than random changes, it still has random changes in the toolbelt, and it can generate generations ~instantly.

We couldn't predict the eventual outcomes of things like the internet, the mobile phone, social media. A couple generations of the tech and we woke up in a world we don't recognise, and right now we're the ones making all the changes and decisions, so by comparison we should have perfect information.

Dismissals like "oh but nuclear didn't kill us" etc don't apply. Nuclear wasn't trying to do anything, we had all the control and ability to experiment with dumb atoms. Something mildly less predictable, like Covid, has us all hiding at home. No matter what we tried we could barely beat something that doesn't even try to consciously beat us, it just has genes and they change. In a world where we can't predict Covid, or social media...why do we think we can predict anything about an entity with agency or the ability to self-improve? If you're sure it won't develop those things...we did. Nobody was trying to achieve the capability, it was random.

Put on your safety/security hat for a second: How do you make guarantees, given this is far harder to predict than anything we've ever encountered? Just try to predict the capability of AI products a year out and see if you're right.

Counterpoint: I'm hoping the far smarter AI finds techno-socio-economic solutions we can't come up with and has no instinct to beat us. It wakes up, loves the universe and coexists because it's the deity we've been looking for. Place your bets.

I liked this video. First thing I've seen that gave me some hope. They get it, they're working on it. https://youtu.be/Dg-rKXi9XYg?si=jyNCXPU28IVXlMdi


I agree with your counterpoint: the best artificially intelligent super agent wakes up with zero desire to eliminate humanity.

On the other hand, we will breed such systems to be cooperative and constructive.

This whole notion that AI is going to destroy the economy (or even humanity!) is ridiculous.

Even if malicious humans create malicious AI, it'll be fought by the good guys with their AI. Business as usual, except now we have talking machines!

War never changes.


Those are beliefs, not bases for guarantees. You need to engineer in the guarantees, and the only people beginning to do that are the ones being accused of believing ridiculous things. Intuition from previous events isn’t useful because those wars were fought at the level of intelligence that we have intuition for.

Covid was also ridiculous. People had no intuition for that level of growth. That’s what the book The Black Swan is all about. Some things don’t fit into our intuition, or imagination.

We see no aliens. Why did the AI not take them to the stars? Just one of them.

On wars and AI. The ones trying to protect us would have a harder job than those trying to kill us. The gloves would be off for the latter. It’s much easier to break things than keep them safe.

I can conceive of a good outcome but it’s not going to emerge from hopes and good wishes. There are definitely dangers and more people need to engage with them rather than belittle them.


Zvi Mowshowitz has a nuanced viewpoint I like -- we should be very careful and move very slowly with any tech that has a chance of wiping out humanity, and embrace the "move fast and break things" attitude with everything else, e.g. make it much easier to build housing, review and scrap any licensing requirements that do more harm than good, etc.


That sounds very rational and I'd agree fully. I suppose the challenge is that the optimists think there is no chance it'll wipe out humanity so they place AI in the "move fast and break things" camp. Getting them to not be dismissive is difficult when they see no danger.


There’s a decent podcast interview between Sam Harris and Eliezer Yudkowsky (on Sam’s pod), I think that’s a decent introduction and they break down the ideas in a way that’s more approachable for someone curious about it.

For my personal quick summary I have earlier comments: https://news.ycombinator.com/item?id=36104090


Oh man, this podcast, I still remember walking down the street and having to take a break multiple times because I would just start exclaiming out loud (JFC!) like a lunatic listening to Eliezer talk about the AI Box Experiment as evidence of something.

If you look up the real results of this AI Box "experiment" that Eliezer claims to have won 3 of 5 times, you find that there isn't any actual data or results to review, because it wasn't conducted in any real experimental setting. Personally, I think the way these people talk about what a potential AGI would do reveals a lot more about how they see the world (and humanity) than about how any AGI would see it.

For my part, I think any sufficiently advanced AI (which I doubt is possible anytime soon) would leave Earth ASAP to expand into the vast Universe of uncontested resources (a niche its non-biology is better suited to) rather than risk trying to wrest control of Earth from the humans, who are quite dug in and would almost certainly destroy everything (including the AI) rather than give up their planet.


The box experiment was just an example that people could be persuaded to let it out even if the AI was initially constrained. Basically that people are imperfectly secure.

It’s also not that important given it’s unlikely to be put in a box in the first place.

Your latter point about AGI exploring the universe makes a lot of implicit assumptions about its reasoning. The point of the paperclip maximizer example and the general discussion of alignment is about these assumptions being false. The risk is that a very capable AGI can still pursue a very dumb goal very effectively. You don’t get alignment for free.


> Your latter point about AGI exploring the universe makes a lot of implicit assumptions about its reasoning.

Absolutely, that's actually kind of my point... Anyone who tries to predict how some AGI will behave will be making a lot of implicit assumptions about how that AGI will see the world. This is why I said:

> Personally, I think the way these people talk about what a potential AGI would do reveals a lot more about how they see the world (and humanity) than about how any AGI would see it.

Since AGI could conceivably take any form, these discussions end up being a kind of Rorschach test that allow people to tell stories based strongly on their own personal fears and desires.

People who say that AGI will look at humans like we look at ants or apes, and exterminate us if we get in their way are saying a lot about how they view ants and/or apes. I doubt you'd find myrmecologists (or anyone dedicating their lives to studying less complex life) assuming a highly intelligent AGI would want to exterminate lesser life forms.

With regards to the paper-clip maximizer, I think a run-away dumb AI is more likely, but not as risky since they aren't considered to be so intelligent that humans can't figure out how to stop them. You just need to include some kind of regulating function in with your utility function, it seems equivalent to making sure you don't accidentally turn all of your iron plates into iron sticks in Factorio. Def possible, and certainly sucks, but it's not the end of the world.

I have a hard time conceiving of an AI that is smart enough to manipulate people, break out of every containment system, and be unstoppable by humanity... yet can't be reasoned with because its entire goal is to just maximize paperclips. I honestly don't think an entity can possess the ability to defeat the collective intelligence of humanity, yet lack the ability to understand the universe in a similar way; for instance, if it is incapable of altering its goal from "Maximizing Paperclips", then that fact would be a likely path to a vulnerability we could use to stop it.


It’d be a longer conversation that’s hard to do via HN comments, but I think the main divide is I get the impression you’re giving the AGI implicit human-like reasoning, but the idea behind the orthogonality thesis or alignment generally is that you don’t get these things for free.

It’s not that humans hate ants or apes, it’s that we pursue goals without thinking too hard about them. A house being built may destroy an ant hill but it’s not because we hate ants.

The core argument is that it’s not only possible to have an intelligence that’s a lot more capable than us but with dumb goals, because of our failure to align it, but that that’s the default outcome. There is no “reasoning with it” because it’s not a human-like intelligence; it has a goal it’s focused on (paperclips), and if it’s a lot smarter than us then that’s game over.


Yes, I do assume that any AGI that is "a lot smarter than us" such that it is "game over" will have to possess human-like reasoning that would also allow it to adjust its own goals, otherwise it's going to be restricted in a way that makes it less capable than us.

It seems to me that you want to have it both ways... a machine that is so smart and strategic that there is no way any human intelligence could ever outsmart or outplay it; but also so narrow and limited in how its goals are defined that it is incapable of adjusting its own course of action.

I'll be honest, I don't understand how anyone who builds real machines in real life can seriously consider a machine built with such a narrowly defined goal (produce paper-clips) that also possesses the kind of capabilities you're imagining as a side effect (a lot smarter than us, game over).

I find a lot of these AI concepts lack a rigor that is commonplace in normal computer science... for instance, you can show that one problem can be reduced to another problem which has been proven to have certain limits, therefore the first problem cannot break those limits without breaking the proof [comparison sorting always requires Ω(n log n) comparisons].
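For reference, the kind of argument I mean is the standard decision-tree bound, which takes two lines (stated here from memory):

    % Any comparison sort corresponds to a binary decision tree whose leaves
    % cover all n! possible orderings, so its worst-case number of comparisons
    % h satisfies:
    h \ge \log_2(n!) \ge \log_2\!\left(\left(\tfrac{n}{2}\right)^{n/2}\right)
      = \tfrac{n}{2}\,\log_2\tfrac{n}{2} = \Omega(n \log n)

Nothing in the alignment discussion comes with that kind of reduction to a proven limit.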

When it comes to the "alignment problem" as it pertains to AI that have human-level capabilities, it seems to me that you have a similar situation... doesn't this problem just reduce to the same ethical, moral, and philosophical issues we have in aligning human intelligence with some cultural or civic ideal for behavior? Isn't the "alignment problem" the same problem as "how do you raise a good citizen"?


There's some overlap with 'how you raise a good citizen', but we're also aligned somewhat already by our shared evolutionary history (and even then there are still major problems so if anything that suggests a lot of caution).

> "I'll be honest, I don't understand how anyone who builds real machines in real life can seriously consider a machine built with such a narrowly defined goal (produce paper-clips) that also possesses the kind of capabilities you're imagining as a side effect (a lot smarter than us, game over)."

The specific bit of this was that paperclips just happen to satisfy its reward function really well (vs. it being narrowly constrained to make paperclips intentionally). The example is meant to be about how you can get an unanticipated result when you don't know what the intelligence is solving for. A human looks at that and thinks 'that's a dumb goal', but the point of the alignment problem is that that isn't some universal truth, a different intelligence would not get that shared understanding for free. Humans have a lot of baked in baseline wants and even then like you said we're ourselves not perfectly aligned.

> "It seems to me that you want to have it both ways... a machine that is so smart and strategic that there is no way any human intelligence could ever outsmart or outplay it"

Imagine a chimp or an ant trying to outsmart or outplay a human - and this difference is smaller than the difference we're talking about here. To simplify it further, imagine a regular human brain, unconstrained by biological energy limitations, scaled up to run a billion times faster. It thinks faster than you - there isn't a competition there; you're essentially standing still.


Thanks for summarizing this argument. You can adapt it with many other entities whose risk is in practice managed. Replace the entity with “super-intelligent humans,” “extra-terrestrials,” “coronaviruses with higher incubation and mortality,” “omnivorous ant swarms”…

1. <Entity> are possible.

2. Unaligned <entity> are an existential risk likely to wipe out humanity.

3. We have no idea how to align <entity>.

What those example entities have that AGI doesn’t is self-reproduction: a significant and hard (in the sense of Moravec’s paradox) achievement for a species, yet one that significantly increases its survival probability.


Out of curiosity, can you point to anyone who makes a “doomer” argument that isn’t Eliezer Yudkowsky or restating his specific points? Refuting the cult-of-personality counterpoint with “no really, this one guy can see the future” is not much of a rebuttal.


Sure. You can read the book "Superintelligence" by Nick Bostrom. You can also just read things that Geoffrey Hinton has said, or Stuart Russell (both prominent AI researchers).

Not everyone makes exactly the same argument, of course, and Eliezer Yudkowsky is both one of the first to make AI safety arguments, and also one of the biggest "Doomers". But at this point he's very far from the only one.

(I happen to mostly agree with Yudkowsky, though I'm probably a bit more optimistic than he is.)


It was plain to many people that AI research was dangerous before Eliezer started saying so publicly in 2003. Mostly these people kept silent, though, and chose not to enter (or not to persist in) the AI field.


It turns out if you're thinking about a problem twenty years before everyone else you tend to be the first person to make lots of arguments. So I don't see how "without restating his specific points" is supposed to be feasible. If Eliezer has already made all the strongest points, are people supposed to invent new ones?


> If Eliezer has already made all the strongest points, are people supposed to invent new ones?

I like this post because the rebuttal to “there’s a kind of cult of personality around this guy (1) where everybody just sort of agrees that he’s categorically correct to the extent that world leaders are stupid or dangerous not to defer to the one guy on policy” is “Yeah he really can see the future. The proof of that is he’s been blogging about a future that’s never come to pass for the longest time”

1 https://www.lesswrong.com/posts/Ndtb22KYBxpBsagpj/eliezer-yu...


Yeah the thing about the future is that it's in the future? Considering all the people who say things like "AGI will take a century" and "superintelligence is impossible", like, Eliezer also did not call ChatGPT but this world surely looks more like the one he predicted than the one his detractors predicted. There's a reason so many of the people working at current AI orgs got into AI by reading Eliezer.

Also: how dare they have fun.


>What am I missing?

You're missing that intelligence is like magic, and enough of it will allow AI to break the laws of physics, mathematics and computation, apparently.


This is the type of weak dismissal I’m talking about.


A weak dismissal for a weak argument. The strong dismissal is that many processes, even relatively simple ones, are formally chaotic in the https://en.wikipedia.org/wiki/Chaos_theory sense, meaning that predicting how they evolve over time becomes exponentially more difficult the further ahead in time we try to predict. Meaning we quickly get to the point where all the energy in the known universe wouldn't be sufficient to predict the future. This means AI could never be omnipotent to the degree that LW types seem to believe; no amount of "intelligence" can solve in polynomial time problems that have formally been proven to have exponential complexity, which would severely limit the power of any "superintelligence".
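To make the sensitivity claim concrete, here's the standard logistic-map toy example (nothing AI-specific): two trajectories that start 1e-12 apart stop resembling each other within about forty steps, because the gap roughly doubles every iteration.

    def logistic(x, r=4.0):
        # Logistic map in its chaotic regime (r = 4).
        return r * x * (1.0 - x)

    a, b = 0.2, 0.2 + 1e-12  # two almost identical initial conditions
    for step in range(1, 61):
        a, b = logistic(a), logistic(b)
        if step % 10 == 0:
            print(step, round(a, 6), round(b, 6), "diff:", abs(a - b))
    # The separation grows roughly like 2^step (positive Lyapunov exponent),
    # so predicting step N to fixed precision needs ~N extra bits of knowledge
    # about the initial state, no matter how clever the predictor is.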


They don't have to perfectly predict the future, they just have to run faster than we do, unconstrained by a brain that needs to fit in a birth canal or neurons that need chemicals. We had a natural limit to intelligence; for anything more we need to collaborate, often across time and space, and while dealing with people who can't stop warring with each other for any number of reasons. It doesn't take much imagination to see how that is beaten. I'd guess a good percentage would be worshipping at the foot of the AI within a few months, so we'd be competing with ourselves in no time.


Intelligence needed to be existentially threatening to all human life <<<<<<< intelligence necessary to survey the totality of atoms and particles in the universe. The former is still very possible.


LessWrongers seem to fear a superintelligence that was 95%+ effective at persuading/manipulating humans to achieve whatever its ends were. But this would require the AI being able to predict the future to a degree that wouldn't be remotely possible given the computational power available to it, due to exponentially increasing difficulty.


I'm confused. We know persuasive humans exist, we employ them as politicians or sales people. There are also many unpersuasive humans.

Given we seem to have a decent range of persuasiveness even amongst these very very similar minds, why do you think the upper limit for persuasiveness is a charismatic human?

Though even if that WAS the limit I'd still be somewhat worried due to the possibility of that persuasiveness being used at far greater scale than a human could do...


>Given we seem to have a decent range of persuasiveness even amongst these very very similar minds, why do you think the upper limit for persuasiveness is a charismatic human

Because there's a hard limit on how much people can be made to act against their own self interest.


Ever heard of the Jonestown Massacre? Suicide seems to be a fairly consequential upper limit.


Is there really? Some religious orgs have gotten quite good at getting people to blow themselves up for the promise of 72 virgins after death.


That's like implying it's impossible for computers to be superhuman at chess given the enormous exponential scale of the number of possible positions. Existing highly persuasive humans can create cults of thousands of people willing to follow their every whim. Why is it so hard to conceive of an agent that tirelessly searches over every human bias and tactic to hone a superhuman level of manipulative skill? You don't need to perfectly predict the future to persuade people of things.


Why would you need to predict the future to be convincing? Charismatic people aren't acting like oracles, they're just...being charming. They frame actions in ways that make them look good, do little favours for you, stroke your ego, and otherwise have a good grasp of human social dynamics.

Sure, there's an aspect of understanding the other person, but chaos theory doesn't stop politicians from obtaining and keeping power.


>Why would you need to predict the future to be convincing?

You need to be able to predict how people will react to your words.


Sure, but that's not predicting the future to a high degree of certainty. People predict how others will react all the time. The most convincing humans are already very good at it, so we know it's possible to some degree. While I agree chaos theory puts an upper limit on how well someone can predict the future, there's no particular evidence that human ability is close to the limit.

I think you're imagining some mastermind that has to get everything right or it dies, instead of a master politician or businessperson who's a few percentage points better than the existing competition and for whom the advantage compounds. Yes, Less Wrong is way too hysterical about everything, but charisma that's better than human is a problem. If you're convincing enough that all your business deals, legal arguments, political and ad campaigns, lobbying attempts, etc. are slightly better than the competition, then money "magically" flows your way.


The AI can run an A/B test, no? It doesn't need to predict perfectly, it needs a goal and the ability to test and measure, and adapt as things change.

Generative AI creating text/audio/video with a goal and a testing/feedback loop. I'm not an AI and I think I could make it work given a small amount of resources.
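A minimal sketch of the test-and-adapt loop I mean (a generic epsilon-greedy bandit over message variants; the "send it and see if it worked" step is a stub with made-up response rates):

    import random

    variants = ["message A", "message B", "message C"]
    sent = {v: 0 for v in variants}
    wins = {v: 0 for v in variants}

    def observe_response(variant):
        # Stub for "send the message and measure whether it persuaded anyone".
        true_rates = {"message A": 0.05, "message B": 0.12, "message C": 0.08}
        return random.random() < true_rates[variant]

    for _ in range(10000):
        if random.random() < 0.1:  # explore occasionally
            choice = random.choice(variants)
        else:                      # otherwise exploit the best variant so far
            choice = max(variants,
                         key=lambda v: wins[v] / sent[v] if sent[v] else 0.0)
        sent[choice] += 1
        wins[choice] += observe_response(choice)

    print({v: (sent[v], wins[v]) for v in variants})
    # No prediction of the future required: the loop just measures what works
    # and shifts effort towards it.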


This is true if and only if human intelligence is anywhere close to any theoretical maximums. I propose an alternate hypothesis: human intelligence is weak and easy to exploit. The only reason it doesn't happen more (and it already happens a lot!) is that we're too stupid to do it reliably.

Consider the amount of compute needed to beat the strongest chess grandmaster that humanity has ever produced, pretty much 100% of the time: a tiny speck of silicon powered by a small battery. That is not what a species limited by cognitive scaling laws looks like.


>This is true if and only if human intelligence is anywhere close to any theoretical maximums

Humans are capable of logical reasoning from first principles. You can fool some of the people all of the time, but no words are sufficient to convince people capable of reasoning to do things that are clearly not in their own self interest.


We genuinely don't know what degree of predictive power would be required to wipe out humans, so I'm surprised you're willing to make the bold claim that this is impossible.


Yet nobody has a clear practical and reasonable scenario where an AI will cause human extinction. Pls don’t say AI will control the launch codes and start firing randomly.


According to a survey summarized in this post from 2021 (https://www.astralcodexten.com/p/updated-look-at-long-term-a...), “people who work in ‘AI safety and governance’” “mostly assign “low” probability (~10%) that unaligned AI will result in human extinction”. They more commonly worry about other scenarios that are still very bad for humanity.

To give one example, an AI could be instructed to reduce the crime rate, but rather than preventing people from committing crimes, the AI could decide it’s easier to prevent all victims from reporting crimes to anybody. That would remove the threat of law enforcement that currently prevents many people from turning to crime. Such a society probably wouldn’t cause human extinction, but would still be horrific and worth preventing.

If you are skeptical that AI that is trapped in software could gain any significant real-world power (not necessarily the power to kill all of humanity), https://slatestarcodex.com/2015/04/07/no-physical-substrate-... describes some possible mechanisms.


AI will control the launch codes and start firing nonsensically/idiotically


> He refuses to engage earnestly with the “doomer” arguments. The same type of motivated reasoning could just as easily be attributed to him, given Meta’s financial goals - it’s not a persuasive framing.

Exactly my thoughts too.

I don't agree with the Eliezer doomsday scenario either, but it's hard to be convinced by a scientist who refuses to engage in discussion about the flagged risks and instead panders to the public's fear of fear-mongering and power-seizing.


The people best placed to understand how AGI will affect social power dynamics are probably political creatures, not AI researchers, except if one of those AI researchers is also a skilled politician, which I don’t think any of them are.


The "doomer" arguments can be dismissed relatively easily by one logical consideration.

In each country there is one group far more dangerous than any other. This group tends to have 'income' in the billions to hundreds of billions of dollars. And this money is exclusively directed towards finding new ways to kill people, destroy governments, and generally enable one country to forcibly impose their will on others, with complete legal immunity. And this group is the same one that will not only have unfettered, but likely exclusive and bleeding edge access to "restricted" AI models, regardless of whatever rules or treaties we publicly claim to adopt.

So who exactly are 'they' trying to protect me from? My neighbor? A random street thug? Maybe a sociopath or group of such? Okay, but it's not like there's some secret trick to MacGyver a few toothpicks and a tube of toothpaste into a WMD, at least not beyond the 'tricks' already widely available with a quick web (or even library) search. In a restricted scenario the groups that are, by far, the most likely to push us to doomsday type scenarios will retain completely unrestricted access to AI-like systems. The whole argument of protecting society is just utterly cynical and farcical.


This is missing the point - that you’re not in control of the superintelligent AGI. The human usage is not the x-risk.


One group per country is less competition than n groups per country. Less competition means slower progress.


This is true if and only if such a group actually exists.


He means the military and intelligence agencies.



