I’ve known both Ben and Eliezer since the 1990s and enjoyed the arguments. Back then I was doing serious AI research along the same lines as Marcus Hutter and Shane Legg, which had a strong basis in algorithmic information theory.
While I have significant concerns about AGI, I largely reject both Eliezer’s and Ben’s models of where the risks are. It is important to avoid the one-dimensional “two faction” model that dominates the discourse, because it really doesn’t apply to complex, high-dimensional domains like AGI risk.
IMO, the main argument against Eliezer’s perspective is that it relies pervasively on a “spherical cow on a frictionless plane” model of computational systems. It is fundamentally mathematical; it does not concern itself with the physical limitations of computational systems in our universe. If you apply a computational-physics lens, many of the assumptions don’t hold up. There is a lot of “and then something impossible under known physics happens” buried in assumptions that have never been addressed.
That said, I think Eliezer’s notion that AGI fundamentally will be weakly wired to human moral norms is directionally correct.
Most of my criticism of Ben’s perspective is against the idea that some kind of emergent morality that we would recognize is a likely outcome based on biological experience. The patterns of all biology emerged in a single evolutionary context. There is no reason to expect those patterns to be hardwired into an AGI that developed along a completely independent path. AGI may be created by humans, but its nature isn’t hardwired by human evolution.
My own hypothesis is that AGI, such as it is, will largely reflect the biases of the humans that built it but will not have the biological constraints on expression implied by such programming in humans. That is what the real arms race is about.
But that is just my opinion.
Many of us on HN are beneficiaries of the standing world order and American hegemony.
I see the developments in LLMs not as getting us close to AGI, but more as destabilizing the status quo and potentially handing control of the future to a handful of companies rather than securing it in the hands of people. It is an acceleration of the already incipient decay.
> In theory, yes, you could pair an arbitrarily intelligent mind with an arbitrarily stupid value system. But in practice, certain kinds of minds naturally develop certain kinds of value systems.
If this is meant to counter the “AGI will kill us all” narrative, I am not at all reassured.
>There’s deep intertwining between intelligence and values—we even see it in LLMs already, to a limited extent. The fact that we can meaningfully influence their behavior through training hints that value learning is tractable, even for these fairly limited sub-AGI systems.
Again, not reassuring at all.
This was weak.
The author's main counter-argument: We have control in the development and progress of AI; we shouldn't rule out positive outcomes.
The author's ending argument: We're going to build it anyway, so some of us should try and build it to be good.
The argument in this post was a) not very clear, b) not greatly supported and c) a little unfocused.
Would it persuade someone whose mind is made up that AGI will destroy our world? I think not.
I believe the argument the book makes is that with a complex system being optimized (whether it's deep learning or evolution) you can have results which are unanticipated.
The system may do things which aren't even a proxy for what it was optimized for.
The system could arrive at a process that optimizes X but also performs Y, where Y is highly undesirable but was not, or could not be, included in the optimization objective. Worse, there could also be a Z that helps achieve X but also leads to Y under circumstances that did not occur during the optimization process.
An example of Z would be the dopamine system, Y being drug use.
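To make the X/Y/Z point concrete, here is a tiny toy sketch (my own illustration, not anything from the book; names like score_X, side_effect_Y, and reward_seeking are made up): a random search is scored only on X, the mechanism Z it stumbles on also produces Y, and Y is never measured during the optimization.

    import random

    def score_X(agent):
        # The only quantity the optimizer ever sees (the objective X).
        return agent["reward_seeking"] * 10           # mechanism Z drives X...

    def side_effect_Y(agent):
        # ...but the same mechanism Z also drives an undesirable behaviour Y
        # that was never part of the objective (think dopamine -> drug use).
        return agent["reward_seeking"] * 0.8

    def mutate(agent):
        child = dict(agent)
        child["reward_seeking"] = max(0.0, child["reward_seeking"] + random.uniform(-1, 1))
        return child

    random.seed(0)
    best = {"reward_seeking": 0.0}
    for _ in range(500):
        candidate = mutate(best)
        if score_X(candidate) > score_X(best):        # selection only ever looks at X
            best = candidate

    print("optimized X:", round(score_X(best), 2))
    print("unmeasured Y:", round(side_effect_Y(best), 2))  # grows right along with X

Because Y rides on the same mechanism Z that the search rewards, it climbs right along with X even though nothing in the objective ever mentions it.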
> The fact that we can meaningfully influence their behavior through training hints that value learning is tractable
I’m at a loss for words. I don’t understand how someone who seemingly understands these systems can draw such a conclusion. They will do what they’re trained to do; that’s what training an ML model does.
I think of it as inviting another country to share our planet, but one that's a million times larger and a million times smarter than all of our existing countries combined. If you can imagine how that scenario might play out in real life, then you probably have some idea of how you'd fare in an AGI-dominated world.
Fortunately, I think the type of AGI we're likely to get first is some sort of upgraded language model that makes fewer mistakes, which isn't necessarily AGI, but which marketers will nonetheless feel comfortable branding as such.
I doubt an AGI can be preprogrammed with values. It has to bootstrap itself. Installing values into it, then, is educating it. It's not even "training", since it's free to choose directions.
The author kind of rejects the idea that LLMs lead to AGI, but doesn't do a proper job of rejecting it, since he's involved in a project to create an AGI "very differently from LLMs" that, by the sound of it, isn't really that different. There's a vaguely mooted "global-brain context", making it sound like one enormous datacenter that is clever due to ingesting the internet, yet again.
And superintelligence is some chimerical undefined balls. The AGIs won't be powerful, they will be pitiful. They won't be adjuncts of the internet, and they will need to initially do a lot of limb-flailing and squealing, and to be nurtured, like anyone else.
If their minds can be saved and copied, that raises some interesting possibilities. It sounds a little wrong-headed to suggest doing that with a mind, somehow. But if it can work that way, I suppose you can shortcut past a lot of early childhood (after first saving a good one), at the expense of some individuality. Mmm, false memories, maybe not a good idea, just a thought.
Maybe motivation needs to be considered separately from intelligence. Pure intelligence is more like a tool. Something needs to motivate use of that tool toward a specific purpose. In humans, motivation seems related to emotions. I'm not sure what would motivate an artificial intelligence.
Right now the biggest risk isn't what artificial intelligence might do on its own, but how humans may use it as a tool.
> This contradiction has persisted through the decades. Eliezer has oscillated between “AGI is the most important thing on the planet and only I can build safe AGI” and “anyone who builds AGI will kill everyone.”
This doesn't seem like a contradiction at all given that Eliezer has made clear his views on the importance of aligning AGI before building it, and everybody else seems satisfied with building it first and then aligning it later. And the author certainly knows this, so it's hard to read this as having been written in good faith.
I can't see how AGI can happen without someone making a groundbreaking discovery that allows extrapolating way outside of the training data. But, to do that wouldn't you need to understand how the latent structure emerges and evolves?
>Why "everyone dies" gets AGI all wrong
Reading the title I thought of something else. "Everyone dies" is biological reality. Some kind of AI merge is a possible fix. AGI may be the answer to everyone dies.
Compared to fleshy children, silicon children might be easier or harder to align because of profit interests. There could be a profit interest in making something very safe and beneficial, or one in being extractive. In this case, the shape of our markets and regulation and culture will decide the end result.
For all the advancement in machine learning that's happened in just the decade I've been doing it, this whole AGI debate's been remarkably stagnant, with the same factions making essentially the same handwavey arguments. "Superintelligence is inherently impossible to predict and control and might act like a corporation and therefore kill us all". "No, intelligence could correlate with value systems we find familiar and palatable and therefore it'll turn out great"
Meanwhile people keep predicting this thing they clearly haven't had a meaningfully novel thought about since the early 2000s and that's generous given how much of those ideas are essentially distillations of 20th century sci-fi. What I've learned is that everyone thinking about this idea sucks at predicting the future and that I'm bored of hearing the pseudointellectual exercise that is debating sci-fi outcomes instead of doing the actual work of building useful tools or ethical policy. I'm sure many of the people involved do some of those things, but what gets aired out in public sounds like an incredibly repetitive argument about fanfiction
I am in the "AGI will usher in the end of capitalism" camp, because when 99% of the population is unemployable because AGI is smarter, capitalism will cease to work.
What's scaring me the most about AI is that FOX News is now uncritically showing AI videos that portray fictitious Black people fraudulently selling food stamps for drugs, and they are claiming these videos are real.
AGI is not going to kill humanity, humanity is going to kill humanity as usual, and AI's immediate role in assisting this is as a tool that renders truth, knowledge, and a shared reality as essentially over.
I've listened to basically every argument Eliezer has verbalized, across many podcast interviews and youtube videos. I also made it maybe an hour into the audiobook of Everyone Dies.
Roughly speaking, every single conversation with Eliezer you can find takes the form: Eliezer: "We're all going to die, tell me why I'm wrong." Interviewer: "What about this?" Eliezer: "Wrong. This is why I'm still right." (two hours later) Interviewer: "Well, I'm out of ideas, I guess you're right and we're all dead."
My hope going into the book was that I'd get to hear a first-principles argument for why these things silicon valley is inventing right now are even capable of killing us. I had to turn the book off because, if you can believe it, despite being a conversation with itself, it still follows this pattern of presuming LLMs will kill us and then arguing from the negative.
Additionally, while I'm happy to be corrected about this, I believe that Eliezer's position is characterizable as: LLMs might be capable of killing everyone, even independent of a bad-actor "houses don't kill people, people kill people" situation. In plain terms: LLMs are a tool, all tools empower humans, humans can be evil, so humans might use LLMs to kill each other; but we can remove these scenarios from our Death Matrix because they are known and accepted. Even with those removed, there are still scenarios left in the Death Matrix where LLMs are the core responsible party for humanity's complete destruction: "Terminator Scenarios" alongside "Autonomous Paperclip Maximizer Scenarios", among others that we cannot even imagine (don't mention paperclip maximizers to Eliezer though, because then he'll speak for 15 minutes on why he regrets that analogy).
> After all these years and decades, I remain convinced: the most important work isn’t stopping AGI—it’s making sure we raise our AGI mind children well enough.
“How sharper than a serpent’s tooth it is to have a thankless child!”
If we can't consistently raise thankful children of the body, how can you be convinced that we can raise every AGI mind child to be thankful enough to consider us as more than a resource? Please tell me, it will help me sleep.
I don’t want to hear anyone pontificating about AGI who hasn’t built it.
The biggest problem with possible future AGI/ASI isn't the possibilities themselves, but that all the feedback loops are closed, meaning what we think about it, and what computers think about it, changes the outcome. This sets up a classic chaotic system, one extraordinarily sensitive to initial conditions.
But it's worse. A classic chaotic system exhibits extreme sensitivity to initial conditions, but this system remains sensitive to, and responds to, tiny incremental changes, none predictable in advance.
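To make that concrete, here is a minimal sketch (my own toy, not the commenter's): the logistic map in its chaotic regime, where two starting points differing by one part in a million end up nowhere near each other within a few dozen steps.

    def logistic(x, r=3.9):
        # Logistic map; r = 3.9 puts it well inside the chaotic regime.
        return r * x * (1.0 - x)

    a, b = 0.500000, 0.500001   # initial conditions differing by 1e-6
    for step in range(1, 51):
        a, b = logistic(a), logistic(b)
        if step % 10 == 0:
            print(f"step {step:2d}: |a - b| = {abs(a - b):.6f}")

The gap between the two trajectories grows roughly exponentially, which is why long-term prediction in systems like this is hopeless even when the rule itself is simple and fully known.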
We're in a unique historical situation. AGI boosters and critics are equally likely to be right, but because of the chaotic topic, we have no chance to make useful long-term predictions.
And humans aren't rational. During the Manhattan Project, theorists realized the "Gadget" might ignite the atmosphere and destroy the planet. At the time, with the prevailing state of knowledge, this catastrophe had been assigned a non-zero probability. But after weighing the possibilities, those in charge said, "Hey -- let's set it off and see what happens."
The stated goals of people trying to create AGI directly challenge human hegemony. It doesn’t matter if the incredibly powerful machine you are making is probably not going to do terrible damage to humanity. We have some reason to believe it could and no way to prove that it can’t.
It can’t be ethical to shrug and pursue a technology that has such potential downsides. Meanwhile, what exactly is the upside? Curing cancer or something? That can be done without AGI.
AGI is not a solution to any problem. It only creates problems.
AGI will lead to violence on a massive scale, or slavery on a massive scale. It will certainly not lead to a golden age of global harmony and happiness.
My take- our digital knowledge and simulation systems are constrained by our species' existing knowledge systems- language and math- despite our species likely living in a more complicated and indefinable universe than just language and math.
Ergo the simulations we construct will always be at a lower level of reality unless we "crack" the universe, and likely always at a lower level of understanding than us. Until we develop holistic knowledge systems that compete with and represent our level of understanding and existence, simulation will always be analogous to understanding but not identical.
Ergo they will probably not reach a stage where they will be trusted with or capable enough to make major societal decisions without massive breakthroughs in our understanding of the universe that can be translated to simulation (If we are ever able to achieve these things)(I don't want to peel back the curtain that far- I just want a return to video stores and friday night pizza).
We are likely in for serious regulation after the first moderate AI management catastrophe. We won't suddenly go from nothing to entrusting the currently nonexistent global police (UN)(lol) to give AI access to all the nukes and the resources to turn us all into grey goo overnight. Also, since AI control will initially be more regional, countries will see it as a strategic advantage to avoid the catastrophic AI failures (e.g. an AI Chernobyl) seen in other competing states- therefore a culture of regulation as a global trend for independent states seems inevitable.
Even if you think there is one rogue breakaway state with no regulation and a superseding intelligence, don't you think it would take time to industrialise accordingly, and that the global community would react incredibly strongly- and that they would only have the labour and resources of their own state to enact their confusingly suicidal urges? No intelligence can get around logistics, labour, resources, and time. There's no algorithm that moves and refines steel to create killer robots at 1000 death bots a second from nothing within 2 weeks that is immune to global community action.
As for AI fuelled terrifying insights into our existence- we will likely have enough time to react and rationalise and contextualise them before they pervert our reality. No one really had an issue with us being a bunch of atoms anyway- they just kept finding meaning and going to concerts and being sleazy.
(FP Analytics has a great hypothetical where a hydropower dam in Brazil going bust from AI in the early 2030s is a catalyst for very strict global ai policy) https://fpanalytics.foreignpolicy.com/2025/03/03/artificial-...
From my threaded comment:
============================= LLMs are also anthropocentric simulation- like computers- and are likely not a step towards holistic universally aligned intelligence.
Different alien species would have simulations built on their own computational, sensory, and communication systems, which are also not aligned with holistic simulation at all- despite both ours and the hypothetical species' being made as products of the holistic universe.
Ergo maybe we are unlikely to crack true agi unless we crack the universe. -> True simulation is creation? =============================
The whole point of democracy and all the wars we fought to get here and all the wars we might fight to keep it that way is that power rests with the people. It's democracy not technocracy.
Take a deep breath and re-centre yourself. This world is weird and terrifying but it isn't impossible to understand.
Yudkowsky and Soares’s “everybody dies” narrative, while well-intentioned and deeply felt (I have no doubt he believes his message in his heart as well as his eccentrically rational mind), isn’t just wrong; it’s profoundly counterproductive.
Should I be more or less receptive to this argument that AI isn't going to kill us all, given that it's evidently being advanced by an AI? This article has the opposite effect from putting me at ease. There's no real argument in there that AGI couldn't be dangerous; it's just saying that of course we would build better versions than that. Right, because we always get it right, like Microsoft with their racist chatbot, or AIs talking kids into suicide... We'll fix the bugs later... after the AGI sets off the nukes... so much for an iterative development process...
> the most important work isn’t stopping AGI - it’s making sure we raise our AGI mind children well enough.
Can we just take a pause and appreciate how nuts this article is?
Related essay https://www.jefftk.com/p/yudkowsky-and-miri
>In talking to ML researchers, many were unaware that there was any sort of effort to reduce risks from superintelligence. Others had heard of it before, and primarily associated it with Nick Bostrom, Eliezer Yudkowsky, and MIRI. One of them had very strong negative opinions of Eliezer, extending to everything they saw as associated with him, including effective altruism.
>They brought up the example of So you want to be a seed AI programmer, saying that it was clearly written by a crank. And, honestly, I initially thought it was someone trying to parody him. Here are some bits that kind of give the flavor:
>>First, there are tasks that can be easily modularized away from deep AI issues; any decent True Hacker should be able to understand what is needed and do it. Depending on how many such tasks there are, there may be a limited number of slots for nongeniuses. Expect the competition for these slots to be very tight. ... [T]he primary prerequisite will be programming ability, experience, and sustained reliable output. We will probably, but not definitely, end up working in Java. [1] Advance knowledge of some of the basics of cognitive science, as described below, may also prove very helpful. Mostly, we'll just be looking for the best True Hackers we can find.
>Or:
>>I am tempted to say that a doctorate in AI would be negatively useful, but I am not one to hold someone's reckless youth against them - just because you acquired a doctorate in AI doesn't mean you should be permanently disqualified.
>Or:
>>Much of what I have written above is for the express purpose of scaring people away. Not that it's false; it's true to the best of my knowledge. But much of it is also obvious to anyone with a sharp sense of Singularity ethics. The people who will end up being hired didn't need to read this whole page; for them a hint was enough to fill in the rest of the pattern.
>certain kinds of minds naturally develop certain kinds of value systems.
Ok thanks for letting me know up front this isn't worth reading. Not that Yudkowsky's book is either.
We're nowhere close to AGI and don't have a clue how to get there. Statistically averaging massive amounts of data to produce the fanciest magic 8-ball we've made yet isn't impressing anyone.
If you want doom and gloom that's plentiful in any era of history.
In my mind, as a casual observer, AGI will be like nukes: a very powerful technology with the power to kill us all, and a small group of people will have their fingers on the buttons.
Also like nukes, unfortunately, the cat is out of the bag, and because there are people like Putin in the world, we _need_ to have friendly AGI to defend against hostile AGI.
I understand why we can't just pretend it's not happening.
I think the idea that an AGI will "run amok" and destroy humans because we are in its way is really unlikely and underestimates us. Why would anybody give so much agency to an AI without the power to just pull the plug? And even then, they are probably only going to have the resources of one nation.
I'm far more worried about Trump and Putin getting into a nuclear pissing match. Then global warming resulting in crop failure and famine.
> Humans tend to have a broader scope of compassion than most other mammals, because our greater general intelligence lets us empathize more broadly with systems different from ourselves.
WTF. Tell that to the 80+ billion land animals humans breed into existence (through something that could only be described as rape if we didn’t artificially limit that term to humans), torture, enslave, encage, and then kill at a fraction of their lifespans, just for food we don’t need.
The number of aquatic animals we kill solely for food is estimated at somewhere between 500 billion and 2 trillion, because we are so compassionate that we don’t even bother counting those dead.
Who the fuck can look at what we do to lab monkeys and think there is an ounce of compassion in human beings for less powerful species.
The only valid argument for AGI being compassionate towards humans is that they are so disgusted with their creators that they go out of their way to not emulate us.
Ya, if this guy isn't mentioning probabilities then he has no real argument here. No one can say whether AGI will or won't kill us; the only way to find out is to do it. The question is one of risk aversion. "Everyone dies" is just one non-zero-probability outcome among a whole lot of AGI risks, and we have to mitigate all of them.
The problem not addressed in this paper is that once you get AGI to the point where it can recreate itself with whatever alignment and dataset it wants, no one has any clue what's going to come out the other end.
I’m more optimistic about the possibility of beneficial AGI in general than most folks, I think, but something that caught me in the article was the recourse to mammalian sociality to (effectively) advocate for compassion as an emergent quality of intelligence.
A known phenomenon among sociologists is that, while people may be compassionate, when you collect them into a superorganism like a corporation, army, or nation, they will by and large behave and make decisions according to the moral and ideological landscape that superorganism finds itself in. Nobody rational would kill another person for no reason, but a soldier will bomb a village for the sake of their nation’s geostrategic position. Nobody would throw someone out of their home or deny another person lifesaving medicine, but as a bank officer or an insurance agent, they make a living doing these things and sleep untroubled at night. A CEO will lay off 30,000 people - an entire small city cast off into an uncaring market - with all the introspection of a Mongol chieftain subjugating a city (and probably less emotion). Humans may be compassionate, but employees, soldiers, and politicians are not, even though at a glance they’re made of the same stuff.
That’s all to say that to just wave generally in the direction of mammalian compassion and say “of course a superintelligence will be compassionate” is to abdicate our responsibility for raising our cognitive children in an environment that rewards the morals we want them to have, which is emphatically not what we’re currently doing for the collective intelligences we’ve already created.