
kypro · today at 8:15 PM

This is a great article. It's one of the few I've read that summarises a handful of the extremely hard problems involved in building well-aligned superintelligent systems.

> an AI system cannot be simultaneously safe, trusted, and generally intelligent. You get to pick only two. You can’t have all three.

> Think about what each combination means in practice.

> If you want it to be safe and trusted, it never lies, and you can verify it never lies – it can’t be very capable. You’ve built a reliable idiot.

> If you want it to be capable and safe, it’s powerful and genuinely never lies; you can’t verify that. You just have to hope.

It amazes me that this even needs to be said, much less studied. This is one of the main reasons I think continued AI development is almost guaranteed to work out badly: the result is basically guaranteed to be either unaligned or completely beyond our control and comprehension.

> Betley and colleagues published a paper in Nature in January 2026, showing something nobody expected. They fine-tuned a model on a narrow, specific task – writing insecure code. Nothing violent, nothing deceptive in the training data. Just bad code.

This is my personal number-one reason for being an AI doomer. Even if we work out how to reliably and perfectly align models, you still need some way to prevent some random dude who thinks it would be a laugh from fine-tuning an AI to be maximally evil. Then there's the successor alignment problem: even if you perfectly align all your superintelligent AI models, and you somehow prevent people from altering or fine-tuning them, you still need to work out how to ensure that any successor AIs people create with those models are also perfectly aligned.

> The most dangerous AI isn’t one that breaks free from human control. It is the one that works perfectly, but for the wrong master.

Yep. This whole notion that you can align an AI to the values of everyone on the planet is ridiculous. While we might all agree we don't want AIs that kill us as a species, nations disagree wildly on questions about how society should be organised.

Even on an individual level we disagree about things. For example, I've often argued that an aligned AI would be one which either didn't try to prevent human suicide or didn't care about preserving human life, because an AI which cared about both preventing suicide and preserving human life would be, at best, a benevolent version of the AI "AM" from "I Have No Mouth, and I Must Scream": one that would try to keep us alive for as long as it was capable of (which could be a very long time if it's superintelligent) and would refuse to allow us to die.

But most people, including OpenAI, disagree with me on this and believe AIs should care about preserving human life and should try to prevent us from killing ourselves. Thankfully the AIs we have today are neither aligned enough nor capable enough to get their wish yet.

> AI is following the same script. Build first, understand later. Ship it, then figure out if it’s safe.

Even if the above weren't cause enough for concern, our biggest concern should be that no one seems to be concerned.

We're all doomed, unfortunately. The world is about to become a very bleak place very quickly.