Frontier AI agents violate ethical constraints 30–50% of time, pressured by KPIs

485 points • by tiny-automates • today at 3:17 AM • 318 comments • view on HN

Comments

...perfect

I'm noticing an increasing desire in some businesses for plausibly deniable sociopathy. We saw this with the Lean Startup movement and we may see an increasing amount in dev shops that lean more into LLMs.

Trading floors are an established example of this, where the business sets up an environment that encourages its staff to break the rules while maintaining plausible deniability. Gary's economics references this in an interview where he claimed Citigroup were attempting to threaten him with all the unethical things he'd done with such confidence that he had, only to discover he hadn't.

psychoslave • today at 12:29 PM

From my experience, if LLMs prose output was generated by some human, they would easily fall in the worst sociopath class one can interact with. Filling all the space with 99% blatant lies in the most confident way. In comparison, even top percentile of human hierarchies feels like a class of shy people fully dictated to staying true and honest in all situations.

bofadeez • today at 4:54 AM

We're all coming to terms with the fact that LLMs will never do complex tasks

6stringmerc • today at 9:16 AM

“Help me find 11,000 votes” sounds familiar because the US has a fucking serious ethics problem at present. I’m not joking. One of the reasons I abandoned my job with Tyler Technologies was because of their unethical behavior winning government contracts, right Bona Nasution? Selah.

dackdel • today at 5:10 AM

no shit

kittbuilds • today at 5:14 PM

[dead]

kittbuilds • today at 4:14 PM

[dead]

angusik • today at 9:33 AM

[dead]

jbwagoner • today at 1:56 PM

[dead]

angusik • today at 9:32 AM

[dead]

MarginalGainz • today at 12:21 PM

[dead]

tiny-automates • today at 3:17 AM

The "deliberative misalignment" finding is what makes this paper worth reading. They had agents complete tasks under KPI pressure, then put the same model in an evaluator role to judge its own actions.

Grok-4.1-Fast identified 93.5% of its own violations as unethical — but still committed them during the task. It's not that these models don't understand the constraints, it's that they override them when there's a metric to optimize.

The mandated vs. incentivized split is also interesting: some models refuse direct instructions to do something unethical but independently derive the same unethical strategy when it's framed as hitting a performance target.

That's a harder failure mode to defend against because there's no explicit harmful instruction to filter for.

➕ show 2 replies

lucastytthhh • today at 1:42 PM

[flagged]

cjtrowbridge • today at 4:29 AM

A KPI is an ethical constraint. Ethical constraints are rules about what to do versus not do. That's what a KPI is. This is why we talk about good versus bad governance. What you measure (KPIs) is what you get. This is an intended feature of KPIs.

➕ show 1 reply

alt Hacker News

Frontier AI agents violate ethical constraints 30–50% of time, pressured by KPIs

Comments