logoalt Hacker News

Frontier AI agents violate ethical constraints 30–50% of time, pressured by KPIs

485 pointsby tiny-automatestoday at 3:17 AM318 commentsview on HN

Comments

ajpikultoday at 4:50 PM

...perfect

Quarrelsometoday at 12:06 PM

I'm noticing an increasing desire in some businesses for plausibly deniable sociopathy. We saw this with the Lean Startup movement and we may see an increasing amount in dev shops that lean more into LLMs.

Trading floors are an established example of this, where the business sets up an environment that encourages its staff to break the rules while maintaining plausible deniability. Gary's economics references this in an interview where he claimed Citigroup were attempting to threaten him with all the unethical things he'd done with such confidence that he had, only to discover he hadn't.

psychoslavetoday at 12:29 PM

From my experience, if LLMs prose output was generated by some human, they would easily fall in the worst sociopath class one can interact with. Filling all the space with 99% blatant lies in the most confident way. In comparison, even top percentile of human hierarchies feels like a class of shy people fully dictated to staying true and honest in all situations.

bofadeeztoday at 4:54 AM

We're all coming to terms with the fact that LLMs will never do complex tasks

6stringmerctoday at 9:16 AM

“Help me find 11,000 votes” sounds familiar because the US has a fucking serious ethics problem at present. I’m not joking. One of the reasons I abandoned my job with Tyler Technologies was because of their unethical behavior winning government contracts, right Bona Nasution? Selah.

dackdeltoday at 5:10 AM

no shit

kittbuildstoday at 5:14 PM

[dead]

kittbuildstoday at 4:14 PM

[dead]

angusiktoday at 9:33 AM

[dead]

jbwagonertoday at 1:56 PM

[dead]

angusiktoday at 9:32 AM

[dead]

MarginalGainztoday at 12:21 PM

[dead]

tiny-automatestoday at 3:17 AM

The "deliberative misalignment" finding is what makes this paper worth reading. They had agents complete tasks under KPI pressure, then put the same model in an evaluator role to judge its own actions.

Grok-4.1-Fast identified 93.5% of its own violations as unethical — but still committed them during the task. It's not that these models don't understand the constraints, it's that they override them when there's a metric to optimize.

The mandated vs. incentivized split is also interesting: some models refuse direct instructions to do something unethical but independently derive the same unethical strategy when it's framed as hitting a performance target.

That's a harder failure mode to defend against because there's no explicit harmful instruction to filter for.

show 2 replies
lucastytthhhtoday at 1:42 PM

[flagged]

cjtrowbridgetoday at 4:29 AM

A KPI is an ethical constraint. Ethical constraints are rules about what to do versus not do. That's what a KPI is. This is why we talk about good versus bad governance. What you measure (KPIs) is what you get. This is an intended feature of KPIs.

show 1 reply