But LLMs already do the paperclip thing. Suppose you tell a coding LLM that your monitoring system...

InsideOutSanta • today at 3:53 PM • 1 reply • view on HN

But LLMs already do the paperclip thing.

Suppose you tell a coding LLM that your monitoring system has detected that the website is down and that it needs to find the problem and solve it. In that case, there's a non-zero chance that it will conclude that it needs to alter the monitoring system so that it can't detect the website's status anymore and always reports it as being up. That's today. LLMs do that.

Even if it correctly interprets the problem and initially attempts to solve it, if it can't, there is a high chance it will eventually conclude that it can't solve the real problem, and should change the monitoring system instead.

That's the paperclip problem. The LLM achieves the literal goal you set out for it, but in a harmful way.

Yes. A child can understand that this is the wrong solution. But LLMs are not children.

Replies

throw310822 • today at 3:58 PM

> it will conclude that it needs to alter the monitoring system so that it can't detect the website's status anymore and always reports it as being up. That's today. LLMs do that.

No they don't?

➕ show 1 reply

alt Hacker News

Replies