logoalt Hacker News

JumpCrisscrosslast Monday at 12:58 AM4 repliesview on HN

> I'm not surprised to see these horror stories

I am! To the point that I don’t believe it!

You’re running an agentic AI and can parse through logs, but you can’t sandbox or back up?

Like, I’ve given Copilot permission to fuck with my admin panel. It promptly proceeded to bill thousands of dollars, drawing heat maps of the density of built structures in Milwaukee; buying subscriptions to SAP Joule and ArcGIS for Teams; and generating terabytes of nonsense maps, ballistic paths and “architectural sketch[es] of a massive bird cage the size of Milpitas, California (approximately 13 square miles)” resembling “a futuristic aviary city with large domes, interconnected sky bridges, perches, and naturalistic environments like forests, lakes, and cliffs inside.”

But support immediately refunded everything. I had backups. And it wound up hilarious albeit irritating.


Replies

AdieuToLogiclast Monday at 2:45 AM

>> I'm not surprised to see these horror stories

> I am! To the point that I don’t believe it!

> You’re running an agentic AI and can parse through logs, but you can’t sandbox or back up?

When best practices for using a tool involves sandboxing and/or backing up before each use in order to minimize the blast radius of using same, it begs the question; why use it knowing there is a nontrivial probability one will have to recover from it's use any number of times?

> Like, I’ve given Copilot permission to fuck with my admin panel. It promptly proceeded to bill thousands of dollars ... But support immediately refunded everything. I had backups.

And what about situations where Claude/Copilot/etc. use were not so easily proven to be at fault and/or their impacts were not reversible by restoring from backups?

show 1 reply
rurplast Monday at 3:41 AM

Wait, so you've literally experienced these tools going conpletely off the rails but you can't imagine anyone using them recklessly? Not to be overly snarky but have you worked with people before? I fully expect that most people will be careful to not run into this sort of mess, but I'm equally sure that some subset users will be absolutely asking for it.

fwipsylast Monday at 1:36 AM

Can you post the birdcage thing? That sounds fascinating.

show 2 replies
QuercusMaxlast Monday at 2:14 AM

....how is this a serious product that anyone could consider using?

show 1 reply