I'm kind of surprised how many people are okay with deploying code that hasn't been audite...

losvedir • last Tuesday at 11:05 PM • 3 replies • view on HN

I'm kind of surprised how many people are okay with deploying code that hasn't been audited.

I read If Anyone Builds It Everyone Dies over the break. The basic premise was that we can't "align" AI so when we turn it loose in an agent loop what it produces isn't necessarily what we want. It may be on the surface, to appease us and pass a cursory inspection, but it could embed other stuff according to other goals.

On the whole, I found it a little silly and implausible, but I'm second guessing parts of that response now that I'm seeing more people (this post, the Gas Town thing on the front page earlier) go all-in on vibe coding. There is likely to be a large body of running software out there that will be created by agents and never inspected by humans.

I think a more plausible failure mode in the near future (next year or two) is something more like a "worm". Someone building an agent with the explicit instructions to try to replicate itself. Opus 4.5 and GPT 5.2 are good enough that in an agent loop they could pretty thoroughly investigate any system they land on, and try to use a few ways to propagate their agent wrapper.

Replies

codyb • last Wednesday at 6:03 PM

There's a really interesting story I read somewhere about some application which used neural nets to optimize for a goal (this was a while ago, it could have been merkel trees or something, who knows, not super important)

And everything worked really well until they switched chip set.

At which point the same model failed entirely. Upon inspection it turned out the AI model had learned that overloading particular registers would cause such an electrical charge buildup that transistors on other pathways would be flipped.

And it was doing this in a coordinated manner in order to get the results it wanted lol.

I can't find any references in my very cursory searches, but your comment reminded me of the story

btown • last Tuesday at 11:16 PM

Perhaps our only saving grace is that many LLMs at varying levels of "dumbness" exist.

Is it possible to create an obfuscated quine that exhibits stable detection-avoiding behavior on every frontier model simultaneously, as well as on an old-school classifier and/or GPT-3 era LLM fine-tuned just for worm detection? One incapable of even thinking about what it's seeing, and being persuaded to follow its subtle propagation logic? I'm not sure that the answer is yes.

The larger issue to me is less that an LLM can propagate in generated code undetected, but rather that an attacker's generated code may soon be able to execute a level of hyper-customized spear-phishing-assisted attack at scale, targeting sites without large security teams - and that it will be hitting unintentional security flaws introduced by those smaller companies' vibe code. Who needs a worm when you have the resources of a state-level attacker at your fingertips, and numerous ways to monetize? The balance of power is shifting tremendously towards black hats, IMO.

TacticalCoder • last Tuesday at 11:43 PM

Why think about nefarious intent instead of just user error? In this case LLM error instead of programmer error.

Most RCEs, 0-days, and whatnots are not due to the NSA hiding behind the "Jia Tan" pseudo to try to backdoor all the SSH servers on all the systemd [1] Linuxes in the world: they're just programmer errors.

I think accidental security holes with LLMs are way, way, way more likely than actual malicious attempts.

And with the amount of code spoutted by LLMs, it is indeed --and the lack of audit is-- an issue.

[1] I know, I know: it's totally unrelated to systemd. Yet only systems using systemd would have been pwned. If you're pro-systemd you've got your point of view on this but I've got mine and you won't change my mind so don't bother.

alt Hacker News

Replies