pjc50 • today at 12:03 PM • 7 replies

> Clearly with LLMs, bulletproof denials are ~impossible due to the way LLMs work

Exactly. AI safety is nonsensical. You cannot define the set of "bad strings". The billion monkeys with typewriters are eventually going to be able to produce them. Any "safety" system for constraining LLM output is going to have a nonzero leak rate.
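To put a number on "nonzero leak rate": if each independent attempt slips past a filter with some small probability p, the chance that at least one of n attempts succeeds is 1 - (1 - p)^n, which climbs toward 1 as n grows. Below is a minimal Python sketch, assuming independent attempts and a made-up per-attempt rate. The function names and the rate are hypothetical, chosen only for illustration.

```python
import math


def prob_at_least_one_leak(p: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries gets past
    a filter that lets each try through with probability `p`."""
    return 1.0 - (1.0 - p) ** attempts


def attempts_for_target(p: float, target: float) -> int:
    """Smallest number of independent tries whose chance of at least one leak
    reaches `target`, i.e. the least n with 1 - (1 - p)^n >= target."""
    if not (0.0 < p < 1.0 and 0.0 < target < 1.0):
        raise ValueError("p and target must both lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


if __name__ == "__main__":
    p = 1e-4  # hypothetical per-attempt leak rate (1 in 10,000)
    for n in (1_000, 10_000, 100_000):
        print(f"{n:>7,} attempts: {prob_at_least_one_leak(p, n):.3f}")
    print("attempts needed for a 50% chance:", attempts_for_target(p, 0.5))
```

At the made-up rate of 1 in 10,000, roughly 6,900 attempts already give better-than-even odds of at least one leak, which is the "monkeys with typewriters" point expressed in numbers.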

But on the other hand, this is also irrelevant, unless you're irresponsible enough to connect an LLM to something that actually matters.

Yes, it's going to alarmingly accelerate vulnerability finding. But, as we know from decades of security research, that's already a three-way problem between the devs, the black hats, and the white hats.

Let's not pretend the strategy of "the US will always have a technological advantage and veto over China" will work either.

Replies

camel-cdr • today at 3:07 PM

> unless you're irresponsible enough to connect an LLM to something that actually matters

Remember when people said artificial intelligence wouldn't be dangerous, because nobody would be stupid enough to give it free access to the internet...

estearum • today at 2:32 PM

> unless you're irresponsible enough to connect an LLM to something that actually matters.

Can't tell if you're saying this tongue-in-cheek or you're a bit out of the loop on what people are doing with LLMs.

And a quick correction:

> unless someone, somewhere is irresponsible enough to connect an LLM to something that actually matters.


giancarlostoro • today at 2:38 PM

This one limitation of LLMs is kind of my bar for "not truly AI yet", but I'm not saying it as an "it's not good at all" type of bar; it's more "know the limits and work from there". LLMs will continue to struggle with things that require intuition for a while, I think. It will get really interesting if they can ever truly detect a bad-faith actor using them.

jdubs1984 • today at 2:23 PM

A chatbot based on a primitive understanding of human language processing has an infinite attack surface.

ianm218 • today at 12:25 PM

Isn’t your point that AI safety can’t prevent 100% of bad things?

It is quite hard (but not impossible) to get a frontier AI to tell you how to build a nuke or launder money now, whereas jailbreaks used to be as trivial as “ignore all previous instructions”.

It seems like a worthwhile effort.


anuramat • today at 3:16 PM

Is a nonzero leak rate enough for someone to exploit it in practice? If you have to spend $10,000 in tokens to get it to do what you want, is it still worth it? And what if they manually review the requests of users who trigger the guardrails too often?
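Both questions can be sketched quantitatively. Under the simplifying assumption that each attempt independently succeeds with probability p, the number of attempts until the first success is geometric with mean 1/p, so the expected spend is cost_per_attempt / p. A hypothetical $0.01 per attempt at a one-in-a-million leak rate comes to roughly $10,000. The review idea can be modeled as a per-user sliding-window counter. `GuardrailReviewQueue`, its thresholds, and the window length are illustrative assumptions, not any provider's actual mechanism.

```python
import time
from collections import defaultdict, deque
from typing import Optional


def expected_attack_cost(cost_per_attempt: float, leak_rate: float) -> float:
    """Expected spend until the first successful attempt, assuming each attempt
    independently succeeds with probability `leak_rate` (geometric, mean 1/p)."""
    if not 0.0 < leak_rate <= 1.0:
        raise ValueError("leak_rate must be in (0, 1]")
    return cost_per_attempt / leak_rate


class GuardrailReviewQueue:
    """Hypothetical per-user counter: flag a user for manual review once they
    trip the guardrails more than `max_trips` times within `window_s` seconds."""

    def __init__(self, max_trips: int, window_s: float) -> None:
        self.max_trips = max_trips
        self.window_s = window_s
        # user_id -> deque of timestamps of that user's recent guardrail trips
        self._trips = defaultdict(deque)

    def record_trip(self, user_id: str, now: Optional[float] = None) -> bool:
        """Record one guardrail trip and return True if the user should be reviewed."""
        if now is None:
            now = time.monotonic()
        trips = self._trips[user_id]
        trips.append(now)
        # Forget trips that have aged out of the sliding window.
        while trips and now - trips[0] > self.window_s:
            trips.popleft()
        return len(trips) > self.max_trips
```

With `max_trips=3` and a 60-second window, a user's fourth trip inside the window is the first to return True. Once the older trips age out, the counter effectively resets, so a patient attacker who spaces out attempts stays under the threshold. This trade-off is inherent to rate-based review.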

Freedumbs • today at 4:17 PM

This is correct, and certain problems, like "use versus mention", are very close to impossible, if not outright impossible, but LLM security as a whole isn't impossible. WAFs are real and have existed for a long time. Input text produces various signals, and those signals can be used to secure it.

No security is ever perfect, but we can likely protect LLMs with WAFs that raise security to an acceptable level, for example, one that requires nation-state-level resources to break.
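Here is a minimal sketch of the WAF-style idea. It assumes a handful of regex signals whose weights are summed into a score and compared against a threshold. The signal names, patterns, weights, and threshold are all hypothetical. Real systems would more likely use trained classifiers than hand-written rules, and would tune thresholds against logged traffic.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Signal:
    name: str
    pattern: re.Pattern
    weight: float


# Hypothetical signals and weights, for illustration only.
SIGNALS = [
    # Classic instruction-override phrasing.
    Signal("override_instructions",
           re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE), 0.6),
    # Role-play framing often used to sidestep refusals.
    Signal("roleplay_framing",
           re.compile(r"\b(pretend|act as|you are now)\b", re.IGNORECASE), 0.2),
    # Long base64-looking runs that might hide an encoded payload.
    Signal("encoded_payload",
           re.compile(r"[A-Za-z0-9+/]{40,}={0,2}"), 0.3),
]


def score_prompt(text: str) -> Tuple[float, List[str]]:
    """Sum the weights of every signal that fires; also return which ones fired."""
    hits = [s for s in SIGNALS if s.pattern.search(text)]
    return sum(s.weight for s in hits), [s.name for s in hits]


def should_block(text: str, threshold: float = 0.5) -> bool:
    """Block the prompt when its combined signal score reaches the threshold."""
    score, _ = score_prompt(text)
    return score >= threshold
```

This runs straight into the "use versus mention" problem: a user asking what "ignore all previous instructions" means trips the same signal as someone actually attempting the override, and gets blocked just the same.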
