This is not true. Well, it kinda is, but nobody will be stupid enough to hand-code an account recove...

afdbcreid • today at 5:56 PM • 3 replies • view on HN

This is not true. Well, it kinda is, but nobody will be stupid enough to hand-code an account recovery where you get to type any email address.

The reason it worked there is that the designers of the system didn't anticipate that the AI will agree to accept any email (maybe they even put guardrails against it in the system prompt, we don't know). It's more like social engineering than bad-security-code, except that like the sibling comment said an actual human will probably not approve that.

Replies

addaon • today at 6:40 PM

> The reason it worked there is that the designers of the system didn't anticipate that the AI will agree to accept any email (maybe they even put guardrails against it in the system prompt, we don't know).

These are contradictory cases. If you put guardrails into the system prompt, you've anticipated that the AI will take the action you're guardrailing against. And since AI prompt compliance is at best stochastic (and realistically just crap, over large sample sizes), every guardrail is an explicit recognition of a failure -- the guardrail will be ignored, and you can't pretend you didn't realize it was a problem, since you put it in.

➕ show 1 reply

dpark • today at 6:02 PM

Maybe? I don’t know what logic was actually in the LLM vs it just using a bad tool. Unless I missed it, the article had no actual context on that either.

This looks like a terrible design rather than an AI problem to me, though.

➕ show 3 replies

alt Hacker News

Replies