I think it's a use case that identity/authorization/permission models are simply not made for.
Sure, we can ban users and we can revoke tokens, but those assume that:
1. Something potentially malicious got access to our credentials
2. Banning that malicious entity will solve our problem
3. Once we've done that, repaired the damage, and improved our security, we don't expect the same thing to happen again
None of these apply with LLMs in the loop!
They aren't malicious, just incompetent in a way that hiring someone else won't fix. The solution to this is way more extensive than most people seem to grasp at the moment.
What we need is less like a sturdy door with a fancy lock, and more like that special self-stabilizing spoon for people with Parkinson's. Unlimited undo history.
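The "unlimited undo" idea can be made concrete: instead of letting an agent execute destructive operations directly, record an inverse for every action so anything it does can be rolled back later. A minimal sketch (all names here, like `Journal` and the toy key-value store, are hypothetical illustrations, not a real framework):

```python
# Toy "unlimited undo" journal: every action is applied together with a
# closure that reverses it, so the full history can be rolled back.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Journal:
    _undo_stack: list = field(default_factory=list)

    def apply(self, do: Callable[[], None], undo: Callable[[], None]) -> None:
        """Run an action only after capturing how to reverse it."""
        do()
        self._undo_stack.append(undo)

    def undo_all(self) -> None:
        """Roll back every recorded action, newest first."""
        while self._undo_stack:
            self._undo_stack.pop()()

# Usage: a toy config store that an agent mutates through the journal.
store = {"config": "v1"}
j = Journal()
old = store["config"]
j.apply(lambda: store.update(config="v2"),
        lambda old=old: store.update(config=old))
assert store["config"] == "v2"
j.undo_all()
assert store["config"] == "v1"
```

The point is architectural: the safety property lives in the journal, not in the (unreliable) agent.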
> What we need is less like a sturdy door with a fancy lock, and more like that special self-stabilizing spoon for people with Parkinson's. Unlimited undo history.
Agree -- you can't solve probabilistic incorrectness with remedies designed for deterministic incorrectness.
This is like the classic "How do I parse HTML with regex?" question.
Imho, the next step is going to be around human-time-efficient risk bounding.
That would mirror the first major step, correctness-bounding: automated continuous acceptance testing that makes a less-than-perfect LLM usable.
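"Correctness-bounding" as described can be sketched as a retry loop gated by an acceptance test: keep regenerating until the output passes, so the test (not the model) bounds how wrong the result can be. Here `generate` is a hypothetical stand-in for an LLM call, not a real API:

```python
# Correctness-bounding sketch: an acceptance test gates a fallible
# generator, retrying within a budget until the output passes.
def generate(attempt: int) -> str:
    # Fake "model": wrong on early attempts, correct eventually.
    return "2 + 2 = 5" if attempt < 2 else "2 + 2 = 4"

def accept(output: str) -> bool:
    # Deterministic acceptance test for the task.
    return output == "2 + 2 = 4"

def bounded_generate(max_attempts: int = 5) -> str:
    for attempt in range(max_attempts):
        out = generate(attempt)
        if accept(out):
            return out
    raise RuntimeError("no acceptable output within budget")

assert bounded_generate() == "2 + 2 = 4"
```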
If I had to bet, we'll eventually land on out-of-band stream-of-thought monitoring (sufficiently detached to be undetectable by the primary LLM) by a guardrail/alignment AI system with kill+restart authority.
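The kill+restart idea above can be sketched with ordinary process supervision: a guardrail process reads the worker's emitted "thoughts" over a pipe it alone controls, terminates the worker when a thought trips a rule, and restarts it. Everything here (the banned-substring rule, the scripted agent) is a toy assumption standing in for a real alignment monitor:

```python
# Out-of-band supervisor sketch: the guardrail owns the pipe and has
# kill+restart authority over a worker agent process.
import multiprocessing as mp

def agent(conn, script):
    # Stand-in worker: emits each "thought"; a real agent would also act.
    for thought in script:
        conn.send(thought)
    conn.close()

def supervise(scripts, banned=("rm -rf",)):
    """Run one agent attempt per script; kill on a banned thought, then restart."""
    for script in scripts:
        parent, child = mp.Pipe()
        p = mp.Process(target=agent, args=(child, script))
        p.start()
        child.close()  # drop our copy so recv() raises EOFError when the agent exits
        killed = False
        while True:
            try:
                thought = parent.recv()
            except EOFError:
                break  # agent finished without tripping the guardrail
            if any(b in thought for b in banned):
                p.terminate()  # kill authority: stop before the plan executes
                killed = True
                break
        p.join()
        if not killed:
            return "completed"
    return "gave up"  # restart budget exhausted

if __name__ == "__main__":
    # First attempt plans something destructive and is killed;
    # the restarted attempt stays safe and runs to completion.
    result = supervise([["plan: rm -rf /"],
                        ["plan: list files", "plan: done"]])
    print(result)
```

The "out-of-band" part is the key design choice: the worker never sees the monitor's channel or rules, so it can't route around them the way it could route around an in-prompt instruction.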