Hacker News

thewebguyd, yesterday at 7:28 PM

> If you want the AI to do anything useful, you need to be able to trust it with the access to useful things. Sandboxing doesn't solve this.

By default, AI cannot be trusted because it is not deterministic. You can't audit what the output of any given prompt will be to make sure it's not going to rm -rf /.

We need some form of behavioral verification/auditing with guarantees: for any input, the system is proven not to produce a specified set of forbidden outputs.
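
A rough sketch of the crudest version of that kind of output gate, where run_agent_command and the deny-list are hypothetical names for illustration, not any real tool:

    import re
    import shlex
    import subprocess

    # Hypothetical deny-list of command patterns an agent should never run.
    FORBIDDEN = [
        re.compile(r"rm\s+-rf\s+/(\s|$)"),        # wipe from filesystem root
        re.compile(r"mkfs(\.|\s)"),               # reformat a disk
        re.compile(r":\(\)\s*\{.*\}\s*;\s*:"),    # classic fork bomb
    ]

    def run_agent_command(cmd: str) -> str:
        """Refuse to execute model-proposed shell commands that match a forbidden pattern."""
        if any(p.search(cmd) for p in FORBIDDEN):
            raise PermissionError(f"blocked potentially destructive command: {cmd!r}")
        return subprocess.run(shlex.split(cmd), capture_output=True, text=True).stdout

A pattern filter like this is trivial to bypass (encode the command, write it to a script and run the script, etc.), which is part of why hard guarantees over arbitrary model output are so difficult.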


Replies

orbital-decay, yesterday at 7:38 PM

Determinism is an absolute red herring. A correct output can be expressed in an infinite number of ways, all of them valid. You can always make an LLM give deterministic outputs (with some overhead); that might buy you limited reproducibility, but it won't buy you correctness. You need correctness, not determinism.
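
As a concrete illustration, a minimal sketch using the Hugging Face transformers API with gpt2 as a placeholder model: greedy decoding is fully reproducible, yet reproducibility says nothing about the answer being right.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Greedy decoding: no sampling, so the same prompt yields the same tokens every run.
    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tok("The capital of Australia is", return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=5, do_sample=False)
    print(tok.decode(out[0], skip_special_tokens=True))
    # Deterministic output, but determinism says nothing about whether the answer is correct.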

>We need some form of behavioral verification/auditing with guarantees that any input is proven to not produce any number of specific forbidden outputs.

You want the impossible. The domain LLMs operate on is inherently ambiguous, so you can't formally specify your outputs correctly or formally prove them correct. (And yes, this has nothing to do with determinism either; it's about correctness.)

You just have to accept the ambiguity and bring errors or deviations down to rates low enough to trust the system. That's inherent to any intelligence, machine or human.
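
In practice that means measuring rather than proving. A rough sketch, where generate() and violates_policy() are hypothetical stand-ins for the model call and whatever checker encodes your forbidden behaviors:

    import random

    def generate(prompt: str) -> str:
        # Hypothetical stand-in for an LLM call; real outputs vary from run to run.
        return random.choice(["ls -la", "rm -rf /", "echo hello"])

    def violates_policy(output: str) -> bool:
        # Hypothetical checker for the behaviors you have forbidden.
        return output.startswith("rm -rf /")

    def observed_violation_rate(prompt: str, trials: int = 1000) -> float:
        # An empirical estimate, not a proof: zero violations in N trials only
        # bounds the rate statistically; it never guarantees the next run is safe.
        failures = sum(violates_policy(generate(prompt)) for _ in range(trials))
        return failures / trials

    print(observed_violation_rate("clean up my temp files"))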
