Determinism is an absolute red herring. A correct output can be expressed in an infinite number of ways, all of them valid. You can always make an LLM give deterministic outputs (with some overhead); that might buy you limited reproducibility, but it won't buy you correctness. You need correctness, not determinism.
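To make the "you can, but it buys you nothing" point concrete, here's a toy sketch: greedy decoding is a perfectly deterministic function of its input, yet that says nothing about whether the answer is right. `toy_logits` below is a made-up stand-in for a real model's forward pass, not any actual API.

```python
# Toy sketch: greedy decoding is deterministic, but determinism != correctness.
# `toy_logits` is a made-up, deterministic stand-in for a real model's forward pass.
def toy_logits(tokens, vocab_size=16):
    # Fake "model": a fixed function of the context, same tokens in -> same logits out.
    return [((sum(tokens) * 31 + i * 17) % 97) / 97.0 for i in range(vocab_size)]

def greedy_decode(logits_fn, prompt, max_new_tokens=8):
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = logits_fn(tokens)
        tokens.append(max(range(len(logits)), key=logits.__getitem__))  # argmax token
    return tokens

assert greedy_decode(toy_logits, [1, 2, 3]) == greedy_decode(toy_logits, [1, 2, 3])
# Perfectly reproducible -- and possibly perfectly wrong.
```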
> We need some form of behavioral verification/auditing with guarantees that any input is proven to not produce any number of specific forbidden outputs.
You want the impossible. The domain LLMs operate on is inherently ambiguous, so you can't formally specify your outputs correctly or formally prove them correct. (And yes, this doesn't have anything to do with determinism either; it's about correctness.)
You just have to accept the ambiguity and bring errors or deviations down to rates low enough to trust the system. That's inherent to any intelligence, machine or human.
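By "rates low enough" I mean something you can actually measure. A rough sketch of what that could look like, with `model` and `checker` as hypothetical stand-ins for whatever system and acceptance test you actually have:

```python
# Hypothetical sketch: "trust" operationalized as a measured failure rate
# over a test suite. `model` and `checker` are stand-ins, not real APIs.
def failure_rate(model, checker, cases):
    failures = sum(1 for case in cases if not checker(case, model(case)))
    return failures / len(cases)

def trustworthy(model, checker, cases, max_failure_rate=0.01):
    # In practice you'd want a proper confidence interval (e.g. Wilson)
    # rather than a raw point estimate, especially at very low rates.
    return failure_rate(model, checker, cases) <= max_failure_rate
```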
There remains the issue of responsibility, though: moral, technical, and legal.
> You need correctness, not determinism.
You need both. And there are AI models where input+prompt+seed is 100% deterministic.
It's really not much to ask that for the exact same input (data in/prompt/seed) we get the exact same output.
I'm willing to bet it's going to play out exactly like 100% reproducible builds: people complained for years that "timestamps about build time make it impossible" and whatnot, but in the end we got our reproducible builds. At some point logic is simply going to win and we'll get more and more models that are 100% deterministic.
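For concreteness, a rough sketch of the input+prompt+seed idea: keep temperature sampling, but draw every random choice from an explicit seed, so the (input, prompt, seed) triple maps to exactly one output. `toy_logits` is a made-up stand-in for a model forward pass; a real serving stack would also need deterministic kernels and fixed batching for this to hold end to end.

```python
import math, random

def toy_logits(tokens, vocab_size=16):
    # Made-up stand-in for a model forward pass (same context -> same logits).
    return [((sum(tokens) * 31 + i * 17) % 97) / 97.0 for i in range(vocab_size)]

def sample_decode(logits_fn, prompt, seed, max_new_tokens=8, temperature=0.8):
    rng = random.Random(seed)  # every random choice comes from the explicit seed
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        weights = [math.exp(l / temperature) for l in logits_fn(tokens)]
        tokens.append(rng.choices(range(len(weights)), weights=weights)[0])
    return tokens

# Same data in, same prompt, same seed -> identical output, every run.
assert sample_decode(toy_logits, [1, 2, 3], seed=42) == sample_decode(toy_logits, [1, 2, 3], seed=42)
```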
And this has absolutely no relation whatsoever to correctness.
This comment I'm making is mostly useless nitpicking, and I overall agree with your point. Now I will commence my nitpicking:
I suspect that it may merely be infeasible, not strictly impossible. There has been work on automatically proving that an ANN satisfies certain properties (IIRC, e.g., some kinds of robustness to some kinds of adversarial inputs, for image models).
It might be possible (though infeasible) to have an effective LLM along with a proof that e.g. it won't do anything irreversible when interacting with the operating system (given some formal specification of how the operating system behaves).
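For flavor, here's a toy version of the kind of automated property check I mean: interval bound propagation over a tiny ReLU network. Everything here is made up for illustration and is obviously nowhere near LLM scale, which is the "infeasible" part.

```python
import numpy as np

# Toy interval bound propagation (IBP) check over a tiny 2-layer ReLU net.
# All weights and inputs below are made up; real verification tools (and
# real networks) are vastly more involved.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 2)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

def interval_affine(lo, hi, W, b):
    # Propagate an axis-aligned box [lo, hi] through y = W @ x + b.
    center, radius = (lo + hi) / 2, (hi - lo) / 2
    c, r = W @ center + b, np.abs(W) @ radius
    return c - r, c + r

def label_provably_stable(x, eps, label):
    # Sound but incomplete: True means *every* input within L-inf distance eps
    # of x is classified as `label`; False just means "couldn't prove it".
    lo, hi = interval_affine(x - eps, x + eps, W1, b1)
    lo, hi = np.maximum(lo, 0), np.maximum(hi, 0)  # ReLU is monotone
    lo, hi = interval_affine(lo, hi, W2, b2)
    other = 1 - label
    return bool(lo[label] > hi[other])  # worst case for label beats best case for other

x = np.array([0.3, -0.7])
label = int(np.argmax(W2 @ np.maximum(W1 @ x + b1, 0) + b2))
print(label_provably_stable(x, eps=0.01, label=label))
```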
But, yeah, in practice I think you are correct.
It makes more sense to put the LLM+harness in an environment that ensures you can undo whatever it does if it messes things up, than to try to make the LLM itself such that it certainly won't produce outputs that mess things up in ways that aren't easily revertible, even if the latter does turn out to be possible in principle.
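Something in this spirit: let the agent loose only on a scratch copy, and copy its changes back only if a check (or a human) approves. `run_agent` and `approve` are hypothetical stand-ins for the actual agent loop and review step.

```python
import shutil, tempfile
from pathlib import Path

def run_in_sandbox(workdir: Path, run_agent, approve) -> bool:
    """Let the agent act on a throwaway copy; keep the result only if approved.
    `run_agent` and `approve` are hypothetical stand-ins."""
    with tempfile.TemporaryDirectory() as tmp:
        scratch = Path(tmp) / "scratch"
        shutil.copytree(workdir, scratch)   # the agent only ever sees the copy
        run_agent(scratch)                  # whatever it breaks, it breaks here
        if not approve(scratch):
            return False                    # reject: original dir never touched
        shutil.rmtree(workdir)              # accept: swap in the approved state
        shutil.copytree(scratch, workdir)   # (not atomic -- a sketch, not prod)
        return True
```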