> The fix for this common pattern is to reason about LLM outputs before making use of them.
That is politics. Not engineering.
Assigning a human to "check the output every time" and blaming them for the faults in the output is just appointing a scapegoat.
If you have to check the AI output every single time, the AI is pointless; you might as well just do the work yourself in the first place.
The humans are not scapegoats, because they are capable of taking on responsibility.
There is a point to using LLMs. They can save time by doing a first pass. But when they do the last pass, disasters will follow.
Well, attempts to engineer the brittleness out of human behavior have not worked, like, ever.
Well, I'd say there are two dimensions:
1. Check frequency (ranging from every single output down to occasional spot checks).
2. Check thoroughness (ranging from antagonistic, in-depth review to a high-level skim).
I'd agree that if you're at the far end of both dimensions (checking every single output, antagonistically and in depth), the system is not generating any value.
A lot of folks are taking calculated (or I guess in some cases, reckless) risks right now by relaxing one or both of those dimensions. I'd argue that in many situations the risk is small and worth it; in many others, not so much.
We'll see how it goes, I suppose.