It doesn't matter, because any process that seems right most of the time but is occasionally wrong in subtle, hard-to-spot ways is basically a machine for lulling people into not checking, so stuff will always slip through.
It's just like cars that drive themselves but expect you to jump in when there's a mistake: humans won't react as fast as they would if they were driving, because they aren't engaged, and no one can stay as engaged as they were when they were doing it themselves.
We need to stop pretending we can tell people they "just" need to check LLM output for accuracy; it's a process that inevitably leads to people not checking and things slipping through. Pretending it's the people's fault when essentially everyone using it would eventually end up doing the same is stupid and won't solve the core problem.
As someone who has done QA on white-collar work, I find it tiring to look for little errors in reports. Most people are not cut out for it.
Probably worth including a "bibliography" section of citations that can be automatically checked to confirm they actually exist, then.
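Something like this would do it for DOIs (a rough sketch, assuming the bibliography is just a list of DOIs, one per line; the Crossref endpoint is real, but the file format and script name are made up for illustration):

```python
import sys
import urllib.error
import urllib.parse
import urllib.request

CROSSREF = "https://api.crossref.org/works/"

def doi_exists(doi: str) -> bool:
    """Return True if Crossref knows about this DOI, False on a 404."""
    url = CROSSREF + urllib.parse.quote(doi, safe="/")
    req = urllib.request.Request(url, headers={"User-Agent": "cite-check/0.1"})
    try:
        with urllib.request.urlopen(req, timeout=10):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # rate limits or server errors shouldn't be read as "fake citation"

def main(path: str) -> int:
    # Assumed input format: one DOI per line in the bibliography file.
    with open(path) as f:
        dois = [line.strip() for line in f if line.strip()]
    missing = [d for d in dois if not doi_exists(d)]
    for d in missing:
        print("NOT FOUND:", d)
    return 1 if missing else 0

if __name__ == "__main__":
    sys.exit(main(sys.argv[1]))
```

Of course that only proves the DOI exists, not that the cited paper actually supports the claim, but it would catch the purely hallucinated references cheaply.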
Even disregarding self-driving features, it seems like the smarter we make cars, the dumber the drivers get. DRLs (daytime running lights) are great, until they let you drive around all night with no tail lights and dim front lighting because you're not paying attention to what's actually turned on.
> won't solve the core problem.
What's the core problem, though? Because if the core problem is "using AI", then it's an inevitable outcome: AI will be used, and there are always incentives to cut costs maximally.
So realistically, the solution is to punish mistakes. We do this for bridges that collapse, for driver mistakes on the road, etc. The "easy" fix is to make the punishment for mistakes harsher: whether an LLM was involved or not, the pedigree of the mistake is irrelevant.