I hate to help provide possible solutions to a process I don't approve of, but maybe the fuzzy tools need old-style deterministic tools the same way, and for the same reasons, that we do.
So instead of an LLM trying to answer a math or reasoning question by finding a statistical match with other similar groups of words it found on 4chan, the All-In podcast, and a terrible soup recipe written by a terrible cook, it can use a calculator when it needs a calculator answer.
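For what it's worth, the harness side of that is just a loop: the model either answers in plain text or emits a structured tool request, and deterministic code fills in the result. A minimal sketch, assuming a hypothetical call_model() in place of a real LLM API:

```python
import ast
import json
import operator

# Safe arithmetic evaluator -- the "calculator" tool. Walking the AST keeps
# the model from executing arbitrary code through the expression string.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
       ast.Div: operator.truediv, ast.Pow: operator.pow, ast.USub: operator.neg}

def calculator(expression: str) -> float:
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp):
            return OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expression, mode="eval").body)

def call_model(messages):
    # Hypothetical stand-in for a real LLM API call. Assume the model replies
    # with plain text, or with a JSON tool request like:
    #   {"tool": "calculator", "expression": "37 * 412"}
    raise NotImplementedError

def answer(question: str) -> str:
    messages = [{"role": "user", "content": question}]
    while True:
        reply = call_model(messages)
        try:
            request = json.loads(reply)
        except ValueError:
            return reply  # plain text: the model answered directly
        if not isinstance(request, dict) or request.get("tool") != "calculator":
            return reply
        result = calculator(request["expression"])
        # Feed the deterministic result back so the model can finish the answer.
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user", "content": f"calculator result: {result}"})
```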
No, they just need to be trained to have adversarial self-review "thinking" processes.
You ask an LLM "What's wrong with your answer?" and you get pretty good results.
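You can wire that critique pass into the harness today without any special training. A minimal sketch, again with a hypothetical call_model() standing in for a real LLM API:

```python
def call_model(messages):
    # Hypothetical stand-in for a real LLM API call.
    raise NotImplementedError

def self_reviewed_answer(question: str) -> str:
    # Pass 1: draft an answer.
    draft = call_model([{"role": "user", "content": question}])
    # Pass 2: adversarial self-review of the draft.
    critique = call_model([
        {"role": "user", "content": question},
        {"role": "assistant", "content": draft},
        {"role": "user", "content": "What's wrong with your answer?"},
    ])
    # Pass 3: revise in light of the critique.
    return call_model([
        {"role": "user", "content": question},
        {"role": "assistant", "content": draft},
        {"role": "user", "content": "What's wrong with your answer?"},
        {"role": "assistant", "content": critique},
        {"role": "user", "content": "Rewrite your answer, fixing those problems."},
    ])
```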
I think that is how the smarter agents do things? Just like Claude/ChatGPT sometimes does a web search, they can make other tool calls instead of just making a statistical guess. Of course, they don't always make the right choice between those options though…
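Right, and the selection step is just the model naming a tool and the harness dispatching on it, which is exactly where the bad choices show up. A minimal sketch with hypothetical call_model(), web_search(), and calculator() stubs, where an unknown tool name gets fed back to the model instead of crashing:

```python
import json

def call_model(messages):
    # Hypothetical stand-in for a real LLM API call. Assume the model replies
    # with plain text or a JSON request like {"tool": "web_search", "arg": "..."}.
    raise NotImplementedError

def web_search(query: str) -> str:
    raise NotImplementedError  # hypothetical search tool

def calculator(expression: str) -> str:
    raise NotImplementedError  # hypothetical calculator (see earlier sketch)

TOOLS = {"web_search": web_search, "calculator": calculator}

def run_agent(question: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        reply = call_model(messages)
        try:
            request = json.loads(reply)
        except ValueError:
            return reply  # plain text: the model chose to answer directly
        if not isinstance(request, dict):
            return reply
        tool = TOOLS.get(request.get("tool"))
        if tool is None:
            # The model picked a tool that doesn't exist -- tell it and retry.
            feedback = f"unknown tool; choose one of {sorted(TOOLS)}"
        else:
            feedback = tool(request.get("arg", ""))
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user", "content": f"tool result: {feedback}"})
    return call_model(messages + [{"role": "user", "content": "Answer now."}])
```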