Can't an LLM just detect a mathematical reasoning task, then produce a formula (not even displayed in production mode) and invoke it on an external service engineered for formal logic and mathematical computation?
In many of these examples it produces the wrong formula because it misunderstands the word problem, so a computer algebra system wouldn't help - garbage in, garbage out.
The problem here is more serious than mathematics: the quantitative reasoning itself is highly unreliable.
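To make the point concrete, here is a minimal sketch of the proposed pipeline, assuming a hypothetical LLM translation step and SymPy standing in for the external service. The solver evaluates whatever formula it is handed exactly; if the translation from words to formula is wrong, the exact answer is still wrong.

```python
# Sketch of the "LLM produces a formula, external tool evaluates it" pattern.
# SymPy is only an assumed stand-in for the external computation service.
import sympy as sp

word_problem = "Alice has 3 apples and gives away 2. How many are left?"

# Hypothetical LLM translation step. A correct reading yields "3 - 2";
# a misreading of the word problem might yield "3 + 2" instead.
llm_formula = "3 + 2"  # wrong formula from a misunderstood word problem

# The external solver faithfully and exactly evaluates whatever it receives.
answer = sp.sympify(llm_formula)
print(answer)  # 5 -- exact arithmetic, wrong answer: garbage in, garbage out
```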