How is anyone predicting timelines for AGI when these systems can’t do basic addition of 2 arbitrary numbers with 100% accuracy?
LLMs should use tool calling (where the arithmetic itself is exact) instead of doing math internally. But in general it would be nice to be able to teach a process and have the AI execute it deterministically. In some sense, reliability between 99% and 100% is the worst: you still can't trust the output, but verifying it feels like wasted effort. Maybe code gen and execution will get us there.
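Rough sketch of what I mean by pushing the addition into a tool. Everything here is hypothetical: the name `add_tool`, the JSON argument shape, and passing the numbers as strings are my assumptions, not any particular vendor's tool-calling API. The point is only that once the model hands over the operands, the sum is computed with exact integer arithmetic.

```python
import json


def add_tool(arguments_json: str) -> str:
    """Hypothetical tool handler: add two integers exactly.

    Assumes the model emits arguments like {"a": "123", "b": "456"}.
    Operands are strings so that very large values are not rounded
    by a JSON client that parses numbers as floats.
    """
    args = json.loads(arguments_json)
    a = int(args["a"])  # Python ints are arbitrary precision
    b = int(args["b"])
    # Return the sum as a string for the same reason as the inputs.
    return json.dumps({"sum": str(a + b)})


if __name__ == "__main__":
    call = json.dumps({"a": "99999999999999999999", "b": "1"})
    print(add_tool(call))
```

Note the tool only guarantees the arithmetic. The model still has to copy the operands into the call correctly, so this moves the failure point rather than removing it.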
Can you do basic addition of 2 arbitrary numbers with 100% accuracy (no tools)? No, you can't. With pen and paper you will make mistakes once the numbers have a sufficiently large number of digits N, and without pen and paper you will make them at a very small N. Are you no longer generally intelligent?
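To put a rough number on the "sufficiently large N" point: if you assume each digit is handled correctly with some fixed probability and that errors are independent (both are simplifying assumptions of mine, not measurements of people or models), the chance of getting every digit of an N-digit sum right is that probability raised to the power N. The function name and the example accuracy values below are made up for illustration.

```python
def prob_all_digits_correct(per_digit_accuracy: float, n_digits: int) -> float:
    """Chance that all n_digits are right, assuming independent per-digit errors."""
    if not 0.0 <= per_digit_accuracy <= 1.0:
        raise ValueError("per_digit_accuracy must be between 0 and 1")
    if n_digits < 0:
        raise ValueError("n_digits must be non-negative")
    return per_digit_accuracy ** n_digits


if __name__ == "__main__":
    # Illustrative accuracy values only, not measured ones.
    for n in (10, 100, 1000):
        p = prob_all_digits_correct(0.999, n)
        print(f"N={n}: P(all digits correct) = {p:.3f}")
```

Under these assumptions, any per-digit accuracy below 1 drives the overall success rate toward zero as N grows, for a person or for a model.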