> I think LLMs are getting better (well better trained) on dealing with basic math questions but you still need to help them
I feel like that's a fools errand. You could already in GPT3 days get the LLM to return JSON and make it call your own calculator, way more efficient way of dealing with it, than to get a language model to also be a "basic calculator" model.
Luckily, tools usage is easier than ever, and adding a `calc()` function ends up being really simple and precise way of letting the model focus on text+general tool usage instead of combining many different domains.
Add a tool for executing Python code, and suddenly it gets way broader capabilities, without having to retrain and refine the model itself.
I personally think getting LLMs to better deal with numbers will go a long way to making them more useful for different fields. I'm not an accountant, so I don't know how useful it would be. But being able to say, here are some numbers do this for scenario A and this for scenario B and so forth might be useful.
Having said that, I do think models that favours writing code and using a "LLM interpretation layer" may make the most sense for the next few (or more) years.