I think it’s more useful to view an LLM as a very good hint engine. It’s good at coming up with possibilities to consider and less good at verifying that they work, unless it has an external system to test ideas against and is trained to use it. In the case of applied math, it’s not enough to prove theorems; the results also need to be tested against the real world somehow.
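To make the idea concrete, here is a minimal sketch of the propose-and-verify pattern being described: a proposer (standing in for the LLM) throws out candidate answers, and an external checker decides which ones actually hold up. The function names `propose_candidates`, `verify`, and `hint_engine_loop`, and the toy equation used as the "problem", are hypothetical illustrations, not anyone's actual system.

```python
import random

def propose_candidates(problem, n=5):
    """Stand-in for an LLM call: return n candidate answers.
    Here we just guess small integers; a real system would query a model."""
    return [random.randint(-10, 10) for _ in range(n)]

def verify(problem, candidate):
    """External check the proposer cannot fake: does the candidate
    actually satisfy the problem? Here the 'problem' is x**2 - 5x + 6 = 0."""
    return candidate**2 - 5 * candidate + 6 == 0

def hint_engine_loop(problem, rounds=100):
    """Generate-and-verify loop: the proposer supplies hints,
    the verifier filters them; only verified candidates are trusted."""
    for _ in range(rounds):
        for cand in propose_candidates(problem):
            if verify(problem, cand):
                return cand
    return None  # no verified candidate found

if __name__ == "__main__":
    print(hint_engine_loop("x**2 - 5x + 6 = 0"))  # prints 2 or 3
```

The point of the split is that the proposer only has to be creative, not reliable; reliability comes from the external test, whether that's a proof checker, a simulator, or real-world measurement.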