I've been teaching myself physics lately and have found Grok to be one of the best at both arriving at a correct answer and helping me understand how to get there myself. It also seems a lot better than other models at saying "I don't know" or pointing out when my question doesn't make sense.
I bet any flagship model would do just as well if you told it in the prompt how you want it to respond.
Comparing Grok vs Gemini vs GPT vs Sonnet is like comparing mid-to-high-end CPUs. They're all about as good as one another for most work.
Grok has some of the best reasoning and hallucination benchmark scores.