I ran some tests on heavily math-oriented programming, using ChatGPT and Gemini as rubber ducks (not agentic): C performance tuning, checking C code for possible optimizations, going over math-oriented code and number theory, and optimizing threading, memory throughput, etc. to make the thing go faster, then benchmarking runs of the updated code. Gemini was by far better than ChatGPT in this domain. Because I could benchmark every change, the comparison wasn't just a gut feeling. For my use case it was night and day: Gemini's advice was generally quite strong and significantly improved benchmarked performance, while ChatGPT was far less useful. What works for you will depend on your use case, how well your prompting is tuned to the system you're using, and who knows what other factors, but I have a lot of benchmarks that are clear evidence of the opposite of your experience.