Lately with Gemini CLI / Jules it doesn't seem like time spent is a good proxy for difficulty. It has a big problem with getting into loops of "I am preparing the response for the user. I am done. I will output the answer. I am confident. Etc etc".
I see this directly in Gemini CLI as the harness detects loops and bails the reasoning. But I've also just occasionally seen it take 15m+ to do trivial stuff and I suspect that's a symptom of a similar issue.