I don't buy it.
Inference cost has dropped 300x in 3 years, and there's no reason to think this won't keep happening with improvements in models, agent architecture and hardware.
Also, too many people are fixated on American models when Chinese ones deliver similar quality, often at a fraction of the cost.
From my tests, the "personality" of an LLM, its tendency to stick to prompts and not derail, far outweighs the low single-digit percentage delta in benchmark performance.
Not to mention, different LLMs perform better at different tasks, and they are all particularly sensitive to prompts and instructions.