On Dwarkesh Patel's podcast, Dylan Patel from SemiAnalysis said that Google can currently afford to serve larger models than its competitors because it has access to much more compute (TPUs, etc.).
That could explain the difference in token usage, because larger models usually need fewer tokens to reach the same level of capability.