Yeah, but don’t you agree that fewer tokens to accomplish the same goal is a sign of increasing intelligence?
Lower cost to accomplish the same goal is a sign of intelligence. That's not necessarily achieved with fewer tokens, but it may be.
Kind of? But what I really care about is price, speed, and quality. If it used 10x the tokens at 1/10th the price per token, with the same latency, I'd be neutral on it.
Kimi 2.6, for example, seems to throw more tokens at problems to improve performance (for better or worse).
It could be. Or it could just be smarter caching (which wouldn't necessarily have anything to do with model intelligence). Or just overfitting on the 95% most common prompts (which could save tokens but make the model less intelligent/flexible).