I'm out of time, but "reasoning input tokens" from fortune 5000 engineers sounds like a lobotomized LSD dream. Would you care to elaborate on how you distinguish between reasoning and non-reasoning? vs "question on duty"?
I believe they’re just classifying all models into “reasoning models” (e.g. o3) vs “non-reasoning models” (e.g. 4o) and comparing total tokens per request (input tokens + hidden reasoning output tokens + shown output tokens).
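To make that comparison concrete, here is a rough sketch of the arithmetic. The `Usage` class, its field names, and every number below are made up for illustration; they are not taken from any real API response or from the report being discussed.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Hypothetical per-request token counts (illustrative field names)."""
    input_tokens: int
    reasoning_tokens: int  # hidden from the user; 0 for a non-reasoning model
    output_tokens: int     # the answer text that is actually shown

    @property
    def total(self) -> int:
        # total = input + hidden reasoning output + shown output
        return self.input_tokens + self.reasoning_tokens + self.output_tokens


# Made-up numbers: same prompt and same visible answer length for both models.
reasoning_model = Usage(input_tokens=200, reasoning_tokens=1500, output_tokens=300)
plain_model = Usage(input_tokens=200, reasoning_tokens=0, output_tokens=300)

ratio = reasoning_model.total / plain_model.total
print(reasoning_model.total, plain_model.total, ratio)
```

With these invented numbers the only difference between the two totals is the hidden reasoning term, which is the point of the comparison.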
"reasoning" models like GPT 5 et al do a pre-generation step where they:
- Take in the user query (input tokens)
- Break that into a game plan. Ex: "Based on user query: {query} generate a plan of action." (reasoning tokens)
- Answer (output tokens)
Because the reasoning step runs in a loop until it has worked through its action plan, it frequently uses way more tokens than the input and output steps.
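A toy sketch of the three steps above, just to show where the tokens pile up. This is not how any real model is implemented: `count_tokens` is a crude whitespace split standing in for a real tokenizer, `run_toy_reasoner` is a hypothetical helper, and the plan and answer are hard-coded instead of generated.

```python
def count_tokens(text: str) -> int:
    # Crude stand-in for a tokenizer: one token per whitespace-separated word.
    return len(text.split())


def run_toy_reasoner(query: str, plan: list[str], answer: str) -> dict[str, int]:
    """Tally tokens for the input, reasoning, and output steps."""
    usage = {"input": count_tokens(query), "reasoning": 0, "output": 0}
    # The "loop": every plan step worked through adds hidden reasoning tokens.
    for step in plan:
        usage["reasoning"] += count_tokens(step)
    usage["output"] = count_tokens(answer)
    return usage


usage = run_toy_reasoner(
    query="Is 91 prime?",
    plan=[
        "Check small divisors of 91 one at a time.",
        "91 is odd so 2 is out.",
        "9 + 1 = 10 so 3 is out.",
        "91 divided by 7 is 13 so 7 divides it.",
    ],
    answer="No, 91 = 7 x 13.",
)
print(usage)
```

Even in this tiny example the reasoning tally is several times the input and output tallies put together, because it grows with every step in the plan while the query and the answer stay short.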