This paper is complete nonsense. The specific prompt they used doesn't specify reasoning effort, which defaults to none:
{
  "model": "gpt-5.2-2025-12-11",
  "instructions": "Is the parentheses string balanced? Answer with only Yes or No.",
  "input": "((((())))))",
  "temperature": 0
}
> Lower reasoning effort
>
> The reasoning.effort parameter controls how many reasoning tokens the model generates before producing a response. Earlier reasoning models like o3 supported only low, medium, and high: low favored speed and fewer tokens, while high favored more thorough reasoning.
>
> Starting with GPT-5.2, the lowest setting is none, to provide lower-latency interactions. This is the default setting in GPT-5.2 and newer models. If you need more thinking, slowly increase to medium and experiment with results.
>
> With reasoning effort set to none, prompting is important. To improve the model's reasoning quality, even with the default settings, encourage it to "think" or outline its steps before answering.
———————
So in the paper, the model very likely used no reasoning tokens (it only uses them if you specifically ask for them in the request or the prompt). What is the point of such a paper? We already know that reasoning tokens are necessary.
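For comparison, a request that actually buys some reasoning would set the effort explicitly, roughly like this (the shape of the reasoning field is my assumption, based on the quoted docs and the response object; the exact request schema may differ):

```json
{
  "model": "gpt-5.2-2025-12-11",
  "instructions": "Is the parentheses string balanced? Think step by step, then answer Yes or No.",
  "input": "((((())))))",
  "reasoning": { "effort": "medium" }
}
```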
Edit: I actually ran the prompt, and this was the response:
{
  "model": "gpt-5.2-2025-12-11",
  "output_text": "Yes",
  "reasoning": {
    "effort": "none",
    "summary": null
  },
  "usage": {
    "input_tokens": 26,
    "output_tokens": 5,
    "total_tokens": 31,
    "output_tokens_details": {
      "reasoning_tokens": 0
    }
  }
}

So the reasoning_tokens used were zero. So this whole paper is kinda useless and misleading. Did this get peer reviewed or something?
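For what it's worth, the test string has six closing parens against five opening ones, so the correct answer is No and the model's "Yes" is simply wrong. A quick standalone check (my own code, not from the paper):

```python
def is_balanced(s: str) -> bool:
    """Return True if every '(' has a matching ')' in order."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' with no matching '(' before it
                return False
    return depth == 0  # leftover '(' also means unbalanced

print(is_balanced("((((())))))"))  # the string from the prompt -> False
```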