Great point! However, I'd push back with this: isn't faithfully following nuanced instructions an _agentic capability_ in itself?
If a model only performs well once the rules are clarified, that still reveals something important about its agency: it's brittle when policies are ambiguous, but much stronger when they're structured.
I agree with you that there’s a fine line between genuinely helping the model 'understand' the task and just 'teaching to the test'.
That said, Tau² frames a very specific use case, and we showed it can be solved more consistently. The practical result is an agent built on a cheaper, faster model that still does its job with higher reliability.