The session reverts to Opus if it trips a limiter. Is the benchmark detecting and correcting for that?
Only OpenAI would know that.