Opus 4, with enough context, could do almost everything I wanted in a single shot. More often than not, when I had a bad outcome and was frustrated, I would realize that I was the problem (giving improper direction or missing key context).
I also was in a pretty sweet position, having a boatload of credits and primo Vertex rate limits, so I could 'afford' to dump hundreds of thousands of tokens in context all day.
With Opus 4.5 and 4.6, I find I have to steer very actively.
To be clear, I'm comparing direct use of the models here, not their performance in Claude Code or any other 'agentic' setup.
Kinda reminds me of 4o vs 4-turbo.
I would imagine they are smaller models.