I feel that even if the models are stagnating, the tooling, integrations, and harnesses around them are getting significantly more capable (if not always 'better'; the recent vscode update really handicapped them for some reason). Something like the new agent from booking.com or whatever could be hugely powerful if it integrated with all the hotels, activities, mapping tools, flight systems, etc.
Even assuming we get nothing better than opus 4.6, these models are very capable, even if they make up nonsense 5% of the time!
This matches my experience exactly. The model itself is table stakes at this point — what actually determines whether an AI product is useful is the harness around it. I built a career document generator (resumes and cover letters through conversational AI) and the jump in quality came almost entirely from structured dialogue flows and output validation, not from swapping in a newer model. Constraining the interaction to specific turns — "tell me about this role," "what metrics can you share" — produces dramatically better results than free-form "write me a resume." The model capability was already sufficient two generations ago; the tooling is what made it actually reliable. Working version here if curious: https://super.myninja.ai/apps/6de082c7-a05f-4fc5-a7d3-ab56cc...
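To make "structured dialogue flows and output validation" a bit more concrete, here is a minimal sketch of the pattern. It is not the code behind the linked app: `TURNS`, `Session`, and `validate_output` are hypothetical names, the model call is left out entirely, and the "every number must come from the user" check is just one assumed example of a validation rule.

```python
import re
from dataclasses import dataclass, field

# Hypothetical turn list. The two prompts are the ones quoted in the comment;
# the keys ("role", "metrics") are made up for this sketch.
TURNS = [
    ("role", "Tell me about this role."),
    ("metrics", "What metrics can you share?"),
]

NUMBER = re.compile(r"\d+(?:\.\d+)?")


@dataclass
class Session:
    answers: dict = field(default_factory=dict)

    def next_prompt(self):
        """Return the next unanswered question, or None once every turn is done."""
        for key, prompt in TURNS:
            if key not in self.answers:
                return prompt
        return None

    def record(self, text):
        """Store the answer to the current turn; refuse empty answers."""
        text = text.strip()
        if not text:
            raise ValueError("answer must not be empty")
        for key, _ in TURNS:
            if key not in self.answers:
                self.answers[key] = text
                return
        raise RuntimeError("all turns are already answered")


def validate_output(draft, answers):
    """Return a list of problems with a generated draft (empty list = passes).

    Assumed rule: any number in the draft must also appear in the user's
    answers, which catches invented metrics.
    """
    problems = []
    if not draft.strip():
        problems.append("draft is empty")
    known = set(NUMBER.findall(" ".join(answers.values())))
    for num in NUMBER.findall(draft):
        if num not in known:
            problems.append(f"unsupported number: {num}")
    return problems


session = Session()
session.record("Backend engineer at a logistics startup")
session.record("Cut page load time by 40% across 3 products")

print(validate_output("Cut page load time by 40% across 3 products.", session.answers))
# prints []
print(validate_output("Cut page load time by 60%.", session.answers))
# prints ['unsupported number: 60']
```

In a real harness the draft would come from the model, and a non-empty problem list would trigger a retry or a follow-up question rather than being shown to the user.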