That's an interesting claim, but I don't see it in my own work. They've gotten better, though it's very hard to quantify. I just find myself editing their output much less these days (currently using GPT 5.4).
The problem with evals is that the underlying rubric is always either subjective or a quantitative score based on something that's probably now baked directly into the training set.
You kind of have to go on "feels" for a lot of this.
Yeah, same here, and all my coworkers feel the same way.
Most of us have been coding for ages. I actually find it really odd that people keep trying to disprove things about LLMs that are relatively obvious.
Without meaning to be dismissive (I'm really not intending to be), there's also the possibility that you've gotten worse after enough time using them. You're treating yourself as a constant in this, but a man cannot step in the same river twice.