mike_hearn yesterday at 12:28 PM

That's an interesting claim, but I don't see it in my own work. They have got better but it's very hard to quantify. I just find myself editing their work much less these days (currently using GPT 5.4).


Replies

dwedge yesterday at 12:34 PM

Without meaning to sound dismissive, because I'm really not intending to, there's also the possibility that you've gotten worse after enough time using them. You're treating yourself as a constant in this, but man cannot walk in the same river twice.

nkozyra yesterday at 12:33 PM

The problem with evals is that the underlying rubric will always be either subjective or a quantitative score based on something that is likely now baked directly into the training set.

You kind of have to go on "feels" for a lot of this.

mountainriver yesterday at 11:51 PM

Yeah same, and all my coworkers feel the same.

Most of us have been coding for ages. I actually find it really odd that people keep trying to disprove things that are relatively obvious with LLMs.