Hacker News

zsoltkacsandi yesterday at 9:06 PM

> The nerf is psychological, not actual

I tested this once: I gave a model the same task right after its release and again a couple of weeks later. On the first attempt it produced well-written code that worked beautifully, and I started to worry about software engineers' jobs. The second attempt was a nightmare, like a butcher acting as a junior developer performing surgery on a horse.

Is this empirical evidence?

And this is not only my experience.

Calling this psychological is gaslighting.


Replies

lukev yesterday at 9:32 PM

> Is this empirical evidence?

Look, I'm not defending the big labs, I think they're terrible in a lot of ways. And I'm actually suspending judgement on whether there is ~some kind of nerf happening.

But the anecdote you're describing is the definition of non-empirical. It is entirely subjective, based entirely on your experience and personal assessment.

roywiggins today at 12:50 AM

https://en.wikipedia.org/wiki/Regression_toward_the_mean

The way this works is:

1) x% of users have an exceptional first experience by chance. Nobody who has a meh first experience bothers to try a second time.
2) x% of those users (so x% of x% overall) also have an exceptional second experience by chance.
3) So a lot of people with a great first experience think the model started off great and suddenly got worse.

Suppose it's 25% that have a really great first experience. 25% of them have a great second experience too, but 75% of them see a sudden decline in quality and decide that it must be intentional. After the third experience this population gets bigger again.

So by pure chance and sampling bias, you end up convincing a bunch of people that the model used to be great but has gotten worse, while a much smaller population thinks it was terrible but got better, because most of the people who started badly gave up early.

This is not in their heads: they really did see declining success. But they experienced it without any changes to the model at all.
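
If you want to see the effect for yourself, here is a rough simulation sketch. It assumes every attempt is an independent coin flip with the same 25% chance of being "great", that the model never changes, and that anyone with a meh first attempt never comes back. The function name and parameters are made up for illustration, not taken from any real eval setup.

```python
import random

def simulate_perceived_nerf(n_users=100_000, p_great=0.25, seed=0):
    """Count how many returning users see a 'decline' from an unchanged model."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    returned = 0      # users whose first attempt was great, so they tried again
    stayed_great = 0  # returning users whose second attempt was also great
    declined = 0      # returning users whose second attempt was not great
    for _ in range(n_users):
        first_great = rng.random() < p_great
        if not first_great:
            continue  # assumption: a meh first experience means no second try
        returned += 1
        # Same probability as before: the model itself has not changed.
        second_great = rng.random() < p_great
        if second_great:
            stayed_great += 1
        else:
            declined += 1
    return {"returned": returned, "stayed_great": stayed_great, "declined": declined}

if __name__ == "__main__":
    result = simulate_perceived_nerf()
    share = result["declined"] / result["returned"]
    print(f"{share:.1%} of returning users saw the model 'get worse'")
```

Under those assumptions, roughly three quarters of the people who came back see a drop in quality, even though the success probability is identical on both attempts.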

ACCount37 yesterday at 9:12 PM

No, it's entirely psychological.

Users are not reliable model evaluators. It's a lesson the industry will, I'm afraid, have to learn and relearn over and over again.
