Hacker News

taurath today at 1:25 AM

Doesn't this "silent degradation" prevent any actual evaluation of the model? If the model fails at something, anyone can claim that it failed due to degradation.


Replies

lionkor today at 8:41 AM

Who cares if it can be evaluated independently? The majority of commenters on HN were happy to vibe code and ship products with the models we had 1-2 years ago. It continues to be laughable.

I understand that moving the goalposts with every release is unfair, but it's equally concerning that people were letting GPT 4.X vibe code and ship entire products.

janalsncm today at 3:01 AM

I don’t think so? They can claim it was an act of God for all I care, but at the end of the day the model failed the task.