Hacker News

andai yesterday at 6:32 PM

Yeah, nobody's ever silently changed a model while it was deployed. That would be illegal!


Replies

aspenmartin yesterday at 7:12 PM

What does this have to do with what I’m saying? Of course the models are updated. I’m saying a new benchmark isn’t public, so the model wouldn’t know it is being evaluated on a new benchmark.

Not to mention: thinking that the API is literally swapping in overfit models behind the scenes to maintain some sort of illusion that they perform well on these benchmarks is just beyond ridiculous.
