Hacker News

thomasliao · yesterday at 9:34 AM · 3 replies

It's an important question! If you are paying a lot of money to use AI models, you care that you are using the best one for your task. And it turns out that figuring out which AI model is best for your task is not trivial and requires some expertise.


Replies

wseqyrku · yesterday at 9:48 AM

That was too nice of a reply, I apologize. I just can't understand the thought process: what exactly are we optimizing for? If you are paying a lot of money to use AI models, you already have so much overhead that a precise ranking in an eval is not gonna make much difference between equally "frontier" models. Especially since models are sensitive to the input. So the eval is just gonna evaluate the eval with very high accuracy. It might be equivalent to the illusion-of-safety thing, applied to financial risk.

liveoneggs · yesterday at 12:36 PM

They all change day to day and are non-deterministic by design. Your settled answer is only good for a moment.

lupire · yesterday at 11:47 AM

But frontier models are constantly changing.