logoalt Hacker News

dkerstentoday at 8:27 AM4 repliesview on HN

> This is not scientific at all, just vibes, YMMV.

This is the problem.

I would love to have a product sheet showing what each models strengths an weaknesses are, so that I can have a clear decision tree of "if this kind of work, use model X", or "model Y should be used in ways Z". But they all look the same from the outside and the only way to figure out which might be marginally better at what is to do extensive, time consuming, and perhaps expensive testing.


Replies

coldteatoday at 9:37 AM

>I would love to have a product sheet showing what each models strengths an weaknesses are, so that I can have a clear decision tree of "if this kind of work, use model X", or "model Y should be used in ways Z". But they all look the same from the outside and the only way to figure out which might be marginally better at what is to do extensive, time consuming, and perhaps expensive testing.

Think of it less like a static tool, and more like a human helper, where the same holds.

show 6 replies
couscouspietoday at 9:10 AM

That would be ideal, but AI is less like a tool and more like a human in this regard and you don't have character sheets for each of your colleagues, as well.

show 2 replies
ameliustoday at 8:45 AM

Yes, but benchmarks can be gamed.

Maybe we need better reviewers then?

dotancohentoday at 8:47 AM

Honestly, the differences between AI models always felt to me like the differences between coworkers or job candidates. They don't all share the same strengths and weaknesses - and they all have both good days and bad days.

Realising this made me respect the "I" in "AI" a bit more seriously.