dkersten • today at 8:27 AM • 4 replies

> This is not scientific at all, just vibes, YMMV.

This is the problem.

I would love to have a product sheet showing what each model's strengths and weaknesses are, so that I can have a clear decision tree of "if this kind of work, use model X", or "model Y should be used in ways Z". But they all look the same from the outside, and the only way to figure out which might be marginally better at what is to do extensive, time-consuming, and perhaps expensive testing.

Replies

coldtea • today at 9:37 AM

> I would love to have a product sheet showing what each model's strengths and weaknesses are, so that I can have a clear decision tree of "if this kind of work, use model X", or "model Y should be used in ways Z". But they all look the same from the outside, and the only way to figure out which might be marginally better at what is to do extensive, time-consuming, and perhaps expensive testing.

Think of it less like a static tool, and more like a human helper, where the same holds.

couscouspie • today at 9:10 AM

That would be ideal, but AI is less like a tool and more like a human in this regard, and you don't have character sheets for each of your colleagues either.

amelius • today at 8:45 AM

Yes, but benchmarks can be gamed.

Maybe we need better reviewers then?

dotancohen • today at 8:47 AM

Honestly, the differences between AI models always felt to me like the differences between coworkers or job candidates. They don't all share the same strengths and weaknesses - and they all have both good days and bad days.

Realising this made me take the "I" in "AI" a bit more seriously.
