Hacker News

jiggunjer today at 6:47 AM

Those are objective metrics, not an objective way to compare. The comparison still depends on which metrics you choose to include.


Replies

cromka today at 6:58 AM

That's exactly why there's a ton of different benchmarking suites used for evaluating hardware performance.

I reckon we'll have similar suites comparing different aspects of models.

And, at some point, we'll be dealing with models skewing results whenever they detect they're being benchmarked, as happened before with hardware. Some say that's already happening with the pelican test.
