mdasen, today at 2:31 PM

It's really interesting how much the AI harness seems to matter. Going from 48% via Google's official results to 65% is a huge jump. I feel like I'm constantly seeing results that compare models and rarely seeing results that compare harnesses.

Is there a leaderboard out there comparing harness results using the same models?


Replies

manx, today at 3:42 PM

We probably want to compare the Cartesian product of models and harnesses.
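
As a rough illustration of what that grid would look like, here is a minimal sketch that enumerates every (model, harness) pair. The model and harness names are hypothetical placeholders, not real systems, and no scores are included since none are given here beyond the two figures quoted above.

```python
from itertools import product

# Hypothetical placeholder names; these are not real models or harnesses.
models = ["model-a", "model-b"]
harnesses = ["harness-x", "harness-y", "harness-z"]


def evaluation_grid(models, harnesses):
    """Return every (model, harness) pair, i.e. the Cartesian product."""
    return list(product(models, harnesses))


grid = evaluation_grid(models, harnesses)

# Each pair is one cell a leaderboard would need to fill with a benchmark score.
for model, harness in grid:
    print(f"{model} + {harness}")
```

With M models and H harnesses, that is M × H benchmark runs, which may be part of why such leaderboards are rare.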

GodelNumbering, today at 3:13 PM

I really wish there was! I even thought of creating one, but it would be a conflict of interest.