mdasen • today at 2:31 PM • 2 replies

It's really interesting how much the AI harness seems to matter. Going from 48% via Google's official results to 65% is a huge jump. I feel like I'm constantly seeing results that compare models and rarely seeing results that compare harnesses.

Is there a leaderboard out there comparing harness results using the same models?

Replies

manx • today at 3:42 PM

We probably want to compare the Cartesian product of model + harness.
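A minimal sketch of what a model × harness leaderboard could look like: run every (model, harness) pair, rank the pairs, and report per model how much the harness alone moves the score. The function names (`evaluate_grid`, `harness_spread`, `leaderboard`), the `run_benchmark` callback, and the model/harness names are all hypothetical, and the scorer in the demo is a placeholder, not real benchmark data.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]  # (model, harness)


def evaluate_grid(
    models: List[str],
    harnesses: List[str],
    run_benchmark: Callable[[str, str], float],  # hypothetical: runs one benchmark pass
) -> Dict[Pair, float]:
    """Score every (model, harness) combination, i.e. the Cartesian product."""
    return {(m, h): run_benchmark(m, h) for m, h in product(models, harnesses)}


def leaderboard(grid: Dict[Pair, float]) -> List[Tuple[Pair, float]]:
    """Rank all pairs by score, best first."""
    return sorted(grid.items(), key=lambda kv: kv[1], reverse=True)


def harness_spread(grid: Dict[Pair, float]) -> Dict[str, float]:
    """For each model, the gap between its best and worst harness score."""
    by_model: Dict[str, List[float]] = {}
    for (model, _harness), score in grid.items():
        by_model.setdefault(model, []).append(score)
    return {m: max(s) - min(s) for m, s in by_model.items()}


if __name__ == "__main__":
    # Hypothetical names and a placeholder scorer; swap in a real benchmark run.
    def placeholder_scorer(model: str, harness: str) -> float:
        return float(len(model) + len(harness))

    grid = evaluate_grid(["model-a", "model-b"], ["harness-x", "harness-yy"], placeholder_scorer)
    for (model, harness), score in leaderboard(grid):
        print(f"{model:>8} + {harness:<10} {score:.1f}")
    print(harness_spread(grid))
```

Holding the model fixed and varying only the harness is what isolates the harness effect the parent comment is asking about; a model-only leaderboard collapses that axis.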

GodelNumbering • today at 3:13 PM

I really wish there was! I even thought of creating one, but it would be a conflict of interest.