
stingraycharles • today at 9:48 AM

“No harness at all” might be an issue, though, as these types of benchmarks are often gamed, and models then score well on them without actually being better models.
