
underlines, yesterday at 3:22 PM

well, your own: unleaked benchmarks that represent your real workloads.

if you can't afford to do that, look at a lot of them, e.g. on artificialanalysis.com, where they merge multiple benchmarks across weighted categories into an Intelligence Score, a Coding Score and an Agentic Score.
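
for anyone who wants to roll their own version of that kind of merged score, here is a rough sketch of a weighted average over categories. the category names, scores and weights are all made up for illustration, and this is not artificialanalysis.com's actual methodology, just the general idea of weighting categories and normalizing.

```python
def composite_score(scores, weights):
    """Weighted mean of per-category scores (assumed to share a 0-100 scale)."""
    total_weight = sum(weights[category] for category in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(scores[category] * weights[category] for category in scores)
    return weighted_sum / total_weight


# Hypothetical per-category results for one model (illustrative numbers only).
scores = {"reasoning": 70.0, "coding": 60.0, "agentic": 50.0}

# Hypothetical weights: reasoning counts twice as much as the other categories.
weights = {"reasoning": 2.0, "coding": 1.0, "agentic": 1.0}

overall = composite_score(scores, weights)
print(f"composite score: {overall:.1f}")
```

picking the weights to match your own workload mix is the whole point: if you mostly do coding, weight coding higher.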