nl • today at 5:25 AM

Don't worry about being nitpicky! I'm going to out-nitpick you....

Actually....

I write and publish my own benchmark for this stuff. It's an agentic SQL benchmark that isn't in the training data yet, and I've found it can separate frontier models from close followers (the only models to score 100% are Opus 4.6 and GPT 5.5).

The best small model I've found is a fine-tune of Qwen 3.5 9B, which scores 18/25: https://sql-benchmark.nicklothian.com/?highlight=Jackrong_Qw...

Haiku 4.5 scores 20/25, and Haiku is certainly better than Sonnet 3.6. GPT 3.5 scores 13/25.
