skybrian • yesterday at 11:14 PM • 0 replies

I wonder what it would look like if you redid the benchmarks, testing against models with reasoning effort set to various values. Maybe structured output is only worse if the model isn't allowed to reason first?
