joatmon-snoo • yesterday at 7:55 PM • 1 reply

(author here) To be more specific, here's a benchmark that we ran last year, where we compared schema-aligned parsing against constrained decoding (then called "Function Calling (Strict)", the orange ƒ): https://boundaryml.com/blog/sota-function-calling

Replies

skybrian • yesterday at 11:14 PM

I wonder what it would look like if you redid the benchmarks against models with reasoning effort set to various values. Maybe structured output is only worse if the model isn't allowed to do reasoning first?
