I have heard this argument before, but never actually seen concrete evals.
The argument goes that because we are intentionally constraining the model, we get less creativity. I believe OAI's method is a softmax (I think; rusty on my ML math) to get the tokens sorted by probability, then taking the first one that the current state machine allows.
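Roughly, the mechanism as I understand it would look something like this (a sketch only; `constrained_greedy_pick` and `is_allowed` are names I made up, with `is_allowed` standing in for whatever grammar/state-machine check the provider actually runs):

    import numpy as np

    def constrained_greedy_pick(logits, is_allowed):
        # Softmax the logits, walk token ids in descending probability,
        # and take the first one the grammar's state machine accepts.
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        for token_id in np.argsort(probs)[::-1]:
            if is_allowed(token_id):
                return token_id
        raise ValueError("state machine rejected every token")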
Maybe, but a one-off vibes example is hardly proof. I still use structured output regularly.
Oh, and tool calling is almost certainly implemented atop structured output. After all, it's forcing the model to respond with JSON matching a schema that represents the tool arguments. I struggle to believe that this is adequate for tool calling but inadequate for general-purpose use.
> but never actually seen concrete evals.
The team behind the Outlines library has produced several sets of evals and repeatedly shown the opposite: that constrained decoding improves model performance (including examples with "CoT", which the post claims isn't possible) [0][1].
There was a paper that claimed constrained decoding hurt performance, but it had some fundamental errors, which they also wrote about [2].
People get weirdly superstitious when it comes to constrained decoding, as though it's somehow "limiting the model," when it's as simple as applying a conditional probability distribution to the logits. I also suspect this post is largely there to justify the fact that BAML parses the results (since the post is written by them).
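To be concrete, "conditional probability distribution" just means something like this (a sketch, not any particular library's implementation; `constrained_distribution` and `allowed_ids` are hypothetical names):

    import numpy as np

    def constrained_distribution(logits, allowed_ids):
        # Mask the logits of tokens the grammar forbids, then softmax
        # what's left. Sampling from this is P(token | token is
        # grammatical) -- nothing about the model itself is altered.
        masked = np.full_like(logits, -np.inf)
        masked[allowed_ids] = logits[allowed_ids]
        exp = np.exp(masked - masked[allowed_ids].max())
        return exp / exp.sum()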
0. https://blog.dottxt.ai/performance-gsm8k.html
1. https://blog.dottxt.ai/oss-v-gpt4.html
2. https://blog.dottxt.ai/say-what-you-mean.html