A lot of models have also been overly chat-trained, responding with stuff like “Sure, I can help you with that.”
That’s just unwanted noise if you’re trying to use them as a building block in an application, so you need to force JSON or similar… which I suspect harms accuracy over free form
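For concreteness, here's roughly what "forcing JSON" looks like with the OpenAI Python SDK (the model name and the schema in the prompt are just placeholders, not a recommendation):

```python
from openai import OpenAI

client = OpenAI()

# response_format pins the reply to a valid JSON object; the prompt still has
# to mention JSON explicitly for this mode to be accepted.
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "Reply with a JSON object only, no prose."},
        {"role": "user", "content": 'Summarize this diff as {"summary": ..., "risk": ...}: ...'},
    ],
)
print(resp.choices[0].message.content)
```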
> which I suspect harms accuracy over free form
Untrue in my testing. If you want to use chain of thought, you can always throw in a `thoughts` field (as a JSON field or XML tags) before the rest of your output.
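A sketch of that pattern, with illustrative field names and a placeholder `call_llm` client: the `thoughts` key comes first, so the model still gets room to reason inside the structured output, and you just drop it before using the result.

```python
import json

# Hypothetical prompt: `thoughts` comes before the answer fields so the model
# can reason before committing to them.
prompt = """Return a JSON object with exactly these keys, in this order:
  "thoughts": your step-by-step reasoning,
  "answer": the final answer,
  "confidence": a number from 0 to 1.
"""

raw = call_llm(prompt)       # call_llm is a placeholder for whatever client you use
data = json.loads(raw)
data.pop("thoughts", None)   # discard the reasoning, keep the answer
```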
This isn’t a problem in practice. Most of my prompts ask the LLM to do a bunch of chain of thought before asking it to spit out JSON. I extract the JSON, which works 97.5% of the time, and have a retry step that gets real specific about “here’s the conversation so far, but I need JSON now” to handle the rest. Adding examples really helps.
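Roughly what that extract-then-retry step can look like; the regex extraction, the retry wording, and `call_llm` are all just one way to do it:

```python
import json
import re

def extract_json(text: str):
    """Pull the first {...} block out of a free-form LLM reply, or None."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if not match:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None

def ask_for_json(call_llm, prompt: str, max_retries: int = 1):
    reply = call_llm(prompt)
    parsed = extract_json(reply)
    for _ in range(max_retries):
        if parsed is not None:
            break
        # Retry: hand back the conversation so far and insist on JSON only.
        retry_prompt = (
            f"{prompt}\n\nYour previous reply:\n{reply}\n\n"
            "That wasn't valid JSON. Respond again with only the JSON object."
        )
        reply = call_llm(retry_prompt)
        parsed = extract_json(reply)
    return parsed
```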
I also firmly believe that the number of tokens served is a metric that is tracked and encouraged to go up, because more tokens mean more charges. o1 "does more" by using a whole lot more tokens for a very slight bump in usefulness.
I've not had that experience when I include "only respond with the code" in the prompt for a coding LLM.
Though it's worth noting that I often do want an explanation, and currently my workflow is to just use a different LLM.
Unfortunately, that "unwanted noise" is a space for the models to compute; trying to eliminate it gives suboptimal responses. What you can do instead is try to corral it - let the model "think" like it wants, but guide it to add markers wrapping the thinking and/or result, then filter out the thinking in UI (for interactive applications) or as an intermediate/post-processing step (for hidden "building blocks").
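One way to do that corralling (the tag names and prompt wording here are arbitrary): let the model think inside a marker, then strip the marker out before the result reaches the rest of the pipeline.

```python
import re

SYSTEM_PROMPT = (
    "Think through the problem inside <scratchpad>...</scratchpad>, "
    "then put only the final result inside <result>...</result>."
)

def strip_thinking(reply: str) -> str:
    """Keep the <result> block if present; otherwise drop the scratchpad."""
    result = re.search(r"<result>(.*?)</result>", reply, re.DOTALL)
    if result:
        return result.group(1).strip()
    # Fallback: remove the scratchpad and return whatever is left.
    return re.sub(r"<scratchpad>.*?</scratchpad>", "", reply, flags=re.DOTALL).strip()
```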
If you're using Anthropic models, you may actually get improvements from prompting the model to maintain a tagging discipline; see https://docs.anthropic.com/en/docs/build-with-claude/prompt-....
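For what it's worth, with the Anthropic Python SDK that tagging discipline can be as simple as asking for it in the system prompt (model name is a placeholder, swap in whatever you're actually running):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder model
    max_tokens=1024,
    system="Wrap any reasoning in <thinking> tags and the final answer in <answer> tags.",
    messages=[{"role": "user", "content": "Refactor this function to remove duplication: ..."}],
)
print(message.content[0].text)
```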