>The “answer before reasoning” is good evidence for it. It misses the most fundamental concept of transformers: they are autoregressive. Each token is predicted only from the tokens before it, so an answer emitted first cannot be conditioned on reasoning that appears after it.
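To make that concrete, here is a toy sketch of autoregressive decoding in pure Python; `model()` is a hypothetical stand-in for a real transformer forward pass, not anything from the repo under discussion:

```python
import random

def model(context: list[str]) -> str:
    # Hypothetical stand-in for a transformer forward pass: picks the
    # next token from a tiny vocabulary, seeded by context length so
    # the toy run is deterministic.
    vocab = ["the", "answer", "is", "42", "."]
    rng = random.Random(len(context))
    return rng.choice(vocab)

def generate(prompt: list[str], max_new_tokens: int = 5) -> list[str]:
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        next_token = model(tokens)  # conditioned only on the prefix...
        tokens.append(next_token)   # ...so an early "answer" token can
                                    # never see reasoning emitted later
    return tokens

print(generate(["Q:", "what", "is", "6*7", "?"]))
```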
I don't think it's fair to assume the author doesn't understand how transformers work. Their intention with this instruction appears to be to aggressively reduce output token cost.
i.e. I read this instruction as a hack to emulate the Qwen model series' /nothink token instruction (a rough sketch of the idea follows below).
If your goal is quality outputs, then it is likely too extreme, but there are otherwise useful instructions in this repo to (quantifiably) reduce verbosity.
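A minimal sketch of that hack, assuming an OpenAI-compatible chat endpoint via the openai Python client; the system prompt wording and model name are illustrative, not the repo's actual instructions:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative /nothink-style instruction; not the repo's exact wording.
NOTHINK = (
    "Answer directly with the final result. Do not show reasoning, "
    "intermediate steps, or preamble."
)

def ask(question: str, suppress_reasoning: bool = True) -> str:
    messages = []
    if suppress_reasoning:
        messages.append({"role": "system", "content": NOTHINK})
    messages.append({"role": "user", "content": question})
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat model works
        messages=messages,
    )
    # completion_tokens is what makes the verbosity reduction quantifiable
    print("completion tokens:", response.usage.completion_tokens)
    return response.choices[0].message.content

print(ask("What is 17 * 24?"))
```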
If they want to reduce token cost, they should just use a smaller model instead of dumbing down a more expensive one.