>The “answer before reasoning” is good evidence for it. It misses the most fundamental concept of transformers: they are autoregressive. Each token is predicted only from the tokens before it, so an answer emitted first cannot be conditioned on reasoning that appears after it.
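To make that concrete, here is a toy sketch of autoregressive decoding in pure Python; `model()` is a hypothetical stand-in for a real transformer forward pass, not anything from the repo under discussion:

```python
import random

def model(context: list[str]) -> str:
    # Hypothetical stand-in for a transformer forward pass: picks the
    # next token from a tiny vocabulary, seeded by context length so
    # the toy run is deterministic.
    vocab = ["the", "answer", "is", "42", "."]
    rng = random.Random(len(context))
    return rng.choice(vocab)

def generate(prompt: list[str], max_new_tokens: int = 5) -> list[str]:
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        next_token = model(tokens)  # conditioned only on the prefix...
        tokens.append(next_token)   # ...so an early "answer" token can
                                    # never see reasoning emitted later
    return tokens

print(generate(["Q:", "what", "is", "6*7", "?"]))
```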
I don't think it's fair to assume the author doesn't understand how transformers work. Their intention with this instruction appears to be to aggressively reduce output token cost.
i.e. I read this instruction as a hack to emulate the Qwen model series' /nothink token instruction (a rough sketch of the idea follows below).
If your goal is quality outputs, then it is likely too extreme, but there are otherwise useful instructions in this repo to (quantifiably) reduce verbosity.
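A minimal sketch of that hack, assuming an OpenAI-compatible chat endpoint via the openai Python client; the system prompt wording and model name are illustrative, not the repo's actual instructions:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative /nothink-style instruction; not the repo's exact wording.
NOTHINK = (
    "Answer directly with the final result. Do not show reasoning, "
    "intermediate steps, or preamble."
)

def ask(question: str, suppress_reasoning: bool = True) -> str:
    messages = []
    if suppress_reasoning:
        messages.append({"role": "system", "content": NOTHINK})
    messages.append({"role": "user", "content": question})
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat model works
        messages=messages,
    )
    # completion_tokens is what makes the verbosity reduction quantifiable
    print("completion tokens:", response.usage.completion_tokens)
    return response.choices[0].message.content

print(ask("What is 17 * 24?"))
```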
If they want to reduce token cost, they should just use a smaller model instead of dumbing down a more expensive one.