Different models do slight variants. Usually it’s done in post training to enforce behavior based ...

aabdi • last Sunday at 2:33 PM • 0 replies • view on HN

Different models do slight variants.

Usually it’s done in post training to enforce behavior based on prompt. Ie. System prompt with thinking:max or low or wtv.

Enforcement then goes via constrained decoding, checking for think token start and end with max lengths, or other variations

alt Hacker News