Take a look at the harmony repo which specifies the internal OpenAI format - the effort level is spe...

pyentropy • last Sunday at 3:48 PM • 1 reply • view on HN

Take a look at the harmony repo which specifies the internal OpenAI format - the effort level is specified in the context after the <|start|> tag - https://github.com/openai/harmony

Note that inference libs also have parsers that put hard limits on reasoning tokens with separate counters (similar to how you can put a limit on token generation per completion versus waiting for an <eos>). For that, take a look at vllm reasoning docs.

Replies

pyentropy • last Sunday at 9:09 PM

Examples with inference of different reasoning effort levels is in the OpenAI docs as well - https://developers.openai.com/cookbook/articles/openai-harmo...

https://docs.vllm.ai/en/latest/features/reasoning_outputs/#a...

https://developers.openai.com/api/docs/guides/reasoning

➕ show 1 reply

alt Hacker News

Replies