FYI: Newer LLM hosting APIs offer control over the amount of "thinking" (as well as the length of the reply) -- some by token count, others by an enum (low, medium, high, etc.).
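
To make the two styles concrete, here is a rough sketch of how a client might build a request body for either one. The field names (`max_reply_tokens`, `thinking_budget_tokens`, `thinking_effort`) and the `build_request` helper are made up for illustration and are not any particular provider's API, so check your provider's docs for the real parameter names.

```python
from enum import Enum
from typing import Union


class Effort(Enum):
    """Enum-style thinking control (hypothetical values)."""
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


def build_request(prompt: str,
                  max_reply_tokens: int,
                  thinking: Union[int, Effort, None] = None) -> dict:
    """Build a request body with an optional thinking control.

    All field names here are hypothetical placeholders.
    """
    # Reply length is capped separately from the thinking control.
    payload = {"prompt": prompt, "max_reply_tokens": max_reply_tokens}

    if isinstance(thinking, Effort):
        # Enum style: the provider decides how much reasoning "high" means.
        payload["thinking_effort"] = thinking.value
    elif isinstance(thinking, int) and not isinstance(thinking, bool):
        # Token-count style: an explicit budget for reasoning tokens.
        if thinking <= 0:
            raise ValueError("thinking budget must be a positive token count")
        payload["thinking_budget_tokens"] = thinking
    elif thinking is not None:
        raise TypeError("thinking must be an int, an Effort, or None")

    return payload
```

Calling `build_request("Summarize this", 500, Effort.HIGH)` would use the enum style, while `build_request("Summarize this", 500, 2048)` would use the token-budget style. Leaving `thinking` out sends neither field, so the provider's default applies.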