Thinking vs non-thinking. There'll be a token cost there. But still fairly remarkable!
Is there a reason we can't use thinking-mode completions to train the non-thinking mode, i.e. run gradient descent toward what the thinking mode would have answered?