DoctorOetker • today at 5:42 PM • 1 reply

Is there a reason we can't use thinking completions to train the non-thinking mode? I.e., do gradient descent toward the answer that thinking would have produced?
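To make the idea concrete, here is a minimal toy sketch, not how any lab actually trains its models. It assumes the "non-thinking" student is just a vector of logits over four candidate answers, and that a thinking run has already settled on one of them. The names `distill_step`, `train`, and `teacher_answer` are made up for illustration. Each step is plain gradient descent on the cross-entropy between the student's direct answer and the answer the thinking run reached.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distill_step(student_logits, teacher_answer, lr=0.5):
    """One gradient-descent step on cross-entropy toward the teacher's answer.

    For loss = -log p[teacher_answer], the gradient with respect to
    logit i is p[i] - 1 if i is the target, and p[i] otherwise.
    """
    probs = softmax(student_logits)
    return [
        logit - lr * (p - (1.0 if i == teacher_answer else 0.0))
        for i, (logit, p) in enumerate(zip(student_logits, probs))
    ]


def train(student_logits, teacher_answer, steps=200, lr=0.5):
    """Repeat the distillation step a fixed number of times."""
    for _ in range(steps):
        student_logits = distill_step(student_logits, teacher_answer, lr)
    return student_logits


# Hypothetical toy setup: 4 candidate answers, and the thinking run
# ended on answer index 2. The student starts with no preference.
initial = [0.0, 0.0, 0.0, 0.0]
trained = train(initial, teacher_answer=2)
probs = softmax(trained)
```

In a real setup the student would be the same network answering without a reasoning trace, and the target would be the token sequence of the final answer rather than a single class index, but the loss has the same shape.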

Replies

joshred • today at 6:08 PM

From what I've read, that's already part of their training: the models are scored on each step of their reasoning, not just on their final solution. I don't know if it's still the case, but for the early reasoning models, the "reasoning" output shown to the user was more of a GUI feature to entertain them than an actual explanation of the steps being followed.
