Thinking helps the models arrive at the correct answer more consistently. However, during training they only get the reward at the end of a cycle, once the final answer comes out. Turns out that without heavy constraints during training, the thinking, that whole series of thinking tokens, ends up as gibberish to humans.
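To make the "reward at the end" point concrete, here's a rough sketch (made-up names, assuming an outcome-only reward like the ones described for R1-style training): only the final answer is scored, so nothing in the objective pushes the thinking tokens themselves to stay readable.

```python
import re

def outcome_reward(completion: str, gold_answer: str) -> float:
    """Score 1.0 if the text after the thinking block matches the gold answer."""
    # Strip the thinking span; its contents never affect the reward.
    answer = re.sub(r"<think>.*?</think>", "", completion, flags=re.DOTALL).strip()
    return 1.0 if answer == gold_answer else 0.0

# Legible reasoning and pure gibberish earn exactly the same reward.
print(outcome_reward("<think>2 plus 2 is 4.</think>4", "4"))  # 1.0
print(outcome_reward("<think>zxqv blorp 7!!</think>4", "4"))  # 1.0
```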
I wonder if they decided that the gibberish actually works better, and that the visible thinking is interesting for humans to watch but overall not very useful.
OK, so you're saying the gibberish is a feature and not a bug, so to speak? So the thinking output can be understood as coughing and mumbling noises that help the model get onto the right path?