Hacker News

dataviz1000 · yesterday at 5:45 PM · 1 reply

Thinking helps the models arrive at the correct answer more consistently. However, they only get the reward at the end of a cycle. It turns out that, without heavy constraints during training, the thinking (the series of thinking tokens) is gibberish to humans.
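A minimal sketch of the reward-at-the-end setup described above. This is hypothetical illustration code, not any lab's actual training pipeline: the point is just that the reward depends only on the final answer, so nothing in the signal penalizes unreadable thinking tokens.

```python
def outcome_reward(final_answer: str, correct_answer: str) -> float:
    # Reward is granted only at the end of the cycle, based solely on
    # whether the final answer is correct.
    return 1.0 if final_answer == correct_answer else 0.0

def token_rewards(thinking_tokens: list[str],
                  final_answer: str,
                  correct_answer: str) -> list[float]:
    r = outcome_reward(final_answer, correct_answer)
    # Every token in the episode, thinking tokens included, shares the
    # same terminal reward; readability of the thinking never enters in.
    return [r] * (len(thinking_tokens) + 1)

# Gibberish thinking still earns full reward if the answer comes out right.
print(token_rewards(["blorp", "qux", "7?"], "42", "42"))  # → [1.0, 1.0, 1.0, 1.0]
```

Under this kind of objective, any token sequence that raises the odds of a correct final answer is reinforced, whether or not a human can read it.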

I wonder if they decided that the gibberish simply works better, and that the visible thinking is interesting for humans to watch but otherwise not very useful.


Replies

dgb23 · yesterday at 6:30 PM

OK, so you're saying the gibberish is a feature and not a bug, so to speak? So the thinking output can be understood as coughing and mumbling noises that help the model get onto the right paths?
