Hacker News

jdub, today at 12:21 PM

Reinforcement learning for "reasoning" perturbs the model so that it generates completions with a particular chain-of-thought / alternative-selection structure. It's three next-token predictors in a trench coat.


Replies

charleshn, today at 1:37 PM

> Some people like to parrot "next token prediction", "LLMs can only interpolate", and other nonsense

Thank you for illustrating my point.