Reinforcement learning for "reasoning" nudges the model toward completions with a particular structure: a chain of thought, followed by selection among alternatives. It's three next-token predictors in a trench coat.
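A toy sketch of the point (not any real model or training setup): the "reasoning" structure is still produced one token at a time by an autoregressive loop. Here a hypothetical lookup table stands in for the trained policy; RL would only reshape which continuations that policy prefers, not the generation mechanism itself.

```python
def next_token(context):
    # Hypothetical "policy": a lookup table standing in for a trained LM.
    # The learned preference for a <think>...</think>-then-answer shape
    # is baked into which continuation each context maps to.
    table = {
        (): "<think>",
        ("<think>",): "step1",
        ("<think>", "step1"): "step2",
        ("<think>", "step1", "step2"): "</think>",
        ("<think>", "step1", "step2", "</think>"): "answer",
    }
    return table.get(tuple(context), "<eos>")

def generate(max_tokens=10):
    # Plain greedy autoregressive decoding: append one token at a time.
    out = []
    for _ in range(max_tokens):
        tok = next_token(out)
        if tok == "<eos>":
            break
        out.append(tok)
    return out

print(generate())
# → ['<think>', 'step1', 'step2', '</think>', 'answer']
```

The chain-of-thought markup falls out of the same next-token loop as the final answer; nothing in the decoding procedure distinguishes "reasoning" tokens from any others.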
> Some people like to parrot "next token prediction", "LLMs can only interpolate", and other nonsense
Thank you for illustrating my point.