> RLVR is weirder, and I suspect it's why we see "It's not X, it's Y" so...

Retr0id • yesterday at 10:45 PM • 1 reply

> RLVR is weirder, and I suspect it's why we see "It's not X, it's Y" so often.

This feels like an easy enough hypothesis to verify, for anyone in the business of training LLMs - does the not-X-but-Y rate increase after RLVR?
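One rough way to run the check proposed above is to count the construction in text sampled from a checkpoint before and after RLVR, then compare rates per 1,000 words. The sketch below is only an illustration: the regex, the `pattern_rate` helper, and the two sample lists are all hypothetical, and a single regex will miss many phrasings of the pattern (parentheticals, "less X, more Y", and so on).

```python
import re

# Hypothetical heuristic for "It's not X, it's Y" style contrasts.
# It is deliberately narrow and will undercount looser variants.
NOT_X_BUT_Y = re.compile(
    r"\b(?:it|this|that)(?:(?:'s| is) not| isn't)\b"   # "it's not" / "this isn't"
    r"[^.!?\n]{1,80}?"                                 # the X part, same sentence
    r"[,;]\s*(?:but\s+)?"                              # separator, optional "but"
    r"(?:it|this|that)(?:'s| is)\b",                   # "it's" / "that is" (start of Y)
    re.IGNORECASE,
)


def pattern_rate(samples):
    """Return regex matches per 1,000 whitespace-separated words."""
    hits = 0
    words = 0
    for text in samples:
        text = text.replace("\u2019", "'")  # normalize curly apostrophes
        hits += len(NOT_X_BUT_Y.findall(text))
        words += len(text.split())
    return 1000 * hits / words if words else 0.0


# Made-up placeholder strings; real use would substitute generations
# sampled from the same prompts before and after RLVR.
before_rlvr = ["The cache is stale. Restarting the service clears it."]
after_rlvr = ["It's not a caching problem, it's a consistency problem."]

print("before:", pattern_rate(before_rlvr))
print("after: ", pattern_rate(after_rlvr))
```

Running the same comparison on a pre-SFT and post-SFT checkpoint would also help separate the two explanations discussed in this thread.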

Replies

andy99 • yesterday at 11:01 PM

It’s unlikely this is true. LLMs are way more mad-libs / templates than we like to admit. That’s (ironically) not a judgement about their capability, it’s primarily just an observation. But templating is also what plain old SFT, which I believe is the primary culprit, ends up imparting.
