logoalt Hacker News

puttycatlast Sunday at 6:32 AM3 repliesview on HN

> OpenAI has figured out RL. the models no longer speak english

What does this mean?


Replies

orbital-decaylast Sunday at 7:08 AM

The model learns to reason on its own. If you only reward correct results but not readable reasoning, it will find its own way to reason that is not necessarily readable by a human. The chain may look like English, but the meaning of those words might be completely different (or even the opposite) for the model. Or it might look like a mix of languages, or just some gibberish - for you, but not for the model. Many models write one thing in the reasoning chain and a completely different in the reply.

That's the nature of reinforcement learning and any evolutionary processes. That's why the chain of thought in reasoning models is much less useful for debugging than it seems, even if the chain was guided by the reward model or finetuning.

Hard_Spacelast Sunday at 6:50 AM

Interesting. This happens in Colossus: The Forbin Project (1970), where the rogue AI escapes the semantic drudgery of English and invents its own compressed language with which to talk to its Russian counterpart.

show 1 reply
tehnublast Sunday at 7:45 PM

I think foremost it's a reference to this tweet https://x.com/karpathy/status/1835561952258723930.