Hacker News

measurablefunc | last Monday at 11:48 PM | 3 replies

There is no RL for programming languages, especially ones with no significant amount of existing code.


Replies

nl | yesterday at 4:53 AM

I guess the OP was implying that this is something fairly easy to fix?

(Which is true - it's easy to prompt your LLM with the language grammar, have it generate code and then RL on that)

Easy in the sense of "it only takes having enough GPUs to RL a coding-capable LLM", anyway.
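A rough sketch of the reward side of that loop, for illustration only. It assumes the "RL on that" step scores each generated program with an automatic checker and then normalizes scores within a batch of samples, in the style of group-relative methods such as GRPO. The names `parses`, `reward`, and `group_advantages` are hypothetical, the balanced-parentheses check is a toy stand-in for a real parser or compiler for the target language, and no model is actually sampled or updated here.

```python
import statistics


def parses(program: str) -> bool:
    """Toy stand-in for a real parser: accept only balanced parentheses."""
    depth = 0
    for ch in program:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # closing paren with nothing open
                return False
    return depth == 0


def reward(program: str) -> float:
    """Verifiable reward: 1.0 if the checker accepts the program, else 0.0."""
    return 1.0 if parses(program) else 0.0


def group_advantages(rewards: list[float]) -> list[float]:
    """Normalize rewards within one group of samples for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:  # every sample scored the same, so there is no signal
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]


# Hypothetical model outputs for one prompt (hard-coded, not generated).
samples = ["(add 1 2)", "(add 1 2", ")(", "(mul (add 1 2) 3)"]
rewards = [reward(s) for s in samples]
advantages = group_advantages(rewards)
```

In a real setup the advantages would weight a policy-gradient update on the model that produced the samples, which is where the GPU cost mentioned above comes in.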

thorum | yesterday at 7:59 AM

Go read the DeepSeek R1 paper.

whimsicalism | last Monday at 11:51 PM

not even wrong
