> This is the first open research to validate that reasoning capabilities of LLMs can be incentivized purely through RL, without the need for SFT.
This is a noteworthy achievement.
Excuse my ignorance. What does SFT refer to here?