HarHarVeryFunny • yesterday at 3:54 AM

The entire history of RL-trained "reasoning models", from o1 to DeepSeek-R1, is basically just a year old!