logoalt Hacker News

andoandoyesterday at 6:52 PM0 repliesview on HN

The thinking models are additionally trained with reinforcement learning to produce chain of thought reasoning