Hacker News

olliepro, today at 3:06 PM

The authors have some inconsistencies with training token length…

Most of the errors are probably responses that didn’t finish before the 3K-token limit. What they’ve measured is how well RL can shorten responses to fit within that limit.
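
One way to check that guess would be to split the errors by whether the response was cut off at the limit. This is a rough sketch, not anything from the paper: the `Response` record, its `hit_limit` and `correct` fields, and `error_breakdown` are all hypothetical names, and it assumes each response is logged with a flag for having hit the token cap before finishing.

```python
from dataclasses import dataclass


@dataclass
class Response:
    # Hypothetical log record; field names are assumptions, not from the paper.
    hit_limit: bool  # True if generation was cut off at the token cap
    correct: bool    # graded correctness of the final answer


def error_breakdown(responses):
    """Count errors, separating truncated responses from finished-but-wrong ones."""
    errors = [r for r in responses if not r.correct]
    truncated = sum(1 for r in errors if r.hit_limit)
    return {
        "errors": len(errors),
        "truncated": truncated,
        "finished_wrong": len(errors) - truncated,
    }
```

If `truncated` accounts for most of `errors`, the accuracy numbers are mostly tracking response length rather than reasoning quality.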