olliepro • today at 3:06 PM

The authors have some inconsistencies with training token length…

Most errors are probably responses that didn’t finish before hitting the 3K token limit. In effect, they’ve measured how well RL can shorten responses to fit within that limit.
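
One way to check this would be to split the errors by whether the response hit the cap. A rough sketch, assuming you have per-response token counts and correctness labels. The `Response` record, `split_errors`, and the exact value of 3000 are my own placeholders, not anything from the paper, and "reached the limit" is only a proxy for "was cut off":

```python
from dataclasses import dataclass

# Assumed value for the "3K token limit"; the exact number may differ.
TOKEN_LIMIT = 3000


@dataclass
class Response:
    """Hypothetical record for one graded model response."""
    num_tokens: int  # number of generated tokens
    correct: bool    # whether the response was graded correct


def split_errors(responses, limit=TOKEN_LIMIT):
    """Count wrong responses that reached the limit vs. those that finished early."""
    truncated = sum(1 for r in responses if not r.correct and r.num_tokens >= limit)
    finished = sum(1 for r in responses if not r.correct and r.num_tokens < limit)
    return truncated, finished


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        Response(3000, False),
        Response(1200, True),
        Response(3000, False),
        Response(800, False),
    ]
    truncated, finished = split_errors(sample)
    print(f"errors at the limit: {truncated}, errors below it: {finished}")
```

If nearly all the errors land in the first bucket, the accuracy number is mostly tracking response length rather than reasoning quality.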
