hexaga • today at 1:11 AM

RL is simply a broad category of training methods. It's not really an architecture per se: modern GPTs are trained first on a reconstruction objective (next-token prediction) over massive text corpora (the 'large language' part), then on various RL objectives, possibly with further post-training, depending on the lab.
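
To make the "training method, not architecture" point concrete, here is a toy sketch in plain Python. It is not any lab's actual pipeline: the function names (`pretraining_grad`, `reinforce_grad`) are hypothetical, the "model" is just a vector of logits over a tiny vocabulary, and REINFORCE stands in for whatever RL objective a lab actually uses. The same logits receive a gradient from either objective; only the training signal differs.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pretraining_grad(logits, target):
    """Gradient of the cross-entropy loss -log p[target] w.r.t. the logits.

    This is the 'reconstruction' signal: the target token comes from the corpus.
    """
    p = softmax(logits)
    return [p_i - (1.0 if i == target else 0.0) for i, p_i in enumerate(p)]

def reinforce_grad(logits, sampled, reward, baseline=0.0):
    """Gradient of the REINFORCE loss -(reward - baseline) * log p[sampled].

    Here the token was sampled from the model itself, and a scalar reward
    (an assumption of this sketch) says how good that sample was.
    """
    p = softmax(logits)
    advantage = reward - baseline
    return [advantage * (p_i - (1.0 if i == sampled else 0.0))
            for i, p_i in enumerate(p)]

# Same "model" (logits), two different training signals.
logits = [0.0, 0.0, 0.0]
supervised = pretraining_grad(logits, target=1)
rewarded = reinforce_grad(logits, sampled=1, reward=1.0)
penalized = reinforce_grad(logits, sampled=1, reward=-1.0)
```

With a reward of 1 and no baseline, the REINFORCE gradient coincides with the cross-entropy gradient for the same token, and a negative reward flips its sign. Nothing about the model changed between the two calls, only the objective.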
