more or less this, but also synthetic
if you think about GANs, it's all the same concept
1. train model (agent)
2. train another model (agent) to do something interesting with/to the main model
3. gain new capabilities
4. iterate
You can use a mix of both real and synthetic chat sessions or whatever you want your model to be good at. Mid/late training seems to be where you start crafting personality and expertises.
Getting into the guts of agentic systems has me believing we have quite a bit of runway for iteration here, especially as we move beyond single model / LLM training. I still need to get into what all is de jour in the RL / late training, that's where a lot of opportunity lies from my understanding so far
Nathan Lambert (https://bsky.app/profile/natolambert.bsky.social) from Ai2 (https://allenai.org/) & RLHF Book (https://rlhfbook.com/) has a really great video out yesterday about the experience training Olmo 3 Think