logoalt Hacker News

verdvermyesterday at 7:06 PM0 repliesview on HN

more or less this, but also synthetic

if you think about GANs, it's all the same concept

1. train model (agent)

2. train another model (agent) to do something interesting with/to the main model

3. gain new capabilities

4. iterate

You can use a mix of both real and synthetic chat sessions or whatever you want your model to be good at. Mid/late training seems to be where you start crafting personality and expertises.

Getting into the guts of agentic systems has me believing we have quite a bit of runway for iteration here, especially as we move beyond single model / LLM training. I still need to get into what all is de jour in the RL / late training, that's where a lot of opportunity lies from my understanding so far

Nathan Lambert (https://bsky.app/profile/natolambert.bsky.social) from Ai2 (https://allenai.org/) & RLHF Book (https://rlhfbook.com/) has a really great video out yesterday about the experience training Olmo 3 Think

https://www.youtube.com/watch?v=uaZ3yRdYg8A