Hacker News

nairboon · yesterday at 11:40 AM

They have a lot of data of the form (user input, LLM output). A new model then learns what previous models produced, with all their flaws. The core LLM premise is that it learns from all available human text.


Replies

__alexs · yesterday at 11:50 AM

This hasn't been the full story for years now. All SOTA models are strongly post-trained with reinforcement learning to improve performance on specific problems and interaction patterns.

The vast majority of this training data is generated synthetically.
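One common way synthetic post-training data is produced is rejection sampling: sample many candidate outputs per prompt, score them with a verifier or reward model, and keep only the high-reward ones as new training examples. The sketch below is a toy illustration of that idea; the model, reward function, and all names are invented stand-ins, not any lab's actual pipeline.

```python
import random

def toy_model(prompt: str, rng: random.Random) -> str:
    """Stand-in for sampling a candidate answer from an LLM."""
    return rng.choice([f"{prompt} -> answer_{i}" for i in range(10)])

def reward(prompt: str, answer: str) -> float:
    """Stand-in for a verifier or reward model (e.g. unit tests, a grader)."""
    return 1.0 if answer.endswith(("_7", "_8", "_9")) else 0.0

def generate_synthetic_dataset(prompts, samples_per_prompt=16, seed=0):
    """Sample many candidates per prompt; keep only high-reward ones."""
    rng = random.Random(seed)
    dataset = []
    for prompt in prompts:
        for _ in range(samples_per_prompt):
            answer = toy_model(prompt, rng)
            if reward(prompt, answer) > 0.5:  # filter by reward
                dataset.append((prompt, answer))
    return dataset

data = generate_synthetic_dataset(["q1", "q2"])
```

The kept (prompt, answer) pairs would then feed a fine-tuning step, which is roughly how "the model trains on its own filtered outputs" works in practice.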