charcircuit, today at 9:41 AM

>LLM land was trained on pre-existing text written by humans.

Some of the pretraining. Other pretraining is on text written by AI. Human-written training data is only a subset of what these models train on. There is a ton of synthetic training data now.