Hacker News

andy99, today at 8:32 AM

LLM writing style is trained in by data labellers; it’s not just emergent behavior from being trained on internet text.


Replies

embedding-shape, today at 8:40 AM

Ultimately it's a mix-and-match of everything, including whatever data the pre-training uses and how exactly the post-training is done. I don't think you can say there is a single factor that decides the writing style, unless you have some particular insight into a specific pipeline. Generally, though, the models output text that looks like the human text they ingested during training.