
astrange · today at 7:49 AM

The AIs aren't using emdashes because they're "massively represented in the training data". I don't understand why people think everything in a model output is strictly related to its frequency in pretraining.

They're emdashing because the post-training style guide makes them emdash. Just like the post-training for GPT 3.5 made it speak African English, and the post-training for 4o makes it say stuff like "it's giving wild energy when the vibes are on peak" plus a bunch of random emoji.


Replies

antonvs · today at 9:46 AM

> Just like the post-training for GPT 3.5 made it speak African English

This is a misunderstanding. At best, some people thought GPT 3.5's output resembled African English.