My pet theory is similar to the training-set hypothesis: em-dashes appear often in prestige publications like The Atlantic, The New Yorker, and The Economist, which are widely considered good writing. Being magazines, they accumulate a lot of articles over time, reinforcing the style. They're also the sort of thing an RLHF rater will rate highly, not because of the em-dash specifically but because the overall style is polished.
One thing I've wondered is whether high-prestige writing is explicitly weighted in the models, but it doesn't seem far-fetched that various signals in the data effectively say "this kind of thing should be weighted highly."
This is mine as well, with the addition of books. If someone wanted to train a bot to sound more human, they would select data that is verifiably human-made.
The approachable tone of popular print media also preselects for the casual, highly readable style I suspect users would want from a bot.
That kind of fits with Altman saying they kept them because users liked them (https://www.linkedin.com/posts/curtwoodward_chatgpt-em-dash-...)
In the past, if you'd shown me a passage with em-dashes, I'd have said it looked good because I associate them with The New Yorker and The Economist, both of which I read. Now I'd be a bit more meh because of LLMs.
It also seems that LLMs are using them correctly — as a pause or replacement for a comma (yes, I know this is an imprecise description of when to use them).
Thanks to LLMs, I learned that using the short hyphen everywhere in place of a dash is incorrect, and I can improve my writing because of it.