Hacker News

cookiengineer today at 4:12 PM

I wanted to point out that em dashes are autocompleted by the iOS keyboard. So, without more details, the false positives and true negatives might overlap. I think a better indicator would be to only count em dashes with whitespace characters before and after them, and to look at that user's general Unicode usage.
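
A rough sketch of what I mean, in Python. The function names are made up for illustration, and the non-ASCII ratio is only one assumed way to approximate "general Unicode usage"; this isn't what the analysis actually did.

```python
import re

# Em dash (U+2014) with a whitespace character on both sides.
# Lookarounds are used so the whitespace is not consumed, which lets
# "a \u2014 b \u2014 c" count as two matches.
SPACED_EM_DASH = re.compile(r"(?<=\s)\u2014(?=\s)")


def spaced_em_dash_count(text: str) -> int:
    """Count em dashes that have whitespace directly before and after."""
    return len(SPACED_EM_DASH.findall(text))


def non_ascii_ratio(text: str) -> float:
    """Fraction of characters outside ASCII (hypothetical proxy for
    how much Unicode a user types in general)."""
    if not text:
        return 0.0
    return sum(ord(ch) > 127 for ch in text) / len(text)
```

You'd run both over a user's comment history and compare, rather than trusting a single comment.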

Additionally, lots of Chinese and Russian keyboard tools use the em dash as well, when they're switching to the alternative (en-US) layout overlay.

There's also the Chinese idiom symbol in UTF-8, which those users often use as a dot, so that could be a nice indicator of legit human users.

edit: lol @ downvotes. Must have hit a vulnerable spot, huh?


Replies

Aurornis today at 4:17 PM

> I wanted to point out that em dashes are autocompleted by the iOS keyboard.

That’s why the analysis was performed over time. All of those em dash sources you mentioned were present before LLM-written content became popular.

marginalia_nu today at 4:17 PM

I think there is a baseline number of human users who, for one reason or another, use em-dashes, but this doesn't explain why they are 10x more prevalent in green accounts.
