Hacker News

andai, today at 12:23 AM

I regularly test every available AI, maybe once a month or so. I will send them the same question, usually about a new subject I am learning.

Oddly, Chinese models seem the most natural to me. Every random Chinese model does better than ChatGPT, on the "natural language" front. (And Grok also scores high on awkward language use. I don't know what causes that -- something about mode collapse? They have these words they obsess over... I mean, just try asking an AI for 10 random words ;)

I can sometimes see "ChatGPT-isms" in other models, but they're more subtle, and it feels like they're "woven" into the flow of the text.

Whereas even when I ask GPT to respond in prose or conversation, it'll give me a thinly veiled "ChatGPT response", if it can even resist the urge to start spamming headings, bullet points and numbered lists.

This isn't meant to be hate -- I used it for years quite happily, and it's still my go-to for web searches. But coming back to it now, the language is surprisingly offputting. I don't know if it got worse, or if I just stopped being used to it.

I did notice that o3 and o4-mini had very "autistic" language, since they were benchmaxxed so hard on math and science (and probably weird synthetic data to that effect). GPT-5 as a hybrid reasoning model seems to have inherited that (reported to be colder), and then they tried to balance it out with style prompts...

I honestly think it might make more sense to just have two LLMs: an ultra-concise technical reasoning model, and then a 2nd layer to translate its output for the human. Because right now it kind of feels like the worst of both worlds, a compromise that satisfies neither side.
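To make the two-LLM idea concrete, here is a rough sketch of how the layering could be wired up. Everything in it is an assumption for illustration: `Model`, `two_stage_answer`, and the two stub functions are made-up names, the prompts are placeholders, and no real vendor API is called. The stubs only stand in for whatever models you would actually plug in.

```python
from typing import Callable

# Hypothetical type: any function that maps a prompt string to a reply string.
Model = Callable[[str], str]


def two_stage_answer(question: str, reasoner: Model, translator: Model) -> dict:
    """Run a terse reasoning model, then have a second model rewrite for a human."""
    # Stage 1: ask for dense, technical output with no concern for style.
    terse = reasoner(f"Answer concisely and technically:\n{question}")
    # Stage 2: the second model sees only the question and the terse answer.
    friendly = translator(
        "Rewrite the following answer in plain conversational prose, "
        "without headings or bullet points.\n"
        f"Question: {question}\nAnswer: {terse}"
    )
    # Return both, so the reader can still look at the raw technical version.
    return {"terse": terse, "friendly": friendly}


# Stub models (assumptions) so the sketch runs without any API access.
def fake_reasoner(prompt: str) -> str:
    return "O(n log n); comparison sort lower bound."


def fake_translator(prompt: str) -> str:
    # Pretend to "translate" by echoing the terse answer with a softer opener.
    return "In short: " + prompt.rsplit("Answer: ", 1)[1]


result = two_stage_answer("How fast can you sort?", fake_reasoner, fake_translator)
print(result["friendly"])
```

Returning both strings is a deliberate choice here: the terse version stays available for anyone who finds it more readable than the polished one.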

Gemini 2.5 Pro's reasoning traces (before they nerfed them) were a good example. The deep technical analysis, and then the human-friendly version in the final output. But I found their reasoning more readable than the final output!