Hacker News

orbital-decay yesterday at 7:06 PM

Regarding Anthropic, they used to make the best multilingual and generalist models; it's a policy thing, not a capability issue. Claude 3 was the best at this, including dead and low-resource languages. Neither modern Claude nor Gemini is remotely close to what Claude 3 was capable of (e.g. zero-shot writing styles). Anthropic basically reversed their "character training" policy and started optimizing their models for code generation at the cost of everything else, starting with Sonnet 3.5. Claude 4 took a huge hit in multilingual ability.

GPT, on the other hand, was always terrible at languages, except for the short-lived gpt-4.5-preview.

All modern models, including Gemini, have bugs in basic language coherency: random language switching, self-correction attempts that result in hallucinations, etc. I speculate it's a problem with heavy RL whose rewards and policies are not optimized for creative writing.


Replies

awongh yesterday at 8:46 PM

The benchmarks don’t seem to say that language ability has gotten worse?
