Hacker News

zorked · yesterday at 4:51 AM

In which situation did an LLM save one million lives? Or, worse, was able to but failed to do so?


Replies

dalemhurley · yesterday at 7:31 AM

The concern discussed is that some language models have reportedly claimed that misgendering is the worst thing anyone could do, even worse than something as catastrophic as thermonuclear war.

I haven’t seen solid evidence of a model making that exact claim, but the idea is understandable if you consider how LLMs are trained and recall examples like the “seahorse emoji” issue. When a topic is new or not widely discussed in the training data, the model has limited context to form balanced associations. If the only substantial discourse it does see is disproportionately intense—such as highly vocal social media posts or exaggerated, sarcastic replies on platforms like Reddit—then the model may overindex on those extreme statements. As a result, it might generate responses that mirror the most dramatic claims it encountered, such as portraying misgendering as “the worst thing ever.”

For clarity, I’m not suggesting that deliberate misgendering is acceptable; it isn’t. The point is simply that skewed or limited training data can cause language models to adopt exaggerated positions when the available examples are themselves extreme.
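
As a toy illustration of that over-indexing effect (entirely made-up numbers, nothing to do with how real LLMs are actually trained): when a topic only shows up in a handful of extreme examples, a naive average has nothing to pull the estimate back toward neutral.

```python
# Toy sketch with hypothetical "intensity" scores for training snippets
# (0 = neutral, 1 = maximally charged). Not a model of real LLM training.
from statistics import mean

common_topic = [0.2, 0.1, 0.3, 0.2, 0.1, 0.4, 0.2, 0.3]  # broad, balanced discourse
rare_topic = [0.95, 0.9]                                  # only a couple of highly charged posts

print(f"common topic (n={len(common_topic)}): mean intensity {mean(common_topic):.2f}")
print(f"rare topic   (n={len(rare_topic)}):   mean intensity {mean(rare_topic):.2f}")
# The rare topic's estimate is driven entirely by its few extreme samples,
# loosely analogous to a model overindexing on the loudest discourse it has seen.
```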
