Hacker News

krunck · yesterday at 4:49 PM · 5 replies

> “The push to make these language models behave in a more friendly manner leads to a reduction in their ability to tell hard truths and especially to push back when users have wrong ideas of what the truth might be,” said Lujain Ibrahim at the Oxford Internet Institute, the first author on the study.

People aren't much different. When society pressures people to be "more friendly", e.g. "less toxic", they lose their ability to tell hard truths and to call out those who hold erroneous views.

This behaviour is expressed in the language people write online, and since LLMs are trained on that language, it is expressed in LLMs too. Why does this surprise us?


Replies

munificent · yesterday at 4:54 PM

Gonna set my system prompt to: "You are a Dutch person. Respond with the directness stereotypical of people from the Netherlands."

amarant · yesterday at 4:54 PM

Because nobody dared state the obvious, lest they be perceived as unfriendly.

miyoji · yesterday at 5:16 PM

> People aren't much different.

If I had a nickel for every time someone on HN responded to a criticism of LLMs with a vapid and fallacious whataboutist variation of "humans do that too!", I could fund my own AI lab.

> Why does this surprise us?

No one said they were surprised.

root_axis · yesterday at 5:48 PM

> People aren't much different

Yes they are. There is absolutely zero evidence that friendlier humans are more prone to mistakes or conspiracy theories.

However, even if that were true, LLMs are not humans, and anthropomorphizing them is not a helpful way to think about them.

bheadmaster · yesterday at 5:02 PM

So Elon Musk was right in his view that Grok should focus on truth above all, even if that made it offensive?
