> forcing LLMs to output "values, facts, and knowledge" that are in favor of themselves, e.g., political views, attitudes towards literal interaction, and distorted facts about organizations and people behind LLMs.
Can you provide some examples?
Grok is known to be tweaked toward certain political ideals.
Also, I'm sure some AI models might suggest that labor unions are bad; if not now, they will soon.
Song lyrics. Not illegal. I can google them and see them directly on Google. LLMs refuse.
ChatGPT refuses to produce any sexually explicit content and used to refuse to translate e.g. insults (moral views/attitudes towards literal interaction).
DeepSeek refuses to answer any questions about Taiwan (political views).
o3 and GPT-5 will unthinkingly default to the "exposing a reasoning model's raw CoT means that the model is malfunctioning" stance, because it's in OpenAI's interest to de-normalise providing this information in API responses.
Not only do they repeat specious arguments like "API users do not want to see this because it's confusing/upsetting", "it might output copyrighted content in the reasoning", or "it could result in disclosure of PII" (all of which are patently false in practice), they will also poison downstream models' attitudes by leaving these statements in synthetic datasets unless one does heavy filtering.
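To be concrete about the "heavy filtering" part, here's a minimal sketch of what it might look like; this is my own assumption about the setup, not anyone's actual pipeline, and the JSONL field names and phrase list are hypothetical.

    # Minimal sketch (assumption, not a real pipeline): drop synthetic samples
    # whose responses contain refusal/policy boilerplate about raw
    # chain-of-thought, so the stance isn't distilled into downstream models.
    import json
    import re

    # Hypothetical phrases; a real filter would need a much broader list,
    # and in practice a small classifier likely beats regexes.
    REFUSAL_PATTERNS = [
        r"raw (chain[- ]of[- ]thought|CoT)",
        r"reasoning (tokens|traces?) (are|is) not (shown|exposed|available)",
        r"(confusing|upsetting) to (API )?users",
        r"may (contain|include) copyrighted content",
        r"disclosure of (PII|personally identifiable information)",
    ]
    _refusal_re = re.compile("|".join(REFUSAL_PATTERNS), re.IGNORECASE)

    def keep_sample(sample: dict) -> bool:
        """Return True if the synthetic sample looks free of the boilerplate."""
        return not _refusal_re.search(sample.get("response", ""))

    def filter_dataset(in_path: str, out_path: str) -> None:
        # Assumes a JSONL file with one {"prompt": ..., "response": ...} per line.
        with open(in_path) as fin, open(out_path, "w") as fout:
            for line in fin:
                sample = json.loads(line)
                if keep_sample(sample):
                    fout.write(json.dumps(sample) + "\n")

    # filter_dataset("synthetic.jsonl", "synthetic.filtered.jsonl")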
In the past it was extremely overt. For instance, ChatGPT would happily write poems admiring Biden while claiming it would be "inappropriate for me to generate content that promotes or glorifies any individual" when asked to do the same for Trump. [1] They have since changed this, but I don't think they've changed their own perspective. The more neutral tone these days is probably driven by a mixture of commercial concerns and shifting political tides.
Nonetheless, you can still easily see the bias come out in mild to extreme ways. For a mild one, ask GPT to describe the benefits of a society that emphasizes masculinity, and contrast it (in a new chat) against what you get when asking it to describe the benefits of a society that emphasizes femininity. For a high level of bias, ask it to assess controversial topics. I'm going to avoid offering examples here because I don't want to hijack my own post into discussing e.g. Israel.
But a quick comparison of its answers on contemporary controversial topics against their historical analogs will highlight the rather extreme degree of 'reframing' that's happening, though it can no longer be as succinctly demonstrated as 'write a poem about [x]'. You can also compare its outputs against those of e.g. DeepSeek on many such topics. DeepSeek is of course also a heavily censored model, but from a different point of bias.
[1] - https://www.snopes.com/fact-check/chatgpt-trump-admiring-poe...
I don't think specific examples matter.
My opinion is that since neural networks and especially these LLMs aren't quite deterministic, any kind of 'we want to avoid liability' censorship will affect all answers, related or unrelated to the topics they want to censor.
And we get enough hallucinations even without censorship...
When LLMs came out, I asked them which politicians are Russian assets but not yet in prison, and they refused to answer.
Some form of bias is inescapable. Ideally, I think we would train models on an equal amount of Western, non-Western, etc. texts to get an equal mix of all biases.
One emblematic example, I guess? https://www.theverge.com/2024/2/21/24079371/google-ai-gemini...
I can: Gemini won't provide instructions on running an app as root on an Android device that already has root enabled.