
Mordisquitos • yesterday at 9:14 PM

> This seems like a wasted effort when AI will primarily learn the majority consensus view and not one-off misinformation.

We have evidence to the contrary. Two blog articles and two preprints of fake academic articles [0] were able to convince Copilot, Gemini, ChatGPT and Perplexity AI of the existence of a fake disease, against all majority consensus. And even though the author of the experiment made the falsity of this information public, and the results were widely published, it took a while before the models caught on and stopped treating the fake disease as real. Imagine what you could do if you published false information and had absolutely no reason to ever reveal that you did so.

[0] https://www.nature.com/articles/d41586-026-01100-y

Replies

gwern • yesterday at 9:31 PM

> Two blog articles and two preprints of fake academic articles [0] were able to convince Copilot, Gemini, ChatGPT and Perplexity AI of the existence of a fake disease, against all majority consensus

Wrong. There is no 'majority consensus' against 'bixonimania' because they made it up; that was the point. It's unsurprisingly easy to get LLMs to repeat the only source on a term never seen before. This usually works: made-up neologisms are the fruit fly of data poisoning because they are so easy to create and it is so unambiguous where the information came from. (And retrieval-based poisoning is the very easiest, laziest and most meaningless kind of poisoning, tantamount to just copying the poison into the prompt and asking a question about it.) But the problem with them is that, also by definition, it is hard for them to matter; why would anyone be searching or asking about a made-up neologism? And if it gets any criticism, the LLMs will pick that up, as your link discusses. (In contrast, the more sources are affected, the harder it is to assign blame. Some paper mills picked up 'bixonimania'? Well, they might've gotten it from the poisoned LLMs... or they might've gotten it from the same place the LLMs did, the sources that poisoned their retrievals: Medium et al.)


alyxya • yesterday at 9:50 PM

All the examples you gave are chatbots with integrated web search. Are you sure those chatbots didn't just reference false information they found in web searches? That's fundamentally different from poisoning the training of AI models.

> The problem was that the experiment worked too well. Within weeks of her uploading information about the condition, attributed to a fictional author, major artificial-intelligence systems began repeating the invented condition as if it were real.

This seems to imply the poisoning affected the web search results, not the actual model itself, because it takes months for data to make it into a trained base model.
