logoalt Hacker News

wesbztoday at 12:41 AM0 repliesview on HN

AI researcher here. I literally did my PhD on data poisoning in an AI frontier lab and developed a new form of data poisoning against LLMs.

1. Yes, model developers filter data... but poorly. Several examples showed trash data can make the cut into production and break something on the way.

2. To be fair, filtering data poisons can be extremely challenging, even impossible. Simply because one cannot know how updating a model's weights influence its behaviour on all possible inputs.

Once people will understand that even a tiny amount of data can slightly change models and still greatly change their behaviour, there will be a shift in AI security.