Hacker News

sciencejerk, yesterday at 6:39 PM

I wonder if any poisoned data made it into LLM training data pipelines?


Replies

ibejoeb, yesterday at 6:51 PM

Interesting angle. Everyone has already pointed out that there are backups basically everywhere, and from an information standpoint, shaving off a day (or whatever) of edits to get back to a known-good point is effectively zero cost. But I wonder what it costs to have potentially bad data baked into those models, and whether anyone really cares enough to scrap it.