I wonder whether any poisoned data made it into LLM training data pipelines.

sciencejerk • yesterday at 6:39 PM • 1 reply

Replies

Interesting angle. Everyone has already pointed out that there are backups basically everywhere, and from an information standpoint, shaving off a day (or whatever) of edits to get back to a known-good point costs effectively nothing. But I wonder what it costs to have potentially bad data baked into those models, and whether anyone really cares enough to scrap it.
