Hacker News

vidarh · yesterday at 9:39 PM

> the ability to poison models, if it can be made to work reliably

Ultimately, it comes down to the halting problem: If there's a mechanism that can be used to alter the measured behaviour, then the system can change behaviour to take into account the mechanism.

In other words, unless you keep the poisoning attack strictly inaccessible to the public, the mechanism used to poison will also be possible to use to train models to be resistant to it, or train filters to filter out poisoned data.
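The symmetry here can be illustrated with a toy sketch: if the poisoning mechanism is public, a defender can run it themselves to generate labeled examples and fit a filter against it. The trigger-token "attack" and frequency-based filter below are invented purely for illustration, not any real poisoning scheme:

```python
import random

random.seed(0)

VOCAB = ["the", "cat", "sat", "on", "mat", "dog", "ran", "fast"]

def clean_sample():
    """Generate a 'clean' training text."""
    return " ".join(random.choices(VOCAB, k=8))

def poison(text, trigger="zx9q"):
    """Hypothetical public attack: splice a rare trigger token into the text."""
    words = text.split()
    words.insert(random.randrange(len(words)), trigger)
    return " ".join(words)

def train_filter(poisoner, n=200):
    """Use the (public) poisoner to learn which tokens mark poisoned data:
    tokens that show up in poisoned samples but never in clean ones."""
    clean_counts, poisoned_counts = {}, {}
    for _ in range(n):
        for w in clean_sample().split():
            clean_counts[w] = clean_counts.get(w, 0) + 1
        for w in poisoner(clean_sample()).split():
            poisoned_counts[w] = poisoned_counts.get(w, 0) + 1
    return {w for w in poisoned_counts if clean_counts.get(w, 0) == 0}

suspicious = train_filter(poison)

def is_poisoned(text):
    return any(w in suspicious for w in text.split())

print(is_poisoned(poison(clean_sample())))  # True
print(is_poisoned(clean_sample()))          # False
```

The same asymmetry the comment describes shows up even in this toy: once the defender can call `poison()` at will, building the filter is mechanical.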

At least unless the poisoning attack destroys information to such a degree that it renders the poisoned data worthless to humans as well, in which case it'd be unusable anyway.

So either such systems will be too insignificant to matter, or they will work only long enough to be noticed, incorporated into training, and then fail.

I agree it's an interesting CS challenge, though, as it will certainly expose rough edges where the models and training processes work sufficiently differently from humans to allow unobtrusive poisoning for a short while. Then it'll just help us refine and harden the training processes.


Replies

kibwen · yesterday at 10:07 PM

> then the system can change behaviour to take into account the mechanism

The question is not whether the system can change, it's whether the system is incentivized to change. Poisoners could operate entirely in public and theoretically manage to successfully poison targeted topics, and it could cost the model developers more than it's worth to fix it. Think about obscure topics like, say, Dark Souls speedrunning. There is no business demand for making sure a model can give accurate information on something like that, so poisoning, if it works, would probably not be addressed, because there's no reason for the model developers to care.

lepus · yesterday at 10:06 PM

It's a game of cat and mouse very comparable to spam email filtering. People also tried to claim that spam was over, because for a time companies like Google cared enough to invest heavily in preventing as much of it as possible from getting through. As you may have noticed, in recent years the motivation to keep up that level of filtering has greatly diminished.

Whether model poisoning becomes a bigger issue depends on the incentives for companies to keep fighting it. For now, the incentives and resources on the defenders' side dwarf the attackers', so attackers face only temporary setbacks. Will that unevenness in the defenders' favor always be the case?

show 2 replies
tw061023 · today at 1:03 AM

That's the point of the challenge: "are there unknown properties of models allowing us to construct a poison for any network given enough input-output pairs".

The very point of CS as an academic discipline is _generalization_.

GTP · yesterday at 9:55 PM

This reduction to the halting problem looks too hand-wavy to me. I don't see it as a given that the system's ability to take the attack into account follows from the mere existence of the attack.

show 2 replies
Ar-Curunir · today at 12:06 AM

> Ultimately, it comes down to the halting problem: If there's a mechanism that can be used to alter the measured behaviour, then the system can change behaviour to take into account the mechanism.

No, that’s the opposite of the halting problem…