Hacker News

shay_ker · yesterday at 1:47 PM · 4 replies

A general question - how do frontier AI companies handle scenarios like this in their training data? If they train their models naively, then training data injection seems very possible and could make models silently pwn people.

Do the labs tag code versions that have an associated CVE as compromised (telling the model what NOT to do)? Do they build adversarial RL environments to teach what's good and bad? I'm very curious, since it's inevitable that some pwned code ends up as training data no matter what.
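The CVE-tagging idea could work roughly like the sketch below: check each code sample's source release against an advisory index and drop (or reroute) matches before training. This is purely illustrative; the advisory index, sample schema, and function names are all made up, and no lab has confirmed doing this.

```python
# Hypothetical sketch of advisory-based corpus filtering.
# The advisory index and sample format here are invented for illustration.

# Toy advisory index: (ecosystem, package, version) -> placeholder advisory ID.
COMPROMISED = {
    ("npm", "example-lib", "1.2.3"): "GHSA-xxxx",  # placeholder, not a real ID
}

def is_compromised(ecosystem: str, package: str, version: str) -> bool:
    """Return True if this exact release appears in the advisory index."""
    return (ecosystem, package, version) in COMPROMISED

def filter_samples(samples):
    """Drop code samples drawn from compromised releases."""
    kept = []
    for s in samples:
        if is_compromised(s["ecosystem"], s["package"], s["version"]):
            continue  # exclude, or route to a 'negative example' set instead
        kept.append(s)
    return kept
```

In practice a real pipeline would presumably query a vulnerability database such as OSV or the GitHub Advisory Database rather than a hard-coded dict, and version matching would need range semantics, not exact equality.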


Replies

tomaskafka · yesterday at 1:55 PM

Everyone's approach (well, except Anthropic's; they seem to have preserved a bit of taste) is "the more data the better", so the databases of stolen content (erm, models) end up memorizing crap.

datadrivenangel · yesterday at 2:04 PM

This was apparently a compromise of the library owners' GitHub accounts, so it's not the same scenario as dangerous code in the training data.

I assume most labs don't do anything specific to deal with this, and just hope it gets trained out, since better code should, in theory, be rewarded more.

Havoc · yesterday at 6:09 PM

By betting that it dilutes away and not worrying about it too much. A bit like dropping radioactive barrels into the deep ocean.

Imustaskforhelp · yesterday at 1:54 PM

I am pretty sure that such measures aren't taken by AI companies, though I may be wrong.
