Hacker News

shay_ker · yesterday at 1:47 PM · 4 replies

A general question - how do frontier AI companies handle scenarios like this in their training data? If they train their models naively, then training data injection seems very possible and could make models silently pwn people.

Do the labs tag code versions that have an associated CVE as compromised (telling the model what NOT to do)? Do they build adversarial RL environments to teach what's good and bad? I'm very curious, since it's inevitable that some pwned code ends up as training data no matter what.
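The CVE-tagging idea could work roughly like the sketch below: check each code sample's source release against an advisory index and drop (or reroute) matches before training. This is purely illustrative; the advisory index, sample schema, and function names are all made up, and no lab has confirmed doing this.

```python
# Hypothetical sketch of advisory-based corpus filtering.
# The advisory index and sample format here are invented for illustration.

# Toy advisory index: (ecosystem, package, version) -> placeholder advisory ID.
COMPROMISED = {
    ("npm", "example-lib", "1.2.3"): "GHSA-xxxx",  # placeholder, not a real ID
}

def is_compromised(ecosystem: str, package: str, version: str) -> bool:
    """Return True if this exact release appears in the advisory index."""
    return (ecosystem, package, version) in COMPROMISED

def filter_samples(samples):
    """Drop code samples drawn from compromised releases."""
    kept = []
    for s in samples:
        if is_compromised(s["ecosystem"], s["package"], s["version"]):
            continue  # exclude, or route to a 'negative example' set instead
        kept.append(s)
    return kept
```

In practice a real pipeline would presumably query a vulnerability database such as OSV or the GitHub Advisory Database rather than a hard-coded dict, and version matching would need range semantics, not exact equality.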


Replies

tomaskafka · yesterday at 1:55 PM

Everyone's approach (well, except Anthropic's; they seem to have preserved a bit of taste) is "the more data the better", so the databases of stolen content (erm, models) end up memorizing crap.

datadrivenangel · yesterday at 2:04 PM

This was apparently a compromise of the library owners' GitHub accounts, so it's not the same scenario as dangerous code in the training data.

I assume most labs don't do anything specific to deal with this, and just hope it gets trained out, since better code should, in theory, be rewarded more.

Havoc · yesterday at 6:09 PM

By betting that it dilutes away and not worrying about it too much. A bit like dropping radioactive barrels into the deep ocean.

Imustaskforhelp · yesterday at 1:54 PM

I am pretty sure that such measures aren't taken by AI companies, though I may be wrong.
