
AntiUSAbah · yesterday at 6:07 PM

On the contrary: in an interview, someone from OpenAI said they try to avoid it, because it makes it harder for them to determine whether a model is getting better or not.


Replies

thesz · yesterday at 10:09 PM

Perturbing the dataset used for training can introduce adversarial behavior even without adding any new data. The idea is quite simple: take two batches from the training dataset and keep the model whose adversarial behavior is more probable. The more batches that are processed with this posterior selection, the more probable the adversarial behavior becomes.
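A toy sketch of that selection channel (my own illustration, not anything OpenAI is known to do): train a one-parameter linear model with SGD, and at each step compare two candidate batches, keeping whichever step moves the model toward a hypothetical attacker objective. All names here (`adversarial_score`, the trigger input) are invented for the example; only the original data is ever used.

```python
import random

random.seed(0)

# Toy dataset: y ~ 2*x plus noise. The hypothetical attacker wants the
# model to over-predict, i.e. to push the weight w above its honest value.
data = [(i / 10, 2 * (i / 10) + random.gauss(0, 0.5)) for i in range(100)]

def sgd_step(w, batch, lr=0.01):
    # One SGD step for the linear model y_hat = w * x under squared loss.
    grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
    return w - lr * grad

def adversarial_score(w, trigger=10.0):
    # Attacker's hidden objective: a larger prediction at the trigger input.
    return w * trigger

def train(w0, steps, batch_size, select=False):
    w = w0
    for _ in range(steps):
        a = random.sample(data, batch_size)
        b = random.sample(data, batch_size)
        wa, wb = sgd_step(w, a), sgd_step(w, b)
        if select:
            # Posterior selection: keep whichever batch nudges the model
            # toward the adversarial objective. No new data is added --
            # only the choice between two legitimate batches is exploited.
            w = wa if adversarial_score(wa) > adversarial_score(wb) else wb
        else:
            w = wa  # ordinary training: just take the first batch
    return w

honest = train(1.0, steps=2000, batch_size=8)
poisoned = train(1.0, steps=2000, batch_size=8, select=True)
print(honest, poisoned)  # the selected model's weight drifts above the honest one
```

The honest run converges near the least-squares weight (about 2), while the run with posterior selection equilibrates noticeably higher, even though both see only the original dataset. That is the sense in which repeatedly selecting between batches can itself act as an adversarial training signal.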

By determining whether a model gets better or not on a given benchmark, OpenAI selects models against those benchmarks, implicitly using them in training.