From the model card: "the safeguards will limit effectiveness through methods such as prompt mo...

mips_avatar • yesterday at 11:49 PM • 3 replies • view on HN

From the model card: "the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning" aka they will take your ML research code and inject bugs into it until it breaks using a LORA (or some other form of PEFT)

Replies

sciencejerk • today at 4:43 AM

Are they trying to fight back against model distillation?

bee_rider • today at 12:53 AM

“Limit effectiveness” could mean introducing performance degradation in your code. Which is arguably some sort of performance bug (I mean, ML codes are supposed to be high performance so I’d call unnecessary degradation a bug), but it could be borderline.

➕ show 1 reply

nomel • yesterday at 11:56 PM

Thanks, I thought maybe I missed something. That's an interesting way to interpret that.

➕ show 2 replies

alt Hacker News

Replies