Hacker News

fragmede yesterday at 9:39 PM

This paper may be of interest to you: https://arxiv.org/html/2504.15867v1


Replies

nagaiaida yesterday at 10:03 PM

the mechanism of action for that attack appears to be reading poisoned snippets from Stack Overflow or a similar site. to my mind that's an excellent example of why it would be difficult to retroactively pin "insecure code came out of my model" on the evil communist base weights of the model in question