Hacker News

0xbadcafebee today at 2:38 AM

OpenAI already did this when it released its "super scary advanced" security model. They silently return an earlier model's results if they think you're red-teaming or abusing it. https://openai.com/index/scaling-trusted-access-for-cyber-de...


Replies

llelouch today at 6:17 AM

They didn't get as much pushback because they aren't the leader.