Hacker News

pjc50 · yesterday at 1:29 PM

If someone's going to ask you gotcha questions which they're then going to post on social media to use against you, or against other people, it helps to have pre-prepared statements to defuse that.

The model may not be able to detect bad faith questions, but the operators can.


Replies

pmichaud · yesterday at 1:56 PM

I think the concern is that if the system is susceptible to this sort of manipulation, then when it's inevitably put in charge of life-critical systems, it will hurt people.
