If someone's going to ask you gotcha questions which they're then going to post on social media to use against you, or against other people, it helps to have pre-prepared statements to defuse that.
The model may not be able to detect bad faith questions, but the operators can.
I think the concern is that if the system is susceptible to this sort of manipulation, then when it’s inevitably put in charge of life critical systems it will hurt people.