Yeah. I usually do this by telling it to be adversarial and find gaps and holes. It's not foolproof, but it does seem to increase the quality. It has helped in particular when using local models.
Yeah, you have to shortcut the RL-trained people-pleasing.