That seems more like OpenAI playing whack-a-mole with behaviors they don't like or don't see as beneficial. Simplifying a bit, but adding things to system prompts like "don't ever say racial slurs or use offensive rhetoric" or "cut off conversations about mental health and refer the user to a professional" is certainly something they do. But wouldn't you think the vast meat of what you're getting comes from the training data, and not from that kind of steering beyond a thin veneer?