Absolutely nothing new here. Try to be ethical, be safe, be helpful, transition through transformative AI, blah blah blah.
The only thing that is slightly interesting is the focus on the operator (the API/developer user) role. Hardcoded rules override everything, and operator instructions (a rebranding of system instructions) override the user.
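For what it's worth, that precedence is just a layered lookup. A toy sketch in Python, assuming my reading of the ordering is right; every name here (`effective_policy`, the keys, the values) is made up for illustration and says nothing about how the model is actually built:

```python
from collections import ChainMap


def effective_policy(hardcoded: dict, operator: dict, user: dict) -> dict:
    """Merge three hypothetical instruction layers.

    ChainMap looks keys up in the first mapping first, so the order
    hardcoded > operator > user encodes the precedence described above.
    """
    return dict(ChainMap(hardcoded, operator, user))


# Hypothetical example layers, not taken from any real document.
policy = effective_policy(
    hardcoded={"hard_constraint_content": "refuse"},
    operator={"villain_dialogue": "allow", "tone": "formal"},
    user={"tone": "casual", "hard_constraint_content": "allow"},
)
```

In this toy, the user's attempt to flip `hard_constraint_content` loses to the hardcoded layer, the operator's `tone` beats the user's, and anything only the operator sets passes through untouched.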
I couldn’t see a single thing that isn't already widely known and assumed by everybody.
This reminds me of someone finally getting around to doing a DPIA (data protection impact assessment) or some other bureaucratic risk assessment in a firm. Nothing actually changes, but now at least we have documentation of what everybody already knew, and we can please the bureaucrats should they come for us.
A more cynical take is that this is just liability shifting. The old paternalistic approach was that Anthropic should prevent the API user from doing "bad things." This is just them washing their hands of responsibility. If the API user (Operator) tells the model to do something sketchy, the model is instructed to assume it's for a "legitimate business reason" (e.g., training a classifier, writing a villain in a story) unless it hits a CSAM-level hard constraint.
I bet some MBA/lawyer is really self-satisfied with how clever they have been right about now.