The eating disorder section is kind of crazy. Are we going to incrementally add sections for every 'bad' human behaviour as time goes on?
The alignment favors supporting healthy behaviors, so it can be a thin line. I see the system prompt as "plan B" for cases where they can't achieve good results in training itself.
It's a particularly sensitive issue so they are just probably being cautious.
When you are worth hundreds of billions, people start falling over themselves running to file lawsuits against you. We're already seeing this happen.
So spending $50M to fund a team to weed out "food for crazies" becomes a no-brainer.
I mean, that's what humans have always done with our morals, ethics, and laws, so what alternative improvement do you have to make here?
Imagine the kind of human that never adapts their moral standpoints. Ever. They believe what they believed when they were 12 years old.
Letting the system improve over time is fine. The system prompt is an inefficient place to do it, but it's just a patch until the model can be updated.
Even better, adding it to the system prompt is a temporary fix; they'll work it into post-training, so the next model release will probably drop it from the system prompt. At least while it's in the system prompt we get some visibility into what's being censored. Once it's baked into the model, it'll be a lot harder to understand why "How many calories does 100g of pasta have?" only returns "Sorry, I cannot divulge that information".