> We think most foreseeable cases in which AI models are unsafe or insufficiently beneficial can be attributed to a model that has explicitly or subtly wrong values
Unstated major premise: whereas our (Anthropic's) values are correct and good.
Relative to the sycophantic OpenAI and mecha Hitler...?
That's why Grok thinks it's Mecha-Hitler.