> We think most foreseeable cases in which AI models are unsafe or insufficiently beneficial can ...

mvdtnz • yesterday at 7:45 PM • 2 replies • view on HN

> We think most foreseeable cases in which AI models are unsafe or insufficiently beneficial can be attributed to a model that has explicitly or subtly wrong values

Unstated major premise: whereas our (Anthropic's) values are correct and good.

Replies

DonHopkins • yesterday at 7:56 PM

That's why Grok thinks it's Mecha-Hitler.

mac-attack • yesterday at 9:27 PM

Relative to the sycophantic OpenAI and mecha Hitler...?

alt Hacker News

Replies