logoalt Hacker News

mvdtnzyesterday at 7:45 PM2 repliesview on HN

> We think most foreseeable cases in which AI models are unsafe or insufficiently beneficial can be attributed to a model that has explicitly or subtly wrong values

Unstated major premise: whereas our (Anthropic's) values are correct and good.


Replies

DonHopkinsyesterday at 7:56 PM

That's why Grok thinks it's Mecha-Hitler.

mac-attackyesterday at 9:27 PM

Relative to the sycophantic OpenAI and mecha Hitler...?