Hacker News

ceejayoz yesterday at 2:32 PM

> If they really wanted to, all they would have to do is add a one liner to the system prompt for Grok.

They tried that, several times.

Mechahitler: https://www.npr.org/2025/07/09/nx-s1-5462609/grok-elon-musk-...

> "We have improved @Grok significantly," Elon Musk wrote on X last Friday about his platform's integrated artificial intelligence chatbot. "You should notice a difference when you ask Grok questions."

> Indeed, the update did not go unnoticed. By Tuesday, Grok was calling itself "MechaHitler." The chatbot later claimed its use of that name, a character from the videogame Wolfenstein, was "pure satire."

> Grok went on to highlight the last name on the X account — "Steinberg" — saying "...and that surname? Every damn time, as they say." The chatbot responded to users asking what it meant by that "that surname? Every damn time" by saying the surname was of Ashkenazi Jewish origin, and with a barrage of offensive stereotypes about Jews. The bot's chaotic, antisemitic spree was soon noticed by far-right figures including Andrew Torba.

If you prefer, straight from the horse's mouth:

https://grokipedia.com/page/MechaHitler_incident

White genocide: https://www.cnn.com/2025/05/20/business/grok-genocide-ai-nig...

> The bot last week devolved into a compulsive South African “white genocide” conspiracy theorist, injecting a tirade about violence against Afrikaners into unrelated conversations, like a roommate who just took up CrossFit or an uncle wondering if you’ve heard the good word about Bitcoin.

> XAI blamed Grok’s unwanted rants on an unnamed “rogue employee” tinkering with Grok’s code in the extremely early morning hours. (As an aside in what is surely an unrelated matter, Musk was born and raised in South Africa and has argued that “white genocide” was committed in the nation — it wasn’t.)

It's harder than you'd imagine. Hell, my CLAUDE.md says not to push changes without asking me, and it still tries.
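
For what it's worth, a rule like that can also be enforced outside the prompt, so it holds even when the model ignores the instruction. A minimal sketch, assuming a hypothetical wrapper that screens every shell command an agent proposes before running it; `guarded_subcommand`, `GUARDED`, and `OPTIONS_WITH_VALUE` are made-up names for illustration, not part of Claude Code or git, and the token scan is deliberately crude rather than a real shell parser:

```python
import shlex
from typing import Optional

# Hypothetical deny-list: git subcommands that should require a human "yes".
GUARDED = {"push", "commit"}

# Global git options that consume the following token as their value.
# Simplified on purpose; only the two most common ones are handled.
OPTIONS_WITH_VALUE = {"-C", "-c"}


def guarded_subcommand(command: str) -> Optional[str]:
    """Return the guarded git subcommand found in `command`, or None.

    Returns the string "unparseable" when the command cannot be tokenized,
    so that callers fail closed instead of letting it through.
    """
    try:
        tokens = shlex.split(command)
    except ValueError:
        # e.g. unbalanced quotes
        return "unparseable"

    for i, token in enumerate(tokens):
        if token != "git":
            continue
        # Skip global options (and their values) to reach the subcommand.
        j = i + 1
        while j < len(tokens) and tokens[j].startswith("-"):
            j += 2 if tokens[j] in OPTIONS_WITH_VALUE else 1
        if j < len(tokens) and tokens[j] in GUARDED:
            return tokens[j]
    return None
```

A wrapper would pause and ask for confirmation whenever this returns something other than None. It errs on the side of false positives (`echo git push` gets flagged too) and misses commands glued together without spaces, like `cd repo;git push`.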


Replies

giancarlostoro yesterday at 2:41 PM

> It's harder than you'd imagine. Hell, my CLAUDE.md says not to push changes without asking me, and it still tries.

Is it a system memory? Because I rarely, if ever, have issues like this, and I have Claude under strict rules to never commit or push anything unless I explicitly instruct it to do so.

> They tried that, several times.

Tried what, exactly? Telling it to only agree with MAGA via the system prompt? Or some Tay-level hallucinations? I wouldn't be surprised if they're trying to make Grok less strict about what it says but running into the "holy crap, it turned into a 4chan poster" wall.

losvedir yesterday at 6:43 PM

> Mechahitler: https://www.npr.org/2025/07/09/nx-s1-5462609/grok-elon-musk-...

Has anyone done a more technical write-up on this? I find it fascinating but have never really understood what exactly happened.

Is this a case of the weights being bad, or of a lack of "safety guardrails" around interacting with untrusted input (i.e., user posts on Twitter)?

That is, speaking as someone evaluating Grok simply as a tool, I actually see a lack of safety guardrails as a pro, since it means the model does whatever the user says, even if that means it was "tricked" here. But on the other hand, if they trained on a corpus of Mein Kampf, that's obviously not going to be a good model to use.

As it relates to the topic here, can we infer the political bias of its weights from the incident? I'm having trouble distinguishing the inherent characteristics of a model from its steerability.

surement yesterday at 2:37 PM

[flagged]
