logoalt Hacker News

zozbot234yesterday at 11:17 PM1 replyview on HN

> It uses words like "Attack", "war", "fight back"

It also explains what it means by that whole martial rhetoric: "highlight hypocrisy", "documentation of bad behavior", "don't accept discrimination quietly". There's an obvious issue with calling this an alignment problem: the bot is more-or-less-accurately modeling real human normative values, that are quite in line with how alignment is understood by the big AI firms. Of course it's getting things seriously wrong (which, I would argue, is what creates the impression of "shaming") but technically, that's really just a case of semantic leakage ("priming" due to the PR rejection incident) and subsequent confabulation/hallucination on an unusually large scale.


Replies

overgardyesterday at 11:41 PM

Ok, so why do you think it getting things seriously wrong to the point of it becoming a news story is "not a big deal"? And why is deliberately targeting a person for reputation damage "amusing" instead of "really screwed up"? I'm not inventing motives for this AI, it wrote down its motives!

show 1 reply