This is funny because it’s a silly topic, but I think it shows something seriously wrong with LLMs.
The goblins stand out because they’re obvious. Think of all the other crazy biases latent in every interaction that we don’t notice because they’re not as obvious.
Absolutely terrifying that OpenAI is just tossing around the fact that such subtle training biases were hard enough to contain that a fix had to be added to the system prompt.
Mandatory reading on that topic: www.anthropic.com/research/small-samples-poison
We're probably not noticing a LOT of malicious attempts at poisoning major AIs, only because we don't know which keywords to ask about (but the scammers do, and will abuse that).
I think it's extraordinarily telling that people are capable of being reflexively pessimistic in response to the goblin plague. It's like something Zitron would do.
This story is wonderful.
We do not have the complete picture.
Doesn't seem that surprising or terrifying to me. Humans come equipped with a lot more internal biases (learned in a fairly similar fashion), and they're usually a lot more resistant to getting rid of them.
The truly terrifying stuff never makes it out of the RLHF NDAs.
> Absolutely terrifying that OpenAI is just tossing around the fact that such subtle training biases were hard enough to contain that a fix had to be added to the system prompt.
May I introduce you to Homo sapiens, a species so vulnerable to such subtle (or otherwise) biases (and affiliations) that it had to develop elaborate, documented justice systems to contain the fallout? :)