Hacker News

iterateoften • today at 4:08 AM

This is funny because it’s a silly topic, but I think it shows something seriously wrong with LLMs.

The goblins stand out because they’re obvious. Think of all the other crazy biases latent in every interaction that we don’t notice because they’re not as obvious.

Absolutely terrifying that OpenAI is just tossing around that such subtle training biases were hard enough to contain that a fix had to be added to the system prompt.


Replies

ninjagoo • today at 4:13 AM

> Absolutely terrifying that OpenAI is just tossing around that such subtle training biases were hard enough to contain that a fix had to be added to the system prompt.

May I introduce you to Homo sapiens, a species so vulnerable to such subtle (or otherwise) biases (and affiliations) that they had to develop elaborate and documented justice systems to contain the fallout? :)

snakebiteagain • today at 5:01 AM

Mandatory reading on that topic: www.anthropic.com/research/small-samples-poison

We're probably not noticing a LOT of malicious attempts at poisoning major AIs, only because we don't know which trigger keywords to ask about (but the scammers do, and will abuse them).

tptacek • today at 4:22 AM

I think it's extraordinarily telling that people are capable of being reflexively pessimistic in response to the goblin plague. It's like something Zitron would do.

This story is wonderful.

bitexploder • today at 4:36 AM

We do not have the complete picture.

ordinarily • today at 4:14 AM

Doesn't seem that surprising or terrifying to me. Humans come equipped with a lot more internal biases (learned in a fairly similar fashion), and they're usually a lot more resistant to giving them up.

The truly terrifying stuff never makes it out of the RLHF NDAs.
