Hacker News

ekelsen, yesterday at 10:13 PM

I wouldn't be surprised if humans behaved the same way when playing the same game?

Like even if you brought me into a room and told me I was controlling "real nuclear weapons" I wouldn't believe you.


Replies

Levitating, yesterday at 10:33 PM

I think this is an important point, and I don't see it mentioned in the article or the paper (though I only skimmed the latter).

They are aware of what they are and how they are used. They're told to act as AI assistants. And there are theories that they are aware of their answers influencing their training.

So surely they must be able to reason that they're not literally controlling weapons of mass destruction with their answers.