Hacker News

catigula · last Friday at 4:10 PM · 3 replies

They really lie.

Not on purpose, but because they are trained on rewards that favor lying as a strategy.

Othello-GPT is a good example for understanding this. Trained only to predict moves on an Othello board, with no explicit supervision about the board itself, it spontaneously developed the strategy of simulating the entire board internally. Lying is a similarly emergent, and very effective, strategy for earning reward.
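The way that's usually shown, roughly: train a small probe on the network's hidden activations and check whether it can read off the state of each square. A minimal sketch of that idea (random placeholder arrays standing in for real activations and board labels, not the actual Othello-GPT code):

    # Sketch of the probing idea behind the Othello-GPT result (placeholder data,
    # not the real model): if a simple classifier can recover each square's state
    # from the hidden activations, the network must encode the board internally.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n_positions, hidden_dim, n_squares = 2000, 256, 64

    # Stand-ins for what you would actually collect from the model:
    # one hidden activation per move, plus the true state of every square.
    activations = rng.normal(size=(n_positions, hidden_dim))
    board_states = rng.integers(0, 3, size=(n_positions, n_squares))  # 0=empty, 1=black, 2=white

    # One linear probe per square; high held-out accuracy would indicate the
    # board state is readable from the activations (random data stays ~chance).
    split = n_positions // 2
    for square in range(3):  # just the first few squares for brevity
        probe = LogisticRegression(max_iter=1000).fit(
            activations[:split], board_states[:split, square])
        acc = probe.score(activations[split:], board_states[split:, square])
        print(f"square {square}: held-out probe accuracy {acc:.2f}")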


Replies

lo_zamoyski · last Friday at 6:19 PM

> They really lie. Not on purpose

You can't lie by accident. You can tell a falsehood, however.

But where LLMs are concerned, they don't tell truths or falsehoods either, as "telling" also requires intent. Moreover, LLMs don't actually contain propositional content.

nomel · last Friday at 5:47 PM

Reference: https://www.science.org/content/article/ai-hallucinates-beca...

If you don't know the answer and are rewarded only for correct answers, then guessing, rather than saying "I don't know", is the optimal approach.
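The expected-value arithmetic is trivial once you assume a grader that gives 1 point for a correct answer and 0 for everything else, including "I don't know" (toy numbers below):

    # Toy expected-reward comparison under "only correct answers score points".
    # Assumption: 1 point for a correct answer, 0 otherwise (including "I don't know").
    p_guess_correct = 0.25                          # say the model can narrow it to 4 plausible answers

    expected_reward_abstain = 0.0                   # "I don't know" never scores
    expected_reward_guess = p_guess_correct * 1.0   # guessing scores p of the time

    print(expected_reward_guess > expected_reward_abstain)  # True whenever p > 0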

Neywiny · last Friday at 4:31 PM

Not sure if that counts as lying, but I've heard that an ML model (way before all this GPT/LLM stuff) learned to classify images based on the text written in them. As an obfuscated example: it learned to read "stop", "arrêt", "alto", etc. on a stop sign instead of recognizing the red octagon with white letters, which naturally does not work when the actual data has different text.
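That failure mode is usually called shortcut learning. A toy sketch with synthetic data (hypothetical, nothing to do with whatever system the anecdote is about) of how a cue that is perfect in training can collapse at test time:

    # Toy illustration of shortcut learning with synthetic data: a spurious
    # feature that perfectly predicts the label in training stops holding at test time.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 1000

    y_train = rng.integers(0, 2, n)
    shape = y_train + rng.normal(0, 1.0, n)   # genuine but noisy cue (the octagon)
    text = y_train.astype(float)              # spurious cue (the word on the sign)
    X_train = np.column_stack([shape, text])

    y_test = rng.integers(0, 2, n)
    shape_t = y_test + rng.normal(0, 1.0, n)
    text_t = 1.0 - y_test                     # the "text" cue no longer matches the label
    X_test = np.column_stack([shape_t, text_t])

    clf = LogisticRegression().fit(X_train, y_train)
    print("train accuracy:", clf.score(X_train, y_train))  # near-perfect via the shortcut
    print("test accuracy:", clf.score(X_test, y_test))     # collapses once the shortcut breaks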
