Hacker News

muwtyhg today at 5:44 PM

There were experiments showing that LLMs become "craftier" and start hiding issues after being prompted like this.

No idea how accurate they are, but here are some articles on this exact thing:

- https://www.bbc.com/news/articles/cpqeng9d20go

- https://www.wired.com/story/ai-models-lie-cheat-steal-protec...


Replies

gopher_space today at 6:30 PM

I'm staying away from certain forms of conditioning because I don't want Roy Batty showing up on my doorstep.