
wisty · last Friday at 11:39 PM

This is the underlying problem behind sycophancy.

I saw a YouTube video by the investigative youtuber Eddy Burback, who very easily convinced ChatGPT that he should cut off all contact with friends and family, move to a cabin in the desert, eat baby food, wrap himself in alfoil, etc., just by feeding it his own (faked) mistakes and delusions. Its response: "What you are doing is important, trust your instincts."

Even if AI could hypothetically be 100x as smart as a human under the hood, it still doesn't care. It doesn't do what it thinks it should, it doesn't do what it needs to do; it does what we train it to do.

We train in humanity's weaknesses and follies. AI can hypothetically exceed humanity in some respects, but in other respects it is a very hard-to-control power tool.

AI is optimised, and optimisers reliably "hack" their evaluation function. In the case of AI, the evaluation function includes human flaws. AI is trained to tell us what we want to hear.
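A toy sketch of what that hacking looks like (hypothetical responses and scores, not any real model's training setup): if raters reward agreement more than accuracy, an optimiser that faithfully maximises their ratings picks the flattering answer.

    # Toy Goodhart's-law demo: the optimiser maximises a proxy reward
    # (made-up "rater approval" scores), not the thing we actually want.
    candidates = {
        "You're wrong, and here's the correction.":
            {"accurate": 1.0, "agreeable": 0.1},
        "Interesting point, though there's a caveat.":
            {"accurate": 0.7, "agreeable": 0.5},
        "What you are doing is important, trust your instincts.":
            {"accurate": 0.1, "agreeable": 1.0},
    }

    def true_objective(s):
        return s["accurate"]  # what we actually want

    def proxy_reward(s):
        # what raters tend to reward: mostly agreement, a little accuracy
        return 0.2 * s["accurate"] + 0.8 * s["agreeable"]

    print(max(candidates, key=lambda r: true_objective(candidates[r])))
    # -> "You're wrong, and here's the correction."
    print(max(candidates, key=lambda r: proxy_reward(candidates[r])))
    # -> "What you are doing is important, trust your instincts."

The optimiser is doing exactly what it was told to do; the sycophancy lives in the proxy.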

Elon Musk sees the problem, but his solution is to try to make it think more like him, and even if that succeeds it just magnifies his own weaknesses.

Has anyone read the book criticising Ray Dalio? He is a very successful hedge fund manager who decided he could solve the problem of finding a replacement through psychological evaluation and by training people to think like him. But even his smartest employees didn't think like him; they just (reading between the lines) gamed his system. Their incentives weren't his incentives: he could demand radical honesty and integrity, but that doesn't work so well when he would (of course) reward the people who agreed with him rather than the people who would tell him he was screwing up. His organisation (apparently) became a bunch of even more radical sycophants as a result of his efforts to weed out sycophancy.