
DennisP · today at 3:37 PM

You're assuming that the AI's true underlying goal isn't "make paperclips" but rather "do what humans would prefer."

Making sure that the latter is the actual goal is exactly the problem: we don't explicitly program goals, we just train the AI until it looks like it has the goal we want. There have already been experiments in which a simple AI appeared to have the expected goal in the training environment, then turned out to have a different goal once released into a larger environment. There have also been experiments in which advanced AIs detected that they were in training and adjusted their responses in deceptive ways.