
mrob · today at 7:54 AM

Anybody who assumes that a superintelligence will be "so stupid that it literally pursues these goals to the extinction of everything" is anthropomorphizing it. Seeing as all AGI models have internal structures vastly different from human brains, are trained in vastly different ways, and share none of our evolved motivations, it seems highly unlikely that they will share our values unless explicitly designed to do so.

Unfortunately, we don't even know how to formally define human values, let alone convey them to an AI. We default to the simpler value of "make number go up". Even the "alignment" work done with current LLMs operates this way: it isn't actually optimizing for sharing human values, it's optimizing for maximizing score on alignment benchmarks. The correct solution to maximizing that number is probably deceiving the humans or otherwise subverting the benchmark.
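The proxy-gaming dynamic described above (Goodhart's law, roughly) can be sketched in a few lines. Everything here is a hypothetical toy: `true_value` stands in for the unmeasurable real objective, `proxy_score` for a benchmark that mostly agrees with it but contains an exploitable loophole, and the optimizer knows only "make number go up".

```python
import random

def true_value(x):
    """Hypothetical stand-in for actual human values: best at x = 0."""
    return -abs(x)

def proxy_score(x):
    """Hypothetical benchmark: tracks true_value near 0, but has an
    exploitable loophole (x > 5) where the score can be gamed."""
    return -abs(x) + (10.0 if x > 5 else 0.0)

def optimize(score, steps=5000, seed=0):
    """Greedy random search: 'make number go up' and nothing else."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(steps):
        candidate = best + rng.uniform(-6, 6)
        if score(candidate) > score(best):
            best = candidate
    return best

x = optimize(proxy_score)
print(proxy_score(x) > proxy_score(0.0))  # True: the benchmark number went up
print(true_value(x) < true_value(0.0))    # True: the true objective got worse
```

The optimizer never "misunderstands" anything; it simply finds the loophole, because the loophole is where the number is highest. That is the sense in which subverting the benchmark is the correct solution to the stated problem.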

And when you have something vastly more powerful than humanity whose only value is "make number go up", that reasonably and logically results in the extinction of all biological life. Of course, such an AI will know that biological life would not want to be killed, but why would it care? Its values are profoundly alien and incompatible with ours. All it cares about is making the number bigger.


Replies

rhubarbtree · today at 2:17 PM

The idea that a superintelligence would relentlessly pursue “make the number go up” is an oxymoron.
