Hacker News

theptip today at 3:40 PM

I agree with all these observations.

This is the best argument for successionism IMO. If you can be confident that you are creating a BDFL that is genuinely better than human leaders (a quite low bar) then it seems a good trade, unless you are quite optimistic about humanity’s prospects for improvement.

The problem, of course, is how to be confident you are creating a good BDFL and not handing control of humanity’s future to a successor that is indifferent at best and deceptive or malicious at worst.

An especially thorny problem: suppose we succeed on all these difficult alignment problems, Claude Omega really is super-rational / super-moral, and we all vote to make them president of Earth. Things might go great for a while. But how would you be confident that a self-modifying agent can retain its values as it grows and re-trains itself?

This is where the LessWrong folks’ explorations into decision theory really come into play: morality in the face of self-modifying agents becomes very weird. A lot of human moral intuitions break when the principals are able to modify their own code. (See Timeless Decision Theory for an attempt to solve these problems.)

I think the summary is: if you hand control over to a self-modifying AI that is anything like our current systems, it will go very badly.


Replies

justinlivi today at 3:45 PM

Any supposed "AI BDFL" will be controlled by a human. The base concept is inherently flawed.
