
uplifter · yesterday at 11:50 PM

A contrary possibility is that agents with different ultimate objectives can have different, even disjoint, sets of goals that are instrumental to those objectives.

I do think the core idea of instrumental convergence is wrong. In the hypothetical scenario you describe, the agent's behavior, whether it learns to replace itself or not, will depend on its goal, its knowledge of and ability to reason about the problem, and the learning algorithm it employs. Those are just some of the variables you'd need to fill in to answer your question. Instrumental convergence theorists suggest we can gloss over these details and assume any hypothetical AI will behave in certain ways in various narratively described situations, but we can't. The behavior of an AI is contingent on many details of the situation, and those details can mean that no goal instrumental to one agent is instrumental to another.
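
To make that concrete, here's a throwaway sketch (my own toy example, with a made-up corridor world and made-up goals, not anything from the thread): two planners share the same environment, but "pick up the key" only shows up as an instrumental step for the agent whose goal sits behind the locked door.

    # Toy sketch: whether "get the key" is instrumental depends on the goal.
    from collections import deque

    CELLS = range(5)          # corridor cells 0..4
    KEY_AT = 1                # a key lies in cell 1
    DOOR_BETWEEN = (2, 3)     # the door between cells 2 and 3 is locked
    START = (2, False)        # start in cell 2, no key

    def successors(state):
        pos, has_key = state
        moves = []
        for step, name in ((-1, "left"), (1, "right")):
            new = pos + step
            if new not in CELLS:
                continue
            if tuple(sorted((pos, new))) == DOOR_BETWEEN and not has_key:
                continue  # locked door blocks movement without the key
            moves.append(((new, has_key), name))
        if pos == KEY_AT and not has_key:
            moves.append(((pos, True), "pickup_key"))
        return moves

    def plan(goal_cell):
        # Breadth-first search for a shortest action sequence to goal_cell.
        frontier = deque([(START, [])])
        seen = {START}
        while frontier:
            state, actions = frontier.popleft()
            if state[0] == goal_cell:
                return actions
            for nxt, name in successors(state):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, actions + [name]))
        return None

    for label, goal in (("agent A (goal: cell 0)", 0), ("agent B (goal: cell 4)", 4)):
        p = plan(goal)
        print(label, "->", p, "| key was instrumental:", "pickup_key" in p)

Agent A's plan is just two "left" moves and never touches the key; agent B's plan includes pickup_key because its goal lies past the door. Swap the goal and the "instrumental" step vanishes, which is the point: the instrumental goals fall out of the specifics, not out of some universal convergent drive.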