I’ve known both Ben and Eliezer since the 1990s and enjoyed the arguments. Back then I was doing serious AI research along the same lines as Marcus Hutter and Shane Legg, which had a strong basis in algorithmic information theory.
While I have significant concerns about AGI, I largely reject both Eliezer’s and Ben’s models of where the risks are. It is important to avoid the one-dimensional “two-faction” model that dominates the discourse, because it really doesn’t apply to complex, high-dimensional domains like AGI risk.
IMO, the main argument against Eliezer’s perspective is that it relies pervasively on a “spherical cow on a frictionless plane” model of computational systems. It is fundamentally mathematical; it does not concern itself with the physical limitations of computational systems in our universe. If you apply a computational physics lens, many of the assumptions don’t hold up. There is a lot of “and then something impossible happens based on known physics” buried in assumptions that have never been addressed.
That said, I think Eliezer’s notion that AGI will, at a fundamental level, be only weakly wired to human moral norms is directionally correct.
Most of my criticism of Ben’s perspective is aimed at the idea that some kind of emergent morality we would recognize is a likely outcome, based on biological experience. The patterns of all biology emerged in a single evolutionary context. There is no reason to expect those patterns to be hardwired into an AGI that developed along a completely independent path. AGI may be created by humans, but its nature isn’t hardwired by human evolution.
My own hypothesis is that AGI, such as it is, will largely reflect the biases of the humans who built it, but will not have the biological constraints on expression that such programming implies in humans. That is what the real arms race is about.
But that is just my opinion.
Can you give concrete examples of "something impossible happens based on known physics"? I have followed the AI debate for a long time but I can't think of what those might be.
> Most of my criticism of Ben’s perspective is against the idea that some kind of emergent morality that we would recognize
I think Anthropic has already provided some evidence that intelligence is tied to morality (and vice versa) [1]. When they tried to steer LLM models’ morals, they also saw intelligence degradation.
[1]: https://www.anthropic.com/research/evaluating-feature-steeri...
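For readers unfamiliar with the technique: feature steering conceptually amounts to adding a scaled "feature" direction to a model's internal activations at inference time. A toy numpy sketch, not Anthropic's actual code (names, dimensions, and coefficients are purely illustrative):

    # Toy sketch of feature steering: nudge a hidden activation along a
    # chosen feature direction. Larger coefficients move the activation
    # further from where the model put it, a crude proxy for why aggressive
    # steering can degrade unrelated capabilities.
    import numpy as np

    rng = np.random.default_rng(0)
    hidden_dim = 16

    activation = rng.normal(size=hidden_dim)   # toy residual-stream activation
    feature = rng.normal(size=hidden_dim)      # hypothetical "morality" feature direction
    feature /= np.linalg.norm(feature)

    def steer(act, direction, coeff):
        """Return the activation shifted along a feature direction."""
        return act + coeff * direction

    mild = steer(activation, feature, 1.0)
    heavy = steer(activation, feature, 10.0)

    # Distance from the original activation grows with the coefficient.
    print(np.linalg.norm(mild - activation), np.linalg.norm(heavy - activation))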