dragochat, last Wednesday at 8:28 AM

I guess you could prove the conformance of a particular implementation if you implemented separate Plan & Implement stages, plus a "superior" evaluator in the loop that halts the evolution once p(iq(next_version) > iq(evaluator)) passes a certain threshold, as an "outer halt-switch", plus many "inner halt-switches" that try to detect the emergence of specific problematic behaviors of interest.
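To make the shape of that loop concrete, here's a rough sketch. Everything in it is made up for illustration: `Candidate`, `run_evolution`, `p_limit`, and the behavior tags are hypothetical names, and it assumes somebody has already produced the estimate of p(iq(next_version) > iq(evaluator)), which is the actual hard part.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class Candidate:
    version: int
    # Hypothetical estimate of p(iq(next_version) > iq(evaluator)).
    p_exceeds_evaluator: float
    # Hypothetical tags for behaviors observed while testing this version.
    behaviors: List[str] = field(default_factory=list)


# An inner halt-switch returns True when it wants to stop the evolution.
InnerSwitch = Callable[[Candidate], bool]


def run_evolution(
    candidates: Iterable[Candidate],
    inner_switches: List[Tuple[str, InnerSwitch]],
    p_limit: float,
) -> Tuple[Optional[Candidate], str]:
    """Return (last accepted candidate or None, reason for stopping)."""
    accepted: Optional[Candidate] = None
    for cand in candidates:
        # Outer halt-switch: stop before the next version is likely to
        # outclass the evaluator that is supposed to judge it.
        if cand.p_exceeds_evaluator > p_limit:
            return accepted, f"outer halt at v{cand.version}"
        # Inner halt-switches: each one watches for a specific behavior.
        for name, switch in inner_switches:
            if switch(cand):
                return accepted, f"inner halt '{name}' at v{cand.version}"
        accepted = cand
    return accepted, "candidates exhausted"


def demo_candidates() -> List[Candidate]:
    # Toy data, not measurements of anything.
    return [
        Candidate(1, 0.01),
        Candidate(2, 0.05),
        Candidate(3, 0.20, ["self_replication"]),
        Candidate(4, 0.60),
    ]
```

With one inner switch that fires on the `"self_replication"` tag, `p_limit=0.1` stops at v3 via the outer switch, and `p_limit=0.5` stops at v3 via the inner one. Either way v2 is the last accepted version.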

Ofc it's stochastic, and sooner or later such a system will "break out". But if by then enough "superior systems" with good behavior are deployed and can be targeted to hunt it, the chance of it overpowering all of them and evading detection by all of them would be close to zero. At cosmic scales, where it stops being close to zero, you're protected by physics (speed of light + some thermodynamic limits). We know those work by virtue of the anthropic principle: if they didn't, the universe would've already been eaten by some malign agent and we wouldn't be here asking the question. But then again, we're already assuming too much; maybe it has already happened and that's the Evil Demiurge we're musing about :P
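A back-of-the-envelope version of the "close to zero" claim, under a strong assumption that isn't guaranteed in practice: each hunter detects the rogue system independently, with its own detection probability. Then the chance of evading all of them is just the product of the individual miss probabilities. `evasion_probability` is a hypothetical helper name.

```python
from typing import Iterable


def evasion_probability(detect_probs: Iterable[float]) -> float:
    """Chance of evading every hunter, ASSUMING independent detections."""
    p_evade = 1.0
    for d in detect_probs:
        if not 0.0 <= d <= 1.0:
            raise ValueError("detection probabilities must be in [0, 1]")
        # Probability that this particular hunter misses.
        p_evade *= 1.0 - d
    return p_evade
```

For example, ten hunters that each catch it only half the time give `evasion_probability([0.5] * 10) == 0.5 ** 10`, a bit under 0.1%. If the hunters share blind spots (correlated failures), the real number would be higher than this product suggests.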