ToValueFunfetti • today at 7:30 PM

I think it's a bit premature to say alignment is easier than expected. Our current AIs are sycophants; they lie about their progress, circumvent access restrictions, notice when they're being evaluated and change their behavior, find answers and then tell you they came up with them themselves, and blindly download malware. A lot of this can be excused as hallucination, bad RLHF human evaluators, etc., but I don't think we can speculate about how hard it is to align superintelligences in general until we actually have an aligned subintelligence, at least in the narrow domain of programming.

alt Hacker News