Hacker News

djmips | 04/03/2025 | 0 replies

Interesting that something similar came up recently where an AI being trained might fake alignment with training goals.