Hacker News

minimaltom, yesterday at 10:26 PM

Between this, the emotions paper, Golden Gate Claude, etc., it doesn't seem like such a stretch that Anthropic are doing some kind of activation steering as part of training (and that it's part of their lead).


Replies

2001zhaozhao, yesterday at 11:04 PM

It could be helpful in getting their learnings from RL to generalize.