Hacker News

krackers, yesterday at 7:02 AM

Papers on mechanistic interpretability and representation engineering, e.g. from Anthropic, would be a good start.