Hacker News

libraryofbabel, last Sunday at 11:40 PM

The papers from Anthropic on interpretability are pretty good. They look at how certain concepts are encoded within the LLM.