logoalt Hacker News

libraryofbabeltoday at 7:25 AM0 repliesview on HN

LLM’s aren’t software (except in an uninteresting obvious sense); they are “grown, not made” as the saying is. And sure, they can find which weights activate when goblins come up (that’s basic mechanistic interpretability stuff), but it’s not as simple as just going in and deleting parts of the network. This thing is irreducibly complex in an organic delocalized way and information is highly compressed within it; the same part of the network serves many different purposes at once. Going in and deleting it you will probably end up with other weird behaviors.