Hacker News

frotaur, today at 12:50 PM

Afaik the experts are usually not very interpretable, and I would be surprised if at least one of the selected experts did not change on every token. I don't know what happens in practice, but I do know that, at least during training, nothing is done to minimize the number of expert switches between tokens.
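
To make "expert switches between tokens" concrete, here is a rough sketch of how one could count them. It is a toy simulation, not any real model's router: the names `top_k_experts`, `switch_fraction`, and `simulate` are made up for illustration, and the router logits are drawn independently at random for each token, which is an assumption (real logits are correlated across neighbouring tokens, so the real number may well be lower).

```python
import random


def top_k_experts(logits, k):
    """Return the set of indices of the k largest router logits."""
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return frozenset(ranked[:k])


def switch_fraction(routes):
    """Fraction of consecutive token pairs whose selected expert sets differ."""
    if len(routes) < 2:
        return 0.0
    changes = sum(1 for prev, cur in zip(routes, routes[1:]) if prev != cur)
    return changes / (len(routes) - 1)


def simulate(num_tokens=1000, num_experts=8, k=2, seed=0):
    """Toy run: independent Gaussian logits per token (an assumption)."""
    rng = random.Random(seed)
    routes = [
        top_k_experts([rng.gauss(0.0, 1.0) for _ in range(num_experts)], k)
        for _ in range(num_tokens)
    ]
    return switch_fraction(routes)


if __name__ == "__main__":
    print(f"fraction of token steps with an expert change: {simulate():.3f}")
```

With 8 experts and top-2 routing there are 28 possible expert pairs, so under this independence assumption two consecutive tokens pick the same pair only about 1 time in 28. To check what happens in practice, you would replace the random logits with the routing decisions logged from an actual model.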