Hacker News

Zetaphor today at 5:38 AM

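That's not really how the experts in an MoE work. Routing happens on every token: a learned router turns each token's hidden state into a probability distribution over the experts, and only the highest-scoring few (typically the top-k) are activated for that token. You don't necessarily end up with a discrete math expert and a discrete physics expert. And even if you did, you would still need a router that is trained on all of those domains.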
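A toy sketch of per-token top-k routing, assuming the common sparse-MoE recipe of a linear router, a softmax over experts, and top-k selection with renormalized gates. The router weights, the "experts" (simple scaling functions), the dimensions, and k=2 are all made-up values for illustration; real models do this inside each transformer layer with learned weights. Note that nothing in the router refers to a domain: it only scores tokens, so any specialization has to emerge from training.

```python
import math
import random


def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(token, router_weights, k=2):
    """Return [(expert_index, gate), ...] for the k highest-scoring experts of one token."""
    # One logit per expert: dot product of the token vector with that expert's router row.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    probs = softmax(logits)
    # Keep the k most probable experts and renormalize their gates so they sum to 1.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_layer(tokens, router_weights, experts, k=2):
    """Sparse MoE layer: every token is routed independently to its own top-k experts."""
    outputs = []
    for token in tokens:
        out = [0.0] * len(token)
        for i, gate in route_token(token, router_weights, k):
            # Only the selected experts run; their outputs are mixed by the gate weights.
            y = experts[i](token)
            out = [o + gate * v for o, v in zip(out, y)]
        outputs.append(out)
    return outputs


# Usage with hypothetical values.
random.seed(0)
dim, n_experts = 3, 4
# Hypothetical router: in a real model these weights are learned with everything else.
router = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_experts)]
# Hypothetical toy experts: each one just scales its input by a different factor.
experts = [lambda x, s=s: [s * v for v in x] for s in (0.5, 1.0, 1.5, 2.0)]
tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(3)]
for t, token in enumerate(tokens):
    print(t, route_token(token, router))
print(moe_layer(tokens, router, experts))
```

Because the choice is made per token, consecutive tokens of the same math question can land on different experts, and the router is just another set of weights trained on whatever data the whole model sees.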