That's not really how the experts in an MoE work. They activate on token probabilities and are activated on every token. You don't necessarily have a discrete math expert and a discrete physics expert. And if it were you would still need a router that is trained on all of those domains.