Zetaphor • today at 5:38 AM

That's not really how the experts in an MoE work. Experts are selected per token, based on the probabilities the router assigns, and that selection happens on every token. You don't necessarily have a discrete math expert and a discrete physics expert. And even if you did, you would still need a router that is trained on all of those domains.
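
To make the per-token point concrete, here is a minimal sketch of top-k routing in plain Python. It assumes a softmax router with renormalized top-2 gates, which is one common design and not necessarily what any particular model does. The names `route_token`, `route_sequence`, `router_weights`, and the sizes are made up for illustration, and the weights are random rather than trained.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(token_vec, router_weights, top_k=2):
    """Pick experts for ONE token; returns [(expert_index, gate_weight), ...]."""
    # One router logit per expert: dot product of the token vector
    # with that expert's row in the (hypothetical) router matrix.
    logits = [sum(w * x for w, x in zip(row, token_vec)) for row in router_weights]
    probs = softmax(logits)
    # Keep the top_k most probable experts and renormalize their gates.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def route_sequence(token_vecs, router_weights, top_k=2):
    """Routing is repeated independently for every token in the sequence."""
    return [route_token(t, router_weights, top_k) for t in token_vecs]


# Illustrative sizes and random weights (assumptions, not from any real model).
rng = random.Random(0)
NUM_EXPERTS, DIM = 4, 8
router_weights = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
tokens = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(5)]

assignments = route_sequence(tokens, router_weights)
for pos, picks in enumerate(assignments):
    print(pos, [(i, round(g, 3)) for i, g in picks])
```

Nothing in the sketch ties an expert to a subject area: the choice depends only on each token's vector and the router weights, so neighbouring tokens in the same sentence can land on different experts.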
