
rustcleaner, today at 5:24 AM (1 reply)

Could it be done by making a sparse MoE of thousands, or tens of thousands, of smaller experts in very niche domains? Maybe a tree-like structure of experts which can delegate from relatively general but inaccurate experts down to extremely niche but accurate ones? Also, these experts might be plug-and-play: you could easily swap out an inferior expert for a stronger one in the future without having to redo the whole pile.


Replies

Zetaphor, today at 5:38 AM

That's not really how the experts in an MoE work. The router picks experts per token, based on learned routing probabilities, and that choice is made again on every token. You don't necessarily have a discrete math expert and a discrete physics expert. And even if you did, you would still need a router that is trained on all of those domains.
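
To make the per-token routing concrete, here is a rough sketch of top-k gating in plain Python. It is not taken from any particular model: the sizes, the random router weights, and the names `route_token` and `route_sequence` are all made up for illustration, and a real router is a trained layer inside the network rather than random numbers.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(hidden, router_weights, k=2):
    """Pick the top-k experts for ONE token (hypothetical helper).

    hidden: the token's hidden state, a list of floats.
    router_weights: one row of floats per expert.
    Returns [(expert_index, gate_weight), ...] with gate weights summing to 1.
    """
    # One logit per expert: dot product of the hidden state with that expert's row.
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in router_weights]
    probs = softmax(logits)
    # Keep only the k most probable experts, then renormalise their weights.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def route_sequence(hiddens, router_weights, k=2):
    """Routing is repeated independently for every token in the sequence."""
    return [route_token(h, router_weights, k) for h in hiddens]


# Assumed toy sizes: 8 experts, 4-dimensional hidden states, 2 experts per token.
NUM_EXPERTS, DIM, K = 8, 4, 2
rng = random.Random(0)
router_weights = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
tokens = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(5)]

assignments = route_sequence(tokens, router_weights, K)
for position, chosen in enumerate(assignments):
    print(position, chosen)
```

The point of the sketch is that the choice depends on each token's hidden state, not on the topic of the whole prompt, so adjacent tokens in the same sentence can land on different experts. It also shows why the router is a single shared component: its weight rows cover every expert at once, so swapping one expert would still leave a router that was trained against the old one.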