rustcleaner • today at 5:24 AM

Could it be done by making a sparse MoE of thousands, or tens of thousands, of smaller experts in very niche domains? Maybe a tree-like structure of experts that delegates from relatively general but inaccurate ones down to extremely niche but accurate ones? These experts might also be plug-and-play: you could easily swap out an inferior expert for a stronger one in the future without having to redo the whole pile.

Replies

Zetaphor • today at 5:38 AM

That's not really how the experts in an MoE work. A router picks experts from its gating probabilities for each token, and that selection happens on every token. You don't necessarily have a discrete math expert and a discrete physics expert. And even if you did, you would still need a router that is trained on all of those domains.
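To make the per-token routing concrete, here is a rough sketch of top-k gating in plain Python. Everything in it is made up for illustration (`route_token`, `moe_layer`, the random router weights, the toy "experts" that just scale their input); it is not any particular model's implementation, and real MoE layers do this with batched tensor ops inside each transformer block.

```python
import math
import random

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(token_vec, router_weights, k=2):
    """Return [(expert_index, gate_weight), ...] for ONE token (hypothetical helper)."""
    # One logit per expert: dot product of the token vector with that expert's router row.
    logits = [sum(w * x for w, x in zip(row, token_vec)) for row in router_weights]
    probs = softmax(logits)
    # Keep the k most probable experts and renormalize their gates to sum to 1.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_layer(tokens, router_weights, experts, k=2):
    """Run every token through its own top-k experts and mix the results by gate weight."""
    outputs = []
    for vec in tokens:
        out = [0.0] * len(vec)
        for idx, gate in route_token(vec, router_weights, k):
            expert_out = experts[idx](vec)
            out = [o + gate * e for o, e in zip(out, expert_out)]
        outputs.append(out)
    return outputs

if __name__ == "__main__":
    rng = random.Random(0)
    dim, num_experts = 4, 8
    # Assumption: random weights stand in for a router that would normally be trained.
    router_weights = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
    # Toy experts: expert i multiplies its input by (i + 1).
    experts = [lambda v, s=i + 1: [s * x for x in v] for i in range(num_experts)]
    tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(3)]
    for t in tokens:
        # Each token gets its own expert selection.
        print(route_token(t, router_weights))
```

Note that the selection is made separately for each token from the router's scores, so nothing here ties an expert to a human-readable domain, and the router itself has to be trained on whatever the experts are expected to cover.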
