The hotness we are seeing is smaller 'expert' models with an 'orchestrator' model in front that evaluates the prompts, routes them to the appropriate small models, and then synthesizes the collected answers. It's easier to split across many smaller, cheaper servers and more efficient than a huge monolithic model.
Do you have more info about this? I can't tell if you're being misled by the unfortunate "Mixture of Experts" terminology (which doesn't work the way you're describing), or alluding to something different.
Or, maybe I'm wrong, but my understanding is: MoE is just an architecture to keep the number of weights activated per token small. Routing happens basically token-by-token, and the "experts" themselves don't have a semantic domain, so "expert" was maybe a poor word choice.
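
To make the token-by-token point concrete, here is a rough toy sketch of top-k gating as I understand it. Everything in it is made up for illustration (the names `route_token` and `moe_layer`, the sizes, the random gate weights, and the "experts", which just scale a vector); real MoE layers use learned gates and feed-forward blocks inside each transformer layer.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(token_vec, gate_weights, top_k=2):
    """Pick the top_k experts for ONE token; returns [(expert_index, weight), ...]."""
    # One gate logit per expert: dot product of the token with that expert's gate row.
    logits = [sum(w * x for w, x in zip(row, token_vec)) for row in gate_weights]
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Renormalize over the selected experts only.
    weights = softmax([logits[i] for i in top])
    return list(zip(top, weights))


def moe_layer(tokens, gate_weights, experts, top_k=2):
    """Run each token through its own top_k experts and mix the results."""
    outputs, choices = [], []
    for vec in tokens:
        routed = route_token(vec, gate_weights, top_k)
        out = [0.0] * len(vec)
        for idx, w in routed:
            expert_out = experts[idx](vec)  # only top_k experts run for this token
            out = [o + w * e for o, e in zip(out, expert_out)]
        outputs.append(out)
        choices.append([idx for idx, _ in routed])
    return outputs, choices


def make_expert(scale):
    # Hypothetical stand-in for an expert feed-forward block.
    return lambda vec: [scale * x for x in vec]


rng = random.Random(0)
DIM, NUM_EXPERTS = 4, 8  # assumed toy sizes
gate_weights = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
experts = [make_expert(i + 1) for i in range(NUM_EXPERTS)]
tokens = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(5)]

outputs, choices = moe_layer(tokens, gate_weights, experts)
for position, picked in enumerate(choices):
    print(f"token {position} -> experts {picked}")
```

The thing to notice is that the routing decision is made per token from a gate score, not per prompt or per topic, and only `top_k` of the experts run for any given token. Nothing here evaluates the prompt as a whole or synthesizes separate answers, which is why this is a different thing from an orchestrator sitting in front of several standalone models.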