OK, that’s what I figured you meant. FWIW, MoE as a term of art means something different: what I described. It’s internal to a single model, part of the logit generation process.
That's fine, you can pretend my entire diagram is one NN; the end result will be the same whether you put it all inside one box or break it out into many.