Aren't you describing why they use mixture of experts, where only a subset of the weights is activated depending on the query?
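
To make that concrete, here is a rough sketch of top-k routing, assuming a toy setup: the "experts" are plain scalar functions and the gate scores are made-up numbers, where a real router would compute them from the input. The names `top_k_route` and `moe_forward` are hypothetical and not taken from any library. As far as I know, real MoE models usually route per token inside each MoE layer rather than once per query, but the idea is the same: only the selected experts run.

```python
import math

def top_k_route(gate_scores, k=2):
    """Pick the k highest-scoring experts and softmax-normalize their scores."""
    # Indices of the k largest gate scores (ties keep their original order).
    chosen = sorted(range(len(gate_scores)),
                    key=lambda i: gate_scores[i], reverse=True)[:k]
    # Softmax over only the chosen experts (subtract the max for stability).
    m = max(gate_scores[i] for i in chosen)
    exps = {i: math.exp(gate_scores[i] - m) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

def moe_forward(x, experts, gate_scores, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    weights = top_k_route(gate_scores, k)
    # Experts that were not selected are never called.
    return sum(w * experts[i](x) for i, w in weights.items())

# Four toy "experts"; each one stands in for a feed-forward block.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
# Made-up gate scores for one input (assumption: a learned router produces these).
scores = [0.1, 2.0, 2.0, -1.0]
y = moe_forward(3.0, experts, scores, k=2)
print(y)
```

With k=2 out of 4 experts, only half of the expert weights are touched for this input, which is the compute saving the comment is pointing at.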