
frde_me • today at 6:25 PM • 0 replies

Aren't you describing why they use mixture of experts, where only a subset of the weights is activated depending on the query?
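
For anyone unfamiliar with the idea, here is a rough sketch of what "only a subset of the weights is activated" can look like. It is a toy top-k router in plain Python, not any particular model's implementation: `moe_forward`, `gate_weights`, `experts`, and `top_k` are all made-up names for illustration, and each "expert" is just a small matrix. In most MoE language models I know of, this choice is made per token inside each MoE layer rather than once per query.

```python
import math
import random

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vec):
    # Multiply a list-of-rows matrix by a vector.
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]

def moe_forward(x, gate_weights, experts, top_k=2):
    """Route x to the top_k highest-scoring experts (hypothetical toy layer)."""
    scores = matvec(gate_weights, x)  # one gate score per expert
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    mix = softmax([scores[i] for i in chosen])  # renormalise over the chosen experts only
    out = [0.0] * len(experts[0])
    for w, i in zip(mix, chosen):
        y = matvec(experts[i], x)  # only the chosen experts' weights are touched
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, chosen

# Assumed toy sizes: 8 experts, 4-dimensional input, 2 experts active per input.
rng = random.Random(0)
dim, num_experts = 4, 8
gate_weights = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
experts = [[[rng.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
           for _ in range(num_experts)]
x = [rng.gauss(0, 1) for _ in range(dim)]

output, active = moe_forward(x, gate_weights, experts, top_k=2)
print("active experts:", active)
```

With these sizes, 6 of the 8 expert matrices are never read for a given input, which is where the compute saving comes from. All the experts still have to be stored somewhere, so the memory footprint does not shrink the same way.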
