> But on a tangent, why do you believe in mixture of experts
With a hardware inference approach you can generate tens of thousands of tokens per second and run your agents in a breadth-first style. It is all conceptually very simple, and not more than a few years away.
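To make the breadth-first idea concrete, here is a minimal sketch: the `expand` function is a hypothetical stand-in for a batched, high-throughput inference call that turns one agent task into follow-up tasks, and the driver processes whole frontiers level by level rather than drilling into one branch at a time.

```python
from collections import deque

def expand(task: str) -> list[str]:
    # Placeholder for a batched model call; here each task simply spawns two
    # subtasks until a depth limit (dots in the task name) is reached.
    depth = task.count(".")
    if depth >= 2:
        return []
    return [f"{task}.{i}" for i in range(2)]

def run_breadth_first(root: str) -> list[str]:
    # Process an entire frontier at once: with fast enough inference, every
    # task in a level could be expanded in a single batch instead of serially.
    order, frontier = [], deque([root])
    while frontier:
        level = list(frontier)
        frontier.clear()
        order.extend(level)
        for task in level:  # conceptually one batched call per level
            frontier.extend(expand(task))
    return order

print(run_breadth_first("t"))
# → ['t', 't.0', 't.1', 't.0.0', 't.0.1', 't.1.0', 't.1.1']
```

The point of the breadth-first shape is that each level is an independent batch, which is exactly what a very high tokens-per-second backend can exploit.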