amelius, today at 11:37 AM

> But on a tangent, why do you believe in mixture of experts

With a hardware inference approach you can generate tens of thousands of tokens per second and run your agents breadth-first. It is all very simple conceptually, and no more than a few years away.