> I wonder to what extent models should figure out which model to forward a query to. Or perhaps the big models could learn the difference between an easy and a hard question and charge accordingly?
This sounds like something a harness could do (and might already be doing), with work delegated to subagents running on lower-cost models.
Yes, they are all already doing this