This is called an "LLM alloy"; you can even do it in agentic workflows, where you simply swap the model on each LLM invocation.
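In an agent loop it can be as simple as round-robining the model on every call. A minimal sketch in Python, where `call_model` is a hypothetical placeholder for whatever client you actually use and the model names are made up:

    from itertools import cycle

    # Hypothetical model identifiers; swap in whichever providers/models you use.
    MODELS = ["model-a", "model-b"]

    def call_model(model: str, messages: list[dict]) -> str:
        """Placeholder for your real client call (OpenAI, Anthropic, etc.)."""
        raise NotImplementedError

    def run_alloy_agent(task: str, max_steps: int = 8) -> list[dict]:
        messages = [{"role": "user", "content": task}]
        models = cycle(MODELS)  # round-robin: each invocation goes to the next model
        for _ in range(max_steps):
            reply = call_model(next(models), messages)
            messages.append({"role": "assistant", "content": reply})
            # In a real agent loop you'd also append tool results / follow-up turns
            # and use whatever stop condition your framework defines.
            if "TASK COMPLETE" in reply:
                break
        return messages

The point is just that the transcript is shared while the model behind each invocation alternates.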
It does actually boost performance significantly. There was an article on here about it recently; I'll see if I can find it.
Edit: https://news.ycombinator.com/item?id=44630724
They found that the more different the models were (i.e., the less overlap in which problems each solved correctly), the more the alloy boosted the score.
That sounds quite interesting. It makes me wonder whether they will eventually have to train multiple independent models that cover those different niches. Maybe we will see that sooner or later. Thanks for the link.