Is there a reason human preference data is even needed? Don't LLMs already have a strong enough notion of question complexity to build a dataset for routing?
LLMs don't have notions ... they are pattern matchers against a vast database of human text.
> a strong enough notion of question complexity
Aka Wisdom. No, LLMs don't have that. Me neither, I usually have to step in the rabbit holes in order to detect them.