Looks interesting!
Out of curiosity, how does it compare with vLLM Semantic Router?
For reference:
https://vllm-semantic-router.com/
https://github.com/vllm-project/semantic-router
vLLM Semantic Router: Signal Driven Decision Routing for Mixture-of-Modality Models, https://arxiv.org/abs/2603.04444
For instance, does it offer similar algorithms:
- vllm-sr/auto: efficient, fast, balanced routing, similar in spirit to Fugu (Sakana Fugu: Multi-Agent System as a Model, https://sakana.ai/fugu/)
- vllm-sr/fusion: panel-style multi-model reasoning and synthesis
- vllm-sr/flow: router-native workflow orchestration
- vllm-sr/remom: multi-round reasoning over one or multiple models
FWIW, it does look good on https://routeworks.github.io/leaderboard
Ref.
RouterArena: An Open Platform for Comprehensive Comparison of LLM Routers, https://arxiv.org/abs/2510.00202, https://github.com/RouteWorks/RouterArena
Good questions. From what I can tell, vLLM Semantic Router is optimized more for one-off prompt/response workflows than for agentic coding (I don't think it's cache-aware).
As another commenter (https://news.ycombinator.com/item?id=48689994) pointed out, for one-off requests I think it makes more sense to lock to one model whose behavior you understand very well. For dynamic requests like the ones going to a coding agent, I think dynamic routing makes more sense, but it does need to be cache-aware.