Hacker News

matt_d, today at 8:22 PM (1 reply)

Looks interesting!

Out of curiosity, how does it compare with vLLM Semantic Router?

For reference:

https://vllm-semantic-router.com/

https://github.com/vllm-project/semantic-router

vLLM Semantic Router: Signal Driven Decision Routing for Mixture-of-Modality Models, https://arxiv.org/abs/2603.04444

For instance, does it offer similar algorithms:

- vllm-sr/auto: efficient, fast, balanced routing, similar in spirit to Fugu (Sakana Fugu: Multi-Agent System as a Model, https://sakana.ai/fugu/)
- vllm-sr/fusion: panel-style multi-model reasoning and synthesis
- vllm-sr/flow: router-native workflow orchestration
- vllm-sr/remom: multi-round reasoning over one or multiple models

FWIW, it does look good on https://routeworks.github.io/leaderboard

Ref.

RouterArena: An Open Platform for Comprehensive Comparison of LLM Routers, https://arxiv.org/abs/2510.00202, https://github.com/RouteWorks/RouterArena


Replies

adchurch, today at 9:33 PM

Good questions. From what I can tell, vLLM Semantic Router is optimized more for one-off prompt/response workflows than for agentic coding (I don't think it's cache aware).

As another commenter (https://news.ycombinator.com/item?id=48689994) pointed out, for one-off requests I think it makes more sense to lock to one model whose behavior you understand very well. For dynamic requests like the ones going to a coding agent, I think dynamic routing makes more sense, but it does need to be cache aware.
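
To make "cache aware" concrete, here is a rough sketch of what I mean. It is not the API of vLLM Semantic Router or of any real router. The class names, the per-token prices, and the idea that some upstream quality check hands over a list of acceptable candidate models are all assumptions made up for illustration. The point is only that the router remembers which model already holds a session's prompt prefix in its cache and counts that when comparing costs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Model:
    name: str
    input_price: float         # hypothetical cost per uncached prompt token
    cached_input_price: float  # hypothetical cost per cached prompt token


@dataclass
class CacheAwareRouter:
    models: dict[str, Model]
    # session id -> (model name, prompt tokens already cached on that model)
    warm: dict[str, tuple[str, int]] = field(default_factory=dict)

    def prompt_cost(self, session: str, model: Model, prompt_tokens: int) -> float:
        """Estimate prompt cost, discounting tokens this model has cached."""
        warm_model, cached = self.warm.get(session, (None, 0))
        hit = min(cached, prompt_tokens) if warm_model == model.name else 0
        return hit * model.cached_input_price + (prompt_tokens - hit) * model.input_price

    def route(self, session: str, prompt_tokens: int, candidates: list[str]) -> str:
        """Pick the cheapest acceptable model once cache hits are counted."""
        best = min(
            candidates,
            key=lambda n: self.prompt_cost(session, self.models[n], prompt_tokens),
        )
        # Assume the whole prompt is cached on the chosen model afterwards.
        self.warm[session] = (best, prompt_tokens)
        return best


router = CacheAwareRouter(
    models={
        "small": Model("small", input_price=10.0, cached_input_price=1.0),
        "big": Model("big", input_price=30.0, cached_input_price=3.0),
    }
)
first = router.route("s", 1000, ["big"])             # hard turn, only "big" qualifies
second = router.route("s", 1200, ["small", "big"])   # easy turn, both qualify
fresh = router.route("other", 1200, ["small", "big"])
```

With these made-up prices, the second turn stays on "big": 1000 cached tokens plus 200 new ones cost 9000 there, versus 12000 to resend all 1200 tokens uncached to "small". A session with no warm cache goes to "small", which is what a router that ignores the cache would pick in both cases.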