adchurch • yesterday at 9:33 PM

Good questions. From what I can tell, the vLLM semantic router is optimized more for one-off prompt/response workflows than for agentic coding (I don't think it's cache-aware).

As another commenter (https://news.ycombinator.com/item?id=48689994) pointed out, for one-off requests I think it makes more sense to lock to one model whose behavior you understand very well. For dynamic requests, like the ones going to a coding agent, I think dynamic routing makes more sense, but it does need to be cache-aware.
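
To make "cache-aware" concrete, here is a rough Python sketch of what I mean. It is not how the vLLM semantic router or any real router works. The `Model` fields, the prices, the cached-token discount, and the `warm` map (how many prefix tokens each model already has cached for this session) are all made-up assumptions for illustration. The idea is just that a long agent session has a large shared prefix, so switching models mid-session means paying to re-process that prefix from scratch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    price_per_token: float   # hypothetical input price, arbitrary units
    cached_discount: float   # hypothetical fraction of the price paid for cached tokens
    quality: int             # hypothetical capability tier, higher is better


def input_cost(model: Model, prompt_tokens: int, cached_tokens: int) -> float:
    """Estimated input cost when `cached_tokens` of the prompt prefix are already cached."""
    cached = min(cached_tokens, prompt_tokens)
    fresh = prompt_tokens - cached
    return model.price_per_token * (fresh + model.cached_discount * cached)


def route(models: list[Model], prompt_tokens: int, min_quality: int,
          warm: dict[str, int]) -> Model:
    """Pick the cheapest eligible model, counting the prefix each one has cached.

    `warm` maps a model name to the number of prefix tokens that model
    already has cached for the current session (0 if absent).
    """
    eligible = [m for m in models if m.quality >= min_quality]
    if not eligible:
        raise ValueError("no model meets the quality requirement")
    return min(eligible,
               key=lambda m: input_cost(m, prompt_tokens, warm.get(m.name, 0)))


# Illustrative numbers only.
MODELS = [
    Model("big", price_per_token=3.0, cached_discount=0.1, quality=2),
    Model("small", price_per_token=1.0, cached_discount=0.1, quality=1),
]

# Deep into an agent session: 10,000 prompt tokens, 9,500 of them cached on "big".
cache_aware = route(MODELS, 10_000, min_quality=1, warm={"big": 9_500})
# The same request with the cache state ignored.
cache_blind = route(MODELS, 10_000, min_quality=1, warm={})
print(cache_aware.name, cache_blind.name)
```

With these made-up numbers, the cache-blind choice is the cheaper-per-token "small" model, while the cache-aware choice stays on "big" because most of its prompt is already cached. A real router would also have to account for cache expiry, output cost and latency, which this sketch ignores.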
