Good questions. From what I can tell, vLLM semantic router is more optimized for one-off prompt/response workflows rather than agentic coding (I don't think it's cache aware).
As another commenter (https://news.ycombinator.com/item?id=48689994) pointed out, for one-off requests, I think it makes more sense to lock to one model whose behavior you understand very well. For dynamic requests like the ones going to a coding agent I think dynamic routing makes more sense but it does need to be cache aware.