Hacker News

ai_slop_hater today at 5:29 PM

Isn't this more expensive than always using the same model? As I understand it, routing to different models means giving up the cache.


Replies

adchurch today at 5:42 PM

If you route each new request statelessly, then yes, it does end up being more expensive!

So our routing is cache-aware: it sets a much higher threshold for switching from one model to another if there's already some cache for the first model. Experimentally, this solves the problem (like I said, we've saved 40% ourselves vs. what we would otherwise have paid).
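
To make the idea concrete, here is a minimal sketch of what a cache-aware switching threshold could look like. It is not the actual router described above: the `Candidate` type, the `pick_model` function, the scores, and the margin parameters are all hypothetical, chosen only to show a threshold that grows with the amount of cache already built up for the current model.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    model: str
    score: float  # hypothetical router preference; higher is better


def pick_model(candidates, current_model=None, cached_tokens=0,
               base_margin=0.05, margin_per_1k_cached=0.02):
    """Pick a model, staying on the current one unless another is clearly better.

    All parameters are illustrative assumptions, not values from a real router.
    """
    best = max(candidates, key=lambda c: c.score)

    # No conversation state yet, or the best model is already in use.
    if current_model is None or best.model == current_model:
        return best.model

    current = next((c for c in candidates if c.model == current_model), None)
    if current is None:
        # The current model is no longer a candidate, so switch.
        return best.model

    # The bar for switching rises with the amount of warm cache,
    # since switching would throw that cache away.
    margin = base_margin + margin_per_1k_cached * (cached_tokens / 1000)
    if best.score - current.score > margin:
        return best.model
    return current_model


candidates = [Candidate("small", 0.60), Candidate("large", 0.70)]
fresh = pick_model(candidates)
cold = pick_model(candidates, current_model="small", cached_tokens=0)
warm = pick_model(candidates, current_model="small", cached_tokens=10_000)
```

In this sketch, a fresh request or a conversation with no cache goes to the higher-scoring model, while a conversation with 10,000 cached tokens on the first model stays put because the score gap does not clear the raised margin.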