So, how are you handling read/write caching? I mean, if I keep routing the next prompt based on the task weights? How about if I'm sending every 5th query to opus, which do expensive write cache?
We consider the cost of missing the cache when making each routing decision after the initial one. Discussed in a bit more depth here: https://news.ycombinator.com/item?id=48689448
We consider the cost of missing the cache when making each routing decision after the initial one. Discussed in a bit more depth here: https://news.ycombinator.com/item?id=48689448