Won't any input be charged uncached, and the output of the small model charged again as uncached input to the bigger model?
I don't know whether that comes out ahead compared to just staying with the better model in the first place.
It's a good question, but in multi-turn conversations even cached context adds up quickly, since the whole history is re-read on every turn. My experience has been that spawning off subagents for well-defined tasks within a large overall plan generally leaves me coming out ahead.
I'm sure folks' mileage will vary though.
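To make the trade-off concrete, here is a rough back-of-envelope sketch. All prices and token counts are hypothetical placeholders, not any provider's real rates. It assumes every prior turn is a cache hit, that the subagent's summary is charged as uncached input to the big model, and it ignores cache-write surcharges and the big model's own plan context.

```python
# HYPOTHETICAL prices in dollars per million tokens; substitute real rates.
BIG = {"input": 15.0, "cached_input": 1.5, "output": 75.0}
SMALL = {"input": 3.0, "cached_input": 0.3, "output": 15.0}


def session_cost(prices, turns, new_tokens_per_turn, output_per_turn):
    """Dollar cost of a multi-turn session in which each turn re-reads the
    whole prior context (assumed cached) and adds fresh input and output."""
    total = 0.0
    context = 0
    for _ in range(turns):
        total += context * prices["cached_input"]      # re-read history from cache
        total += new_tokens_per_turn * prices["input"]  # new, uncached input
        total += output_per_turn * prices["output"]     # generated tokens
        # This turn's input and output become context for the next turn.
        context += new_tokens_per_turn + output_per_turn
    return total / 1_000_000


def delegated_cost(turns, new_tokens_per_turn, output_per_turn, summary_tokens):
    """Small model does the multi-turn work; its summary is then charged
    again as uncached input to the big model, which writes one reply
    (assumed here to be the same size as one turn's output)."""
    sub = session_cost(SMALL, turns, new_tokens_per_turn, output_per_turn)
    handoff = (summary_tokens * BIG["input"]
               + output_per_turn * BIG["output"]) / 1_000_000
    return sub + handoff


big_only = session_cost(BIG, turns=20, new_tokens_per_turn=2000, output_per_turn=500)
delegated = delegated_cost(turns=20, new_tokens_per_turn=2000,
                           output_per_turn=500, summary_tokens=2000)
print(f"big model only: ${big_only:.2f}, delegated: ${delegated:.2f}")
```

The cached-read term grows roughly quadratically with the number of turns, which is why a long session can cost more than the one-time uncached handoff. With a short task and a long summary, the balance can tip the other way.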