Hacker News

user43928 | yesterday at 7:12 PM | 1 reply

Won't all input to the small model be charged at the uncached rate, and then the small model's output be charged again as uncached input to the bigger model?

I don't know whether that comes out ahead compared to just staying with the better model in the first place.


Replies

mwigdahl | yesterday at 7:27 PM

It's a good question, but for multi-turn conversations even cached context adds up quickly. My experience has been that spawning off subagents for defined tasks in a large overall plan generally makes me come out ahead.

I'm sure folks' mileage will vary though.
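
To put rough numbers on the trade-off above, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the per-million-token prices are made up and are not any provider's actual rates, the names (`Price`, `multiturn_cost`, `inline_cost`, `delegated_cost`) are hypothetical, and the caching model is simplified so that context counts as cached on every turn after it was first read. It compares running a task inline in the big model's long conversation against handing it to a small-model subagent that starts from an uncached brief and returns a summary, which the big model then reads as uncached input. It says nothing about output quality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """USD per million tokens. The values below are made up for illustration."""
    uncached: float  # fresh input tokens
    cached: float    # input tokens served from the prompt cache
    output: float    # generated tokens


BIG = Price(uncached=3.0, cached=0.3, output=15.0)    # hypothetical big model
SMALL = Price(uncached=1.0, cached=0.1, output=5.0)   # hypothetical small model


def multiturn_cost(price, turns, new_in, out, base_context, base_is_cached):
    """Cost in USD of `turns` turns where each turn re-reads all prior context.

    The starting context is billed at the uncached rate on the first turn
    unless `base_is_cached` is True; after that, all prior context is billed
    at the cached rate. New input each turn is always uncached.
    """
    total = 0.0
    context = base_context
    for turn in range(turns):
        reread_rate = price.cached if (turn > 0 or base_is_cached) else price.uncached
        total += context * reread_rate + new_in * price.uncached + out * price.output
        context += new_in + out
    return total / 1_000_000


def inline_cost(turns, new_in, out, base_context):
    """Do the whole task in the big model, on top of its existing cached context."""
    return multiturn_cost(BIG, turns, new_in, out, base_context, True)


def delegated_cost(turns, new_in, out, base_context, brief=3_000, summary=1_000):
    """Hand the task to a small-model subagent.

    The subagent starts from an uncached brief. The big model pays one extra
    turn: it writes the brief as output and reads the summary as uncached input.
    """
    subagent = multiturn_cost(SMALL, turns, new_in, out, brief, False)
    handoff = multiturn_cost(BIG, 1, summary, brief, base_context, True)
    return subagent + handoff


if __name__ == "__main__":
    # (turns in the task, tokens already in the big model's conversation)
    for turns, base in [(20, 100_000), (1, 0)]:
        a = inline_cost(turns, 2_000, 500, base)
        b = delegated_cost(turns, 2_000, 500, base)
        print(f"turns={turns} base={base}: inline=${a:.4f} delegated=${b:.4f}")
```

Under these made-up numbers the long task on top of a large context is cheaper when delegated, while the one-turn task with no prior context is cheaper inline, because the handoff overhead dominates. Change the prices or token counts and the answer can flip, which fits the "mileage will vary" caveat.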