logoalt Hacker News

foobar10000today at 4:52 PM0 repliesview on HN

2 things:

1. Parallel investigation : the payoff form that is relatively small - starting K subagents assumes you have K independent avenues of investigation - and quite often that is not true. Somewhat similar to next-turn prediction using a speculative model - works well enough for 1 or 2 turns, but fails after.

2. Input caching is pretty much fixes prefill - not decode. And if you look at frontier models - for example open-weight models that can do reasoning - you are looking at longer and longer reasoning chains for heavy tool-using models. And reasoning chains will diverge very vey quickly even from the same input assuming a non-0 temp.