
m101, today at 10:30 AM

If you’re running at 17k tokens / s what is the point of multiple agents?


Replies

ivan_gammel, today at 11:43 AM

Different skills and context. Llama 3.1 8B has only a 128k-token context window, so packing everything into it may not be a great idea. You may want one agent analyzing the requirements and designing the architecture, one writing tests, another writing the implementation, and a fourth doing code review. With LLMs, it matters not just what you have in context but also what is absent, so that the model will not overthink it.

EDIT: just in case, I define an agent as an inference unit with a specific preloaded context. In this case, at this speed, they don't have to be async: they may run in sequence over multiple iterations.
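
To make that definition concrete, here is a minimal sketch, not a real implementation. It assumes an agent is just a name plus a preloaded context string, and that inference is a plain function taking `(context, task_input)`. The names `Agent`, `run_pipeline`, `fake_model`, and the four role contexts are all hypothetical; `fake_model` is an echo stub standing in for whatever model call you actually use.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical inference signature: (preloaded_context, task_input) -> output.
Model = Callable[[str, str], str]


@dataclass
class Agent:
    name: str
    context: str  # role-specific preloaded context; nothing else is included

    def run(self, model: Model, task_input: str) -> str:
        # Each call sees only this agent's context plus the current artifact.
        return model(self.context, task_input)


def run_pipeline(
    agents: list[Agent], model: Model, request: str, iterations: int = 2
) -> tuple[str, list[tuple[int, str]]]:
    """Run the agents one after another, repeating the whole pass `iterations` times."""
    artifact = request
    log: list[tuple[int, str]] = []
    for i in range(iterations):
        for agent in agents:  # strictly sequential, no async needed
            artifact = agent.run(model, artifact)
            log.append((i, agent.name))
    return artifact, log


def fake_model(context: str, task_input: str) -> str:
    # Echo stub (assumption): tags the input with the context so the flow is visible.
    return f"[{context}] {task_input}"


# Illustrative roles taken from the comment above; the context strings are made up.
AGENTS = [
    Agent("architect", "requirements+design"),
    Agent("tester", "test-writing"),
    Agent("implementer", "implementation"),
    Agent("reviewer", "code-review"),
]
```

Calling `run_pipeline(AGENTS, fake_model, "build a parser")` runs the four roles in order, twice. With a fast enough model, the sequential loop is the simple option, and each role's context stays small and free of material it does not need.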