Different skills and context. Llama 3.1 8B has just 128k of context length, so packing everything into it may not be a great idea. You may want one agent analyzing the requirements and designing the architecture, one writing tests, another writing the implementation, and a fourth doing code review. With LLMs it also matters not just what you have in context but also what is absent, so the model doesn't overthink it.
EDIT: just in case, I define an agent as an inference unit with a specific preloaded context. In this case, at this speed, they don't have to be async - they can run in sequence over multiple iterations.
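A minimal sketch of what that sequential split could look like, assuming a local OpenAI-compatible endpoint serving Llama 3.1 8B (e.g. a llama.cpp server or vLLM); the URL, model name, and prompts below are placeholders, not a prescribed setup:

```python
# Sketch: sequential "agents" as separate inference calls, each with its own
# small, role-specific context instead of one packed 128k prompt.
# Assumes a local OpenAI-compatible endpoint at BASE_URL serving Llama 3.1 8B;
# the URL, model name, and prompts are illustrative.
import requests

BASE_URL = "http://localhost:8000/v1/chat/completions"  # assumed local server
MODEL = "llama-3.1-8b-instruct"                         # assumed model name

def run_agent(system_prompt: str, user_prompt: str) -> str:
    """One 'agent' = one inference call with a preloaded, role-specific context."""
    resp = requests.post(BASE_URL, json={
        "model": MODEL,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": 0.2,
    })
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

task = "Add rate limiting to the /login endpoint."

# Each stage sees only what it needs: the next agent receives the previous
# agent's output, not the whole accumulated transcript.
design = run_agent("You are a software architect. Produce requirements and a design.", task)
tests  = run_agent("You write tests only. Given a design, write the test suite.", design)
code   = run_agent("You implement code that passes the given tests.", f"{design}\n\n{tests}")
review = run_agent("You are a code reviewer. Review the implementation against the tests.",
                   f"{tests}\n\n{code}")

print(review)
```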