Hacker News

slopusila, yesterday at 10:46 PM

On-prem economics don't work because you can't batch requests, unless you are able to run 100 agents at the same time, all the time.
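Here is a rough way to see the batching argument. Assume token generation is limited by memory bandwidth, so each decode step streams the full set of weights once regardless of batch size. Throughput then grows with the number of sequences decoded together, and a box serving a single agent pays for bandwidth it mostly leaves idle. This is a toy model that ignores compute limits and KV-cache reads. The model size and bandwidth figures are hypothetical round numbers, not measurements.

```python
def decode_step_seconds(weight_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Idealized time for one decode step: every step streams all weights once."""
    return weight_bytes / bandwidth_bytes_per_s


def tokens_per_second(batch_size: int, weight_bytes: float,
                      bandwidth_bytes_per_s: float) -> float:
    """Each step emits one token per sequence in the batch, so in this
    bandwidth-bound approximation throughput scales linearly with batch size.
    Compute saturation and KV-cache reads are deliberately ignored."""
    return batch_size / decode_step_seconds(weight_bytes, bandwidth_bytes_per_s)


# Hypothetical figures chosen for round arithmetic, not measured on real hardware.
WEIGHT_BYTES = 140e9   # e.g. 70B parameters at 2 bytes each
BANDWIDTH = 2e12       # 2 TB/s of memory bandwidth

for batch in (1, 8, 100):
    tps = tokens_per_second(batch, WEIGHT_BYTES, BANDWIDTH)
    print(f"batch={batch:>3}: ~{tps:,.0f} tokens/s")
```

In this idealization, a batch of 100 yields 100 times the tokens per second of a batch of 1 on the same hardware, so the cost per token falls roughly as 1/batch. In practice, compute limits and KV-cache traffic eventually flatten that curve.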


Replies

zozbot234, today at 1:33 AM

> unless you are able to run 100 agents at the same time all the time

Except that newer "agent swarm" workflows do exactly that. Besides, batching requests generally comes with a sizeable increase in memory footprint, and memory is often the main bottleneck, especially with the larger contexts typical of agent workflows. If you have plenty of agentic tasks that are not especially latency-critical and don't need the absolute best model, it makes plenty of sense to schedule them to run locally.
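The memory point can be made concrete with the usual KV-cache estimate. Every sequence in a batch keeps its own keys and values for every layer, so cache size grows linearly with both batch size and context length. A minimal sketch follows, assuming a hypothetical grouped-query-attention model with 80 layers, 8 KV heads, head dimension 128, and a 16-bit cache. These are illustrative numbers, not any specific model's, and the calculation does not count weights or activations.

```python
GIB = 1024 ** 3


def kv_cache_bytes(batch_size: int, context_tokens: int, n_layers: int,
                   n_kv_heads: int, head_dim: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to hold the KV cache for a whole batch.

    The factor 2 accounts for storing one key and one value vector
    per KV head, per layer, per token.
    """
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return batch_size * context_tokens * per_token


# Hypothetical model configuration, for illustration only.
config = dict(n_layers=80, n_kv_heads=8, head_dim=128, bytes_per_elem=2)

for batch in (1, 8, 32):
    gib = kv_cache_bytes(batch, 32_768, **config) / GIB
    print(f"batch={batch:>2}, 32k context: {gib:.0f} GiB of KV cache")
```

Under these assumptions, a single 32k-token context needs 10 GiB of cache. A batch of 8 such agents needs 80 GiB, before the model weights are even loaded.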