Hacker News

starling · yesterday at 8:37 PM

That implies a throughput of around 16 million tokens per second. Since coding agent loops are inherently sequential (each step has to wait for the previous inference call to finish before it can act on the result), a single loop can never generate tokens faster than the model streams them. At that aggregate rate you're bound by latency, not just cost.
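A rough back-of-envelope sketch of what the latency bound implies: if each sequential loop is capped by the model's streaming speed, hitting 16M tokens/s requires an enormous number of concurrent loops. The per-agent numbers below (streaming rate, fraction of wall time spent decoding) are illustrative assumptions, not measurements:

```python
# Back-of-envelope: concurrent agents needed to sustain 16M tokens/s
# when each loop is latency-bound. Per-agent figures are assumptions.

TOTAL_TOKENS_PER_SEC = 16_000_000  # claimed aggregate throughput

# A sequential loop only produces tokens while one inference call is
# streaming, so its effective rate is the streaming speed (assume
# ~50 tok/s) times the fraction of wall time spent decoding (assume
# 50%; the rest goes to tool calls, file I/O, etc.).
STREAM_TOK_PER_SEC = 50
DECODE_DUTY_CYCLE = 0.5

per_agent_rate = STREAM_TOK_PER_SEC * DECODE_DUTY_CYCLE  # 25 tok/s
agents_needed = TOTAL_TOKENS_PER_SEC / per_agent_rate

print(f"~{agents_needed:,.0f} concurrent agents")  # ~640,000
```

Even with generous assumptions, the required concurrency is in the hundreds of thousands, which is the crux of the objection.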


Replies

mrob · yesterday at 8:50 PM

The original post claimed they were "running hundreds of concurrent agents":

https://cursor.com/blog/scaling-agents
