Hacker News

starling · yesterday at 8:37 PM

That implies a throughput of around 16 million tokens per second. Since coding agent loops are inherently sequential (each step has to wait for the previous inference call to finish before it can act on the result), a single loop can never generate tokens faster than the model streams them. At that aggregate rate you're bound by latency, not just cost.
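A rough back-of-envelope sketch of what the latency bound implies: if each sequential loop is capped by the model's streaming speed, hitting 16M tokens/s requires an enormous number of concurrent loops. The per-agent numbers below (streaming rate, fraction of wall time spent decoding) are illustrative assumptions, not measurements:

```python
# Back-of-envelope: concurrent agents needed to sustain 16M tokens/s
# when each loop is latency-bound. Per-agent figures are assumptions.

TOTAL_TOKENS_PER_SEC = 16_000_000  # claimed aggregate throughput

# A sequential loop only produces tokens while one inference call is
# streaming, so its effective rate is the streaming speed (assume
# ~50 tok/s) times the fraction of wall time spent decoding (assume
# 50%; the rest goes to tool calls, file I/O, etc.).
STREAM_TOK_PER_SEC = 50
DECODE_DUTY_CYCLE = 0.5

per_agent_rate = STREAM_TOK_PER_SEC * DECODE_DUTY_CYCLE  # 25 tok/s
agents_needed = TOTAL_TOKENS_PER_SEC / per_agent_rate

print(f"~{agents_needed:,.0f} concurrent agents")  # ~640,000
```

Even with generous assumptions, the required concurrency is in the hundreds of thousands, which is the crux of the objection.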


Replies

mrob · yesterday at 8:50 PM

The original post claimed they were "running hundreds of concurrent agents":

https://cursor.com/blog/scaling-agents
