logoalt Hacker News

dist-epochtoday at 10:28 AM1 replyview on HN

The batch size explanation is wrong. Given how much Claude Code is used, finding fellow "bus passengers" is not an issue, you don't need to wait.

The real reason which batching increases latency is multi-factored and more complex to explain.


Replies

qeternitytoday at 10:35 AM

Yes this article is full of misunderstanding. The main explanation of bottleneck is wrong: it’s the model weights which dominate memory bandwidth (and hence why batching multiple requests in a single pass increases total throughput). If copying user tokens was the bottle neck, batching would not achieve any speed up.

When an author is confused about something so elementary, I can’t trust anything else they write.

show 3 replies