dist-epoch • today at 10:28 AM • 1 reply

The batch size explanation is wrong. Given how much Claude Code is used, finding fellow "bus passengers" is not an issue; you don't need to wait.

The real reason why batching increases latency is multi-factored and more complex to explain.

Replies

qeternity • today at 10:35 AM

Yes, this article is full of misunderstandings. The main explanation of the bottleneck is wrong: it’s the model weights that dominate memory bandwidth (which is why batching multiple requests in a single pass increases total throughput). If copying user tokens were the bottleneck, batching would not achieve any speedup.
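To make the bandwidth argument concrete, here is a rough back-of-the-envelope sketch. It assumes a decode step is purely memory-bandwidth bound and ignores compute, scheduling, and interconnect. The function names and all the numbers (a 140 GB weight footprint, 1 GB of per-request state, 2 TB/s of bandwidth) are made up for illustration, not measurements of any real deployment.

```python
def decode_step_time(batch_size, weight_bytes, per_request_bytes, bandwidth):
    """Seconds for one decode step in a toy, purely bandwidth-bound model.

    The weights are read once per step no matter how many requests share
    the pass; per-request state (e.g. KV cache) is read once per request.
    """
    bytes_moved = weight_bytes + batch_size * per_request_bytes
    return bytes_moved / bandwidth


def tokens_per_second(batch_size, weight_bytes, per_request_bytes, bandwidth):
    """Total tokens/s across the batch: one token per request per step."""
    step = decode_step_time(batch_size, weight_bytes, per_request_bytes, bandwidth)
    return batch_size / step


# Hypothetical figures, chosen only to show the shape of the effect.
WEIGHTS = 140e9      # bytes of model weights (assumed)
PER_REQUEST = 1e9    # bytes of per-request state (assumed)
BANDWIDTH = 2e12     # bytes/s of memory bandwidth (assumed)

if __name__ == "__main__":
    for batch in (1, 8, 64):
        step_ms = decode_step_time(batch, WEIGHTS, PER_REQUEST, BANDWIDTH) * 1e3
        tps = tokens_per_second(batch, WEIGHTS, PER_REQUEST, BANDWIDTH)
        print(f"batch={batch:3d}  step={step_ms:7.2f} ms  total={tps:9.1f} tok/s")

    # Counterfactual: if per-request data were the whole cost (no shared
    # weight traffic), total throughput would not change with batch size.
    for batch in (1, 8, 64):
        tps = tokens_per_second(batch, 0.0, PER_REQUEST, BANDWIDTH)
        print(f"no shared weights: batch={batch:3d}  total={tps:9.1f} tok/s")
```

In this toy model the shared weight read is amortized across the batch, so total throughput grows almost linearly while each step gets only slightly slower. With the weight term set to zero, throughput stays flat, which is the "batching would not help" case.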

When an author is confused about something so elementary, I can’t trust anything else they write.
