Hacker News

amelius, today at 12:00 PM (2 replies)

> Of course when you have a long pipeline of chips that each token needs to pass through, that decreases the end-to-end tokens per second correspondingly.

No, it only increases the latency, and does not affect the throughput.


Replies

EdNutting, today at 12:06 PM

It affects both. These systems are vastly more complex than the naive mental models being discussed in these comments.

For one thing, going chip-to-chip is not a faultless process and does not operate at the same speed as on-chip communication. So, yes, throughput can be reduced by splitting a computation across two chips of otherwise equal speed.

qudent, today at 12:05 PM

It does affect the throughput for an individual user, because you need all output tokens up to n to generate output token n+1.
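To make the disagreement concrete, here is a rough sketch of a toy pipeline model, not a description of any real system. All names and numbers are hypothetical. It assumes every stage takes the same time, every hop between chips costs a fixed non-overlapped delay, and a sequence cannot start token n+1 until token n has left the last stage.

```python
def token_latency_s(stages, stage_time_s, hop_time_s=0.0):
    """Time for one token to traverse the whole pipeline (toy model)."""
    # Assumption: each stage is followed by one chip-to-chip hop,
    # including the hop that feeds the new token back to the first stage.
    return stages * (stage_time_s + hop_time_s)


def single_stream_tokens_per_s(stages, stage_time_s, hop_time_s=0.0):
    """Tokens per second seen by one user generating one sequence."""
    # Autoregressive decoding: token n+1 waits for token n, so a single
    # sequence keeps only one stage busy at a time.
    return 1.0 / token_latency_s(stages, stage_time_s, hop_time_s)


def aggregate_tokens_per_s(stages, stage_time_s, hop_time_s=0.0, streams=None):
    """Total tokens per second across all concurrent sequences."""
    if streams is None:
        streams = stages  # just enough independent sequences to fill the pipe
    # At most one sequence can occupy each stage, so extra streams beyond
    # the pipeline depth add no throughput in this model.
    busy = min(streams, stages)
    return busy * single_stream_tokens_per_s(stages, stage_time_s, hop_time_s)


if __name__ == "__main__":
    for depth in (4, 8):
        print(
            depth,
            single_stream_tokens_per_s(depth, 0.001),
            aggregate_tokens_per_s(depth, 0.001),
            aggregate_tokens_per_s(depth, 0.001, hop_time_s=0.0005),
        )
```

Under these assumptions, doubling the depth leaves the aggregate rate of a full pipeline unchanged but halves the rate any single sequence sees, and a nonzero hop cost lowers both.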
