amelius • today at 12:00 PM • 2 replies

> Of course when you have a long pipeline of chips that each token needs to pass through, that decreases the end-to-end tokens per second correspondingly.

No, it only increases the latency, and does not affect the throughput.

Replies

EdNutting • today at 12:06 PM

It affects both. These systems are vastly more complex than the naive mental models being discussed in these comments.

For one thing, going chip-to-chip is not a faultless process and does not operate at the same speed as on-chip communication. So, yes, throughput can be reduced by splitting a computation across two chips of otherwise equal speed.
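A rough sketch of that last point, for a single token stream. It assumes the compute splits evenly across chips and that each chip-to-chip hop adds a fixed cost that is not overlapped with compute; the function name and the 2 ms / 0.2 ms figures are made up for illustration, not measurements of any real system.

```python
def per_token_time(compute_s: float, num_chips: int, hop_s: float) -> float:
    """Seconds for one token to traverse all chips.

    Assumption: total compute is unchanged by the split, and every
    chip-to-chip hop adds hop_s of non-overlapped transfer time.
    """
    return compute_s + (num_chips - 1) * hop_s


# Hypothetical numbers: 2 ms of compute per token, 0.2 ms per hop.
one_chip = per_token_time(2e-3, num_chips=1, hop_s=0.2e-3)
two_chips = per_token_time(2e-3, num_chips=2, hop_s=0.2e-3)

print(f"one chip:  {1 / one_chip:.1f} tokens/s")
print(f"two chips: {1 / two_chips:.1f} tokens/s")
```

Under these assumptions the two-chip split is slower per stream purely because of the hop. Real interconnects can overlap transfer with compute, so treat this as an upper bound on the penalty in this toy model, not a prediction.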

qudent • today at 12:05 PM

It does affect throughput for an individual user, because you need all output tokens up to n before you can generate output token n+1.
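A toy model of the distinction between per-user and aggregate throughput, as a sketch under simplifying assumptions: every stage takes the same time, hops are free, and each stream can have only one token in flight because of the autoregressive dependency. The function name and numbers are hypothetical.

```python
def throughput(num_stages: int, stage_s: float, concurrent_streams: int):
    """Return (per_stream, aggregate) tokens per second.

    Assumption: a stream cannot start token n+1 until token n has left
    the last stage, so each stream occupies one stage at a time.
    """
    latency_s = num_stages * stage_s      # time for one token, end to end
    per_stream = 1.0 / latency_s          # one token per full traversal
    busy = min(concurrent_streams, num_stages)  # stages doing useful work
    aggregate = busy * per_stream
    return per_stream, aggregate


# Hypothetical 1 ms stages: compare a 1-stage and an 8-stage pipeline.
for stages in (1, 8):
    solo, _ = throughput(stages, 1e-3, concurrent_streams=1)
    _, full = throughput(stages, 1e-3, concurrent_streams=stages)
    print(f"{stages} stages: {solo:.0f} tok/s per stream, {full:.0f} tok/s aggregate")
```

In this model a deeper pipeline leaves aggregate throughput unchanged once there are at least as many concurrent streams as stages, while the rate seen by any single stream falls in proportion to the depth.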
