> Of course when you have a long pipeline of chips that each token needs to pass through, that decreases the end-to-end tokens per second correspondingly.
No, it only increases the latency; it does not affect the throughput.
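A back-of-envelope sketch of that distinction, with made-up numbers (P pipeline stages and t seconds per stage are both assumed for illustration): once the pipeline is full, a result exits every stage interval regardless of how many stages there are, so only the per-token latency grows.

```python
# Hypothetical pipeline: P chips in series, each taking t seconds per microbatch.
P = 8          # pipeline stages (assumed)
t = 0.005      # seconds of compute per stage (assumed)

# Latency: a single token's forward pass must traverse all P stages in order.
latency_per_token = P * t                # 0.040 s

# Aggregate throughput: with enough independent requests to keep every stage
# busy, one microbatch finishes every t seconds, independent of P.
aggregate_throughput = 1 / t             # 200 microbatches/s

print(f"latency per token: {latency_per_token * 1000:.0f} ms")
print(f"aggregate throughput: {aggregate_throughput:.0f} microbatches/s")
```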
It does affect throughput for an individual user, because you need all output tokens up to n before you can generate output token n+1.
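Reusing the same hypothetical numbers as the sketch above: a single user's decode can't be pipelined across its own tokens, because each token depends on the previous one, so that user's tokens per second is bounded by the end-to-end latency even though the aggregate rate is unchanged.

```python
# Same assumed numbers: P stages, t seconds per stage.
P, t = 8, 0.005

# One user's token n+1 can't start until token n has traversed all P stages,
# so per-user throughput is limited by end-to-end latency.
per_user_throughput = 1 / (P * t)        # 25 tokens/s for this one stream

# Other users' requests fill the pipeline bubbles in the meantime, which is
# why the aggregate throughput (1/t) stays at 200/s while each stream slows.
print(f"per-user throughput: {per_user_throughput:.0f} tokens/s")
```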
It affects both. These systems are vastly more complex than the naive mental models being discussed in these comments.
For one thing, chip-to-chip communication is not free and does not run at the same speed as on-chip communication. So, yes, throughput can be reduced by splitting a computation across two chips of otherwise equal speed.
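Extending the earlier sketch with an assumed per-hop transfer cost c: if inter-chip transfers can't be fully overlapped with compute, they inflate both the latency and the effective per-stage time, so aggregate throughput drops as well.

```python
# Assumed numbers: P stages, t seconds compute per stage, c seconds per
# chip-to-chip hop that is not hidden behind compute.
P, t, c = 8, 0.005, 0.001

# Latency picks up one transfer per stage boundary.
latency_per_token = P * t + (P - 1) * c          # 47 ms instead of 40 ms

# If the transfer isn't overlapped, each stage effectively takes t + c,
# so even the aggregate rate falls below the single-chip 1/t figure.
aggregate_throughput = 1 / (t + c)               # ~167/s instead of 200/s

print(f"latency: {latency_per_token * 1000:.0f} ms, "
      f"aggregate throughput: {aggregate_throughput:.0f}/s")
```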