logoalt Hacker News

johndoughtoday at 1:23 PM0 repliesview on HN

Only if chip-to-chip communication was a bottleneck. Which it isn't.

If a layer completely fits in SRAM (as is probably the case for Cerebras), you only have to communicate the hidden states between chips for each token. The hidden states are very small (7168 floats for DeepSeek-V3.2 https://huggingface.co/deepseek-ai/DeepSeek-V3.2/blob/main/c... ), which won't be a bottleneck.

Things get more complicated if a layer does not fit in SRAM, but it still works out fine in the end.