
Forgeties79, today at 2:03 AM

I work in media production and I have the same thought constantly. Hell, I curse in church as far as my industry is concerned, because I find 2.5 to be fine for most of us. 10, absolutely.


Replies

nine_k, today at 2:41 AM

I suppose throughput is not the key; latency is. When you split an operation that normally ran within one machine between two machines, anything that crosses the boundary becomes orders of magnitude slower. Even with careful structuring, there are limits to how little and how rarely you can send data between nodes.

I suppose that splitting an LLM workload is pretty sensitive to that.
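
To put rough numbers on the latency point, here is a minimal sketch of the usual "latency plus payload over bandwidth" model. Everything in it is an assumption for illustration: the function name `transfer_time_s`, the 16 KiB message size, the 100 microsecond one-way latency, and reading the "2.5" and "10" above as link speeds in Gb/s. None of these figures come from the thread or from a measurement.

```python
def transfer_time_s(payload_bytes: int, latency_s: float, bandwidth_bits_per_s: float) -> float:
    """Time to deliver one message: fixed latency plus serialization time."""
    return latency_s + (payload_bytes * 8) / bandwidth_bits_per_s


# Hypothetical figures, chosen only to illustrate the shape of the trade-off.
PAYLOAD_BYTES = 16 * 1024   # assumed size of one small cross-node message
LATENCY_S = 100e-6          # assumed one-way latency between the two machines

slow_link = transfer_time_s(PAYLOAD_BYTES, LATENCY_S, 2.5e9)   # assumed 2.5 Gb/s link
fast_link = transfer_time_s(PAYLOAD_BYTES, LATENCY_S, 10e9)    # assumed 10 Gb/s link

# Speedup from quadrupling bandwidth while latency stays the same.
speedup = slow_link / fast_link

if __name__ == "__main__":
    print(f"2.5 Gb/s: {slow_link * 1e6:.1f} us per message")
    print(f"10 Gb/s:  {fast_link * 1e6:.1f} us per message")
    print(f"speedup:  {speedup:.2f}x")
```

Under these assumed numbers, four times the bandwidth buys well under twice the speed for small messages, because the fixed latency term dominates. With large payloads the bandwidth term takes over and the picture reverses.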