Hacker News

storystarling, yesterday at 6:14 PM

I suspect the bottleneck on 12+ year old hardware wouldn't be power but the interconnects. SOTA training is bound by gradient synchronization latency. Without NVLink you hit a hard wall where the compute spends most of its time waiting on PCIe or Ethernet.
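
To put rough numbers on that, here is a back-of-the-envelope sketch, not a benchmark. It assumes a plain ring all-reduce with no overlap between communication and compute, and it ignores per-message latency. The function name, the model size, and both bandwidth figures are hypothetical, picked only to show how sync time scales with link speed.

```python
def ring_allreduce_seconds(param_count, bytes_per_param, num_devices, link_gbytes_per_s):
    """Bandwidth-only estimate of one gradient all-reduce (hypothetical helper)."""
    payload_bytes = param_count * bytes_per_param
    # In a ring all-reduce each device sends 2 * (N - 1) / N of the payload.
    traffic_bytes = 2 * (num_devices - 1) / num_devices * payload_bytes
    return traffic_bytes / (link_gbytes_per_s * 1e9)


if __name__ == "__main__":
    # Assumed workload: 1e9 parameters, 2-byte gradients, 8 devices.
    # Both bandwidths are illustrative placeholders, not measured values.
    for label, bandwidth in [("slow link, 10 GB/s", 10.0), ("fast link, 300 GB/s", 300.0)]:
        seconds = ring_allreduce_seconds(1_000_000_000, 2, 8, bandwidth)
        print(f"{label}: {seconds * 1000:.1f} ms per sync")
```

If a step's compute takes less time than the sync on the slow link, the devices sit idle for the difference, which is the wall I mean.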


Replies

fc417fc802, yesterday at 6:28 PM

Fair point. Though if this were actually attempted I imagine it would start with making changes to the model architecture, the physical hardware, or both.

My hypothetical is probably somewhat over the top anyway. Isn't China somewhere in the vicinity of 7 nm at present?