
Gareth321 • today at 7:09 PM

System RAM has much lower bandwidth and less predictable access than GPU memory. Notably, transfers from system RAM to the GPU are very slow, about 30x slower. LLMs aren’t designed to queue or parallelise operations to account for this, so they just become much slower.
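To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes decoding is purely memory-bandwidth-bound, that every weight is read once per token, and that reads from the slow path are not overlapped with anything else. The 30x factor is the figure above; the 16 GB model size and 900 GB/s GPU memory bandwidth are made-up illustrative values, and `tokens_per_second` is a hypothetical helper, not part of any library.

```python
def tokens_per_second(model_gb, fast_gbps, slow_factor, offloaded_fraction):
    """Rough upper bound on decode speed for a partially offloaded model.

    Assumes each token reads every weight exactly once and that slow-path
    reads are serialized with fast-path reads (no overlap or prefetching).
    """
    slow_gbps = fast_gbps / slow_factor            # bandwidth of the slow path
    in_vram_gb = model_gb * (1.0 - offloaded_fraction)
    offloaded_gb = model_gb * offloaded_fraction
    seconds_per_token = in_vram_gb / fast_gbps + offloaded_gb / slow_gbps
    return 1.0 / seconds_per_token


# Illustrative values only: 16 GB of weights, 900 GB/s GPU memory,
# and the 30x slowdown quoted above for the system RAM path.
for fraction in (0.0, 0.25, 0.5, 1.0):
    rate = tokens_per_second(16, 900, 30, fraction)
    print(f"{fraction:.0%} offloaded: ~{rate:.1f} tokens/s")
```

Under these assumptions the slow portion dominates quickly: even a small offloaded fraction accounts for most of the time per token, which matches the "just become much slower" behaviour described above.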
