Gareth321, today at 7:09 PM

System RAM has much lower bandwidth and less predictable access times than GPU memory. Notably, transfers from system RAM to the GPU are very slow, about 30x slower. LLMs aren't designed to queue or parallelise operations to account for this, so they just become much slower.
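
To put rough numbers on that, here is a back-of-envelope sketch. It assumes token generation is limited purely by how fast the weights can be read, with every weight read once per token. The model size and GPU memory bandwidth are made-up illustrative values, not measurements; only the 30x ratio comes from the comment above.

```python
def tokens_per_second(model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on decode speed if every weight is read once per token."""
    return bandwidth_bytes_per_s / model_bytes


GB = 1e9

# Hypothetical values, chosen only for illustration.
MODEL_BYTES = 14 * GB        # e.g. 7B parameters at 2 bytes each
VRAM_BANDWIDTH = 900 * GB    # assumed GPU memory bandwidth, bytes/s

# The "about 30x slower" figure from the comment, applied to the transfer path.
SLOWDOWN = 30
LINK_BANDWIDTH = VRAM_BANDWIDTH / SLOWDOWN

fast = tokens_per_second(MODEL_BYTES, VRAM_BANDWIDTH)
slow = tokens_per_second(MODEL_BYTES, LINK_BANDWIDTH)

print(f"weights resident in GPU memory:   ~{fast:.1f} tokens/s")
print(f"weights streamed from system RAM: ~{slow:.1f} tokens/s")
```

Under these assumptions the slowdown in tokens per second matches the bandwidth ratio, since nothing in the model overlaps or hides the transfer time.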