mips_avatar • yesterday at 7:30 PM

The cool thing about the 3090s is the memory bandwidth. Token generation is mostly bottlenecked on memory bandwidth. Dual 3090s have 1.87 TB/s of combined memory bandwidth (0.936 TB/s each), vs. the M5 MacBook Pro with only 0.3 TB/s (the Max chip has up to 0.63 TB/s, but it's a $10k machine at that config).

This translates to Qwen 27B actually working fast enough for useful work on dual 3090s and being painfully slow on MacBook Pros. Also, if you're running a big model on a MacBook Pro, the UI gets laggy and the keyboard gets hot. Much better to run dual 3090s in your basement and connect to them from your MacBook.
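
For a rough sense of what those bandwidth numbers mean in tokens per second, here is a back-of-envelope sketch. It assumes a dense 27B-parameter model quantized to about 4 bits (0.5 bytes per parameter, my assumption, not something stated above), that every weight is read once per generated token, and that KV-cache reads and compute overhead are ignored. The function name and the config labels are made up for illustration. The results are theoretical ceilings, not benchmarks.

```python
def max_tokens_per_second(bandwidth_tb_s: float,
                          params_billion: float,
                          bytes_per_param: float) -> float:
    """Upper bound on decode speed if generation is purely bandwidth-bound.

    Assumes each generated token requires streaming all weights from
    memory exactly once (dense model, no KV-cache traffic, no overhead).
    """
    bytes_per_token = params_billion * 1e9 * bytes_per_param
    return bandwidth_tb_s * 1e12 / bytes_per_token


# Bandwidth figures are the ones quoted in the comment above.
configs = {
    "single 3090": 0.936,
    "dual 3090 (combined)": 1.872,
    "MacBook Pro, 0.3 TB/s": 0.3,
    "MacBook Pro, 0.63 TB/s": 0.63,
}

for name, bandwidth in configs.items():
    # 27B parameters at an assumed 0.5 bytes per parameter (4-bit quant).
    ceiling = max_tokens_per_second(bandwidth, 27, 0.5)
    print(f"{name}: at most ~{ceiling:.0f} tok/s")
```

Under these assumptions the ceiling works out to roughly 69 tok/s for one 3090 and roughly 22 tok/s at 0.3 TB/s. One caveat on the dual-card figure: the combined 1.87 TB/s only applies if both cards read their share of the weights in parallel (tensor parallelism). If the model is simply split by layers across the two cards, each token passes through them one after the other, so the effective ceiling is closer to the single-card number.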

Replies

CobaltFire • yesterday at 7:50 PM

$4.8k for 48GB Max (what the parent said). Half of your quote.

Even a 128GB is $6.8k today. Still only 2/3 your quote.

Bandwidth is relevant (I have both a 5090 and an M4 Max 128GB Studio, so have direct comparison right here), but quote the cost appropriately!

titanomachy • yesterday at 8:59 PM

The bandwidth argument is compelling. Do we have benchmarks for these models? I'm curious what it translates to in tokens per second.
