Hacker News

mips_avatar, yesterday at 7:30 PM

The cool thing about the 3090s is the memory bandwidth. Token generation is mostly bottlenecked on memory bandwidth. Dual 3090s have 1.87 TB/s of combined memory bandwidth (0.936 TB/s each), vs. only 0.3 TB/s on the M5 MacBook Pro (the Max chip goes up to 0.63 TB/s, but it's a $10k machine at that config).

This translates to Qwen 27B actually running fast enough for useful work on dual 3090s and being painfully slow on MacBook Pros. Also, if you're running a big model on a MacBook Pro, the UI gets laggy and the keyboard gets hot. Much better to run dual 3090s in your basement and connect to them from your MacBook.
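
As a rough sketch of the arithmetic behind the bandwidth argument: if decoding is purely memory-bound, every weight has to be read once per generated token, so bandwidth divided by model size gives a ceiling on tokens per second. The function name, the assumption of a dense 27B model, and the 0.5 bytes per parameter (roughly 4-bit quantization, ignoring format overhead and KV cache reads) are all illustrative assumptions, not measurements. Real throughput will land below these numbers.

```python
def decode_tps_ceiling(bandwidth_tb_s: float,
                       params_billion: float,
                       bytes_per_param: float) -> float:
    """Upper bound on decode tokens/s for a memory-bound dense model.

    Assumes every weight is read exactly once per generated token and
    ignores KV cache traffic, compute time, and interconnect overhead.
    """
    model_bytes = params_billion * 1e9 * bytes_per_param
    return (bandwidth_tb_s * 1e12) / model_bytes


# Bandwidth figures are the ones quoted in the comment above.
# 27B parameters at ~0.5 bytes/param is an assumed 4-bit quantization.
PARAMS_B = 27.0
BYTES_PER_PARAM = 0.5

configs = {
    "single 3090": 0.936,
    "dual 3090 (aggregate)": 1.87,
    "M5 MacBook Pro": 0.3,
    "M5 Max": 0.63,
}

for name, bandwidth in configs.items():
    ceiling = decode_tps_ceiling(bandwidth, PARAMS_B, BYTES_PER_PARAM)
    print(f"{name:>22}: at most ~{ceiling:.0f} tokens/s")
```

One caveat on the dual-GPU row: the aggregate figure only applies when both cards read their half of the weights at the same time (tensor parallelism). If the model is split by layers, each token passes through the cards one after the other, so the ceiling is closer to the single-card number.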


Replies

CobaltFire, yesterday at 7:50 PM

$4.8k for 48GB Max (what the parent said). Half of your quote.

Even a 128GB config is $6.8k today. Still only 2/3 of your quote.

Bandwidth is relevant (I have both a 5090 and an M4 Max 128GB Studio, so have direct comparison right here), but quote the cost appropriately!

titanomachy, yesterday at 8:59 PM

The bandwidth argument is compelling. Do we have benchmarks for these models? I'm curious what it translates to in tokens per second.
