cheema33 • yesterday at 10:37 PM

I have the RAM, but not the VRAM. What kind of speed (tokens per second) could you expect from a 3090 with 24GB of VRAM? I am somewhat tempted to pick up a GPU with 24GB of VRAM.

Replies

phamilton • today at 12:19 AM

Generation is basically just memory bandwidth math.

Generating each token requires reading all the active weights. I think that's around 40B active parameters. At a 4-bit quant (half a byte per parameter) that's 20GB read per token. With 100GB/s of memory bandwidth (replace with whatever your bandwidth is), you get 100 / 20 = 5 tokens per second.
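
If it helps, here is that arithmetic as a rough Python sketch. The function names are made up for illustration, and it assumes generation is purely bandwidth-bound (every active weight read once per token, no compute, cache, or transfer overhead), so treat the result as an upper bound rather than a benchmark.

```python
def active_weight_gb(active_params_billions: float, bits_per_weight: float) -> float:
    """Size in GB of the weights read per token (1 GB = 1e9 bytes)."""
    # billions of parameters * bytes per parameter = gigabytes
    return active_params_billions * bits_per_weight / 8


def tokens_per_second(active_params_billions: float,
                      bits_per_weight: float,
                      bandwidth_gb_s: float) -> float:
    """Bandwidth-bound estimate: bandwidth divided by bytes read per token."""
    return bandwidth_gb_s / active_weight_gb(active_params_billions, bits_per_weight)


# Numbers from the comment: ~40B active parameters, 4-bit quant, 100GB/s.
print(active_weight_gb(40, 4))        # 20.0
print(tokens_per_second(40, 4, 100))  # 5.0
```

Swap in your own bandwidth figure for the third argument; the estimate scales linearly with it.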
