I guess that depends on your definition of "decent". For the smaller models that can run on a 16/24/32 GB Nvidia card, the chip is anywhere between 3x and 10x slower than, say, a 4080 Super or a 3090, both of which are relatively cheap used.
The biggest limitations are memory bandwidth, which caps token-generation speed, and the fact that it's not a CUDA chip, which means a longer time to first token than CUDA hardware with theoretically similar specifications.
Any model bigger than what fits in 32 GB of VRAM is, in my opinion, currently unusable on "consumer" hardware. Perhaps a tinybox with 144 GB of VRAM and close to 6 TB/s of memory bandwidth would give you a nice experience on consumer-grade hardware, but it's quite the investment (and power draw).
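To make the bandwidth point concrete, here's my back-of-envelope reasoning (a rough ceiling, not a benchmark): each decoded token has to stream essentially all the model weights from memory once, so memory bandwidth divided by model size gives an upper bound on tokens per second. The numbers below are illustrative; the 936 GB/s figure is the 3090's spec-sheet bandwidth.

```python
# Rough upper bound on token-generation speed, assuming every decode
# step reads all model weights from memory exactly once. Ignores
# compute time, KV-cache traffic, and all other overhead.

def max_tokens_per_sec(model_size_gb: float, bandwidth_gb_s: float) -> float:
    """Theoretical decode-speed ceiling: bandwidth / model size."""
    return bandwidth_gb_s / model_size_gb

# Illustrative model sizes (weights as stored on disk), on a 3090
# with ~936 GB/s of memory bandwidth:
for size in (13, 24, 80):
    ceiling = max_tokens_per_sec(size, 936)
    print(f"{size} GB model: at most ~{ceiling:.0f} tok/s")
```

Real throughput comes in well under this ceiling, but it shows why an 80 GB model is painful no matter how fast the GPU's compute is: the memory simply can't feed it faster than that.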
I think it depends on the use case; slow isn't that bad if you're asking questions infrequently. I downloaded a model a few weeks ago that was roughly 80 GB and ran it on my 3090 just to see how it was... and it was okay. Fast? Nope. But it worked. If the answers were materially better, I'd happily wait a minute for the output, but they weren't. I'd like to try one of the really large ones just to see how slow it is, but I need to clear some space to even download it.