SwellJoe • today at 12:37 AM • 1 reply

Two 3090s give you 48GB of VRAM, so you can run the 6-bit quantization comfortably, which is fine. It doesn't start to get notably dumber until you go lower than that. It won't be as fast as a hosted model, but dual 3090s will be comfortably fast for interactive use with the MoE version and not terrible with the dense model. I run the dense model at 8 bits on my dual Radeon V620 desktop machine, which I think is slower than two 3090s, or at least not notably faster.
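
For anyone who wants to check the fit themselves, here is a rough back-of-the-envelope sketch of the arithmetic behind "48GB fits the 6-bit quant". The thread doesn't state the parameter count, so `params_billion` is something you supply, and the 30 in the example is purely a hypothetical placeholder. The `headroom_gib` default is also an assumed allowance for KV cache and runtime overhead, not a measured figure. Real quantization formats store scales alongside the weights, so effective bits per weight run a bit above the nominal number.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits(params_billion: float, bits_per_weight: float,
         vram_gib: float = 48.0, headroom_gib: float = 4.0) -> bool:
    """True if the weights plus an assumed headroom fit in total VRAM.

    vram_gib defaults to two 24 GiB cards; headroom_gib is a guessed
    allowance for KV cache and runtime overhead (an assumption).
    """
    return weight_gib(params_billion, bits_per_weight) + headroom_gib <= vram_gib


if __name__ == "__main__":
    # 30 is a hypothetical parameter count (in billions), not from the thread.
    for bits in (4, 6, 8):
        size = weight_gib(30, bits)
        print(f"{bits}-bit: {size:.1f} GiB of weights, fits={fits(30, bits)}")
```

Longer contexts need more headroom, so treat a borderline result as "probably won't fit" rather than "will".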

Replies

hedgehog • today at 1:02 AM

Have you compared it against 4-bit and seen a noticeable difference on coding tasks?