Hacker News

SwellJoe, today at 12:37 AM

Two 3090s is 48GB, so it's possible to run the 6-bit quantization comfortably, which is fine. It doesn't start to get notably dumber until lower than that. It won't be as fast as a hosted model, but dual 3090s will be comfortably fast for interactive use with the MoE version and not terrible to use with the dense model. I run the dense model at 8 bits on my dual Radeon V620 desktop machine, which I think would be slower than two 3090s, or at least not notably faster.
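A rough back-of-the-envelope check of the memory claim above, as a sketch rather than a measurement. The comment does not give the model's parameter count, so `HYPOTHETICAL_PARAMS_B` is a made-up placeholder, and `overhead_gb` (KV cache, activations, runtime buffers) is likewise an assumed round number. Real quantization formats also store scales and other metadata, so the effective bits per weight run somewhat above the nominal figure.

```python
def weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    # params * bits / 8 gives bytes; the 1e9 factors for "billions" and "GB" cancel.
    return params_billions * bits_per_weight / 8


def fits(params_billions: float, bits_per_weight: float,
         vram_gb: float, overhead_gb: float = 4.0) -> bool:
    """True if the weights plus an assumed fixed overhead fit in the given VRAM."""
    return weight_gb(params_billions, bits_per_weight) + overhead_gb <= vram_gb


VRAM_GB = 2 * 24  # two RTX 3090s at 24 GB each, as in the comment

# Hypothetical parameter count, NOT taken from the thread; substitute the real one.
HYPOTHETICAL_PARAMS_B = 30.0

for bits in (4, 6, 8):
    size = weight_gb(HYPOTHETICAL_PARAMS_B, bits)
    verdict = "fits" if fits(HYPOTHETICAL_PARAMS_B, bits, VRAM_GB) else "does not fit"
    print(f"{bits}-bit: ~{size:.1f} GB of weights, {verdict} in {VRAM_GB} GB")
```

The overhead term matters more than it looks: long contexts grow the KV cache, so a quantization that fits on paper can still run out of memory in practice.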


Replies

hedgehog, today at 1:02 AM

Have you compared it against 4-bit and seen a noticeable difference on coding tasks?
