How're you fitting a model made for 80 gig cards onto a GPU with 24 gigs at full quant?
He said quad 3090 not single
Offloading the MoE expert layers to CPU is the easiest way, though it's a bit of a drag on performance
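For reference, a rough sketch of what that looks like with llama.cpp's tensor-override flag, which keeps the bulky MoE expert tensors in system RAM while attention and shared layers stay on the GPU. The model filename and the tensor-name regex here are illustrative assumptions, not a known-good config for any specific model:

```shell
# Hypothetical example: offload MoE expert tensors to CPU, rest to GPU.
# Adjust the regex to match the expert tensor names in your GGUF.
./llama-server \
  -m ./model-Q8_0.gguf \
  --n-gpu-layers 99 \
  --override-tensor "ffn_.*_exps.*=CPU"
```

The tradeoff is exactly what's described above: expert matmuls run on CPU each token, so generation speed drops, but the GPU only has to hold the dense portion of the model.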