It runs both the q2 quant and the original (4-bit routed experts), at more or less the same speed. The q2 quants are not what you might expect: they work extremely well, for a few reasons. For the full model you need a Mac with 256GB of memory.
Out of curiosity, do you have any theories of why it works so well at such aggressive quantization levels?