antirez • yesterday at 8:37 PM • 1 reply

It runs both the q2 and the original (4-bit routed experts) versions, at more or less the same speed. The q2 quants are not what you might expect: they work extremely well, for a few reasons. For the full model you need a Mac with 256GB.

Replies

someone13 • today at 2:31 AM

Out of curiosity, do you have any theories of why it works so well at such aggressive quantization levels?
