Hacker News

WithinReason · yesterday at 2:49 PM · 4 replies

It's on the page:

  Precision  Quantization Tag  File Size
  1-bit      UD-IQ1_M          10 GB
  2-bit      UD-IQ2_XXS        10.8 GB
             UD-Q2_K_XL        12.3 GB
  3-bit      UD-IQ3_XXS        13.2 GB
             UD-Q3_K_XL        16.8 GB
  4-bit      UD-IQ4_XS         17.7 GB
             UD-Q4_K_XL        22.4 GB
  5-bit      UD-Q5_K_XL        26.6 GB
  16-bit     BF16              69.4 GB
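Worth noting: the bit labels are nominal. A back-of-envelope sketch (assuming the parameter count implied by the BF16 file at 2 bytes per weight, and decimal gigabytes) shows the actual average bits per weight runs higher, presumably because some layers are kept at higher precision:

```python
# Rough bits/weight from the table above. Assumptions: decimal GB,
# and a parameter count inferred from the BF16 size (2 bytes/weight).
GB = 1e9
N_PARAMS = 69.4 * GB / 2  # ~34.7B weights

def bits_per_weight(file_size_gb: float) -> float:
    # Total bits in the file divided by the estimated weight count.
    return file_size_gb * GB * 8 / N_PARAMS

for tag, size_gb in [("UD-IQ1_M", 10.0), ("UD-IQ2_XXS", 10.8),
                     ("UD-Q4_K_XL", 22.4), ("BF16", 69.4)]:
    print(f"{tag:12s} ~{bits_per_weight(size_gb):.2f} bits/weight")
```

By this estimate the "1-bit" file averages roughly 2.3 bits/weight and UD-Q4_K_XL roughly 5.2.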

Replies

Aurornis · yesterday at 3:28 PM

Additional VRAM is needed for context.

This is an MoE model with only ~3B active parameters per token, which works well with partial CPU offload. So in practice you can run the -A(N)B models on systems with a little less VRAM than the file size suggests. The more you offload to the CPU, though, the slower it gets.
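A minimal llama.cpp sketch of that split (the model filename and tensor regex here are illustrative, not exact): push all layers to the GPU, then use tensor overrides to pin the MoE expert weights back to CPU RAM.

```shell
# Illustrative only: the expert tensors hold most of the parameters,
# so pinning them to CPU leaves only the small active path in VRAM.
# Throughput drops as more of the work lands on the CPU.
llama-cli -m ./model-UD-Q4_K_XL.gguf \
  -ngl 99 \
  --override-tensor ".ffn_.*_exps.=CPU" \
  -p "Hello"
```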

est · yesterday at 4:34 PM

I really want to know what M, K, XL, and XS mean in this context, and how to choose between them.

I searched all the unsloth docs and there seems to be no explanation at all.

JKCalhoun · yesterday at 4:19 PM

"16-bit BF16 69.4 GB"

Is that (BF16) a 16-bit float?
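(Yes: BF16 is bfloat16, a 16-bit float with the same 8-bit exponent as float32 but only 7 mantissa bits, i.e. the top half of a float32's bit pattern. A stdlib sketch of the idea:)

```python
import struct

def to_bf16_bits(x: float) -> int:
    # bfloat16 is just the top 16 bits of an IEEE-754 float32:
    # 1 sign bit, 8 exponent bits, 7 mantissa bits.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 16

def from_bf16_bits(b: int) -> float:
    # Zero-pad the dropped mantissa bits to recover a float32.
    (x,) = struct.unpack(">f", struct.pack(">I", b << 16))
    return x

print(from_bf16_bits(to_bf16_bits(3.14159)))  # ~3.14; only ~3 significant digits survive
```

(Note this sketch truncates the mantissa; real conversions typically round to nearest even. The upshot: same dynamic range as float32, much less precision.)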

palmotea · yesterday at 3:00 PM

Thanks! I'd scanned the main content but I'd been blind to the sidebar on the far right.