Hacker News

verdverm today at 4:17 PM

https://huggingface.co/moonshotai/Kimi-K2.6

Is this the same model?

Unsloth quants: https://huggingface.co/unsloth/Kimi-K2.6-GGUF

(Work in progress: no GGUF files yet, and the header message on the repo says as much.)


Replies

SwellJoe today at 5:25 PM

A trillion parameters is wild. That's not going to quantize to anything normal folks can run. Even at 1-bit, it's going to be bigger than what a Strix Halo or DGX Spark can run. Though I guess streaming from system RAM and disk makes it feasible to run it locally at <1 token per second, or whatever. GLM 5.1, at 754B parameters, is already beyond any reasonable self-hosting hardware (1-bit quantization is 206GB). Maybe a Mac Studio with 512GB can run them at very low-bit quantizations, also pretty slowly.
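For anyone who wants to check the sizes above, here is a rough back-of-the-envelope sketch. It assumes 1 GB = 10^9 bytes and ignores file metadata and any tensors kept at higher precision; the function names are made up for illustration. It only uses the numbers quoted in this comment (1 trillion parameters, and 754B parameters at 206GB).

```python
def quant_size_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight size in GB (10^9 bytes) at a uniform bit width."""
    return num_params * bits_per_param / 8 / 1e9


def implied_bits_per_param(file_size_gb: float, num_params: float) -> float:
    """Average bits per parameter implied by a given file size."""
    return file_size_gb * 1e9 * 8 / num_params


# A 1-trillion-parameter model at exactly 1 bit per weight: about 125 GB.
print(f"1T params @ 1 bit: {quant_size_gb(1e12, 1):.0f} GB")

# The 206 GB figure quoted for a 754B model works out to roughly
# 2.2 bits per weight on average, not a literal 1 bit.
print(f"754B params in 206 GB: {implied_bits_per_param(206, 754e9):.2f} bits/param")
```

If that arithmetic holds, a "1-bit" quantization in practice averages well above 1 bit per weight, so a trillion-parameter model at the same average would land somewhere around 270GB rather than 125GB.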

Balinares today at 4:33 PM

I'm quite curious how well real usage will back up the benchmarks, because even if it's only in the Opus ballpark, an open-weights model in the Opus ballpark is seismic.

gpm today at 5:14 PM

Huh, so the metadata says 1.1 trillion parameters, each 32 or 16 bits.

But the files are only roughly 640GB in size (~10GB * 64 files, slightly less in fact). Shouldn't they be closer to 2.2TB?
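The mismatch can be put into numbers with a quick sketch. It assumes 1 GB = 10^9 bytes and takes the figures in this comment at face value (1.1 trillion parameters, about 640GB on disk); the helper names are hypothetical.

```python
NUM_PARAMS = 1.1e12   # parameter count from the model metadata, as quoted above
FILES_GB = 640        # ~10 GB * 64 files, an upper estimate


def expected_size_gb(num_params: float, bits_per_param: int) -> float:
    """Size in GB (10^9 bytes) if every parameter used the given bit width."""
    return num_params * bits_per_param / 8 / 1e9


def average_bits(file_size_gb: float, num_params: float) -> float:
    """Average bits per parameter implied by the files actually on disk."""
    return file_size_gb * 1e9 * 8 / num_params


print(f"16-bit everywhere: {expected_size_gb(NUM_PARAMS, 16):.0f} GB")  # ~2200 GB
print(f"32-bit everywhere: {expected_size_gb(NUM_PARAMS, 32):.0f} GB")  # ~4400 GB
print(f"implied average:   {average_bits(FILES_GB, NUM_PARAMS):.2f} bits/param")
```

The implied average comes out at a bit under 5 bits per parameter. That would be consistent with most of the weights being stored at a low bit width and only a minority at 16 or 32 bits, but that is an inference from the file sizes, not something the metadata quoted here confirms.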
