This user has also done a bunch of good quants:

https://huggingface.co/unsloth/GLM-4.7-GGUF
Yes, I usually run Unsloth models, but you're linking to the big model (355B-A32B), which I can't run on my consumer hardware.
The flash model in this thread is more than 10x smaller (30B).
I find it hard to trust post-training quantizations. Why don't they run benchmarks to see how much performance degrades? It sketches me out, because automatically running a suite of benchmarks should be the easiest thing to do.
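Even a crude perplexity comparison against the full-precision weights would go a long way. Here's a rough sketch of what that automation could look like, assuming llama.cpp's llama-perplexity tool and a local wikitext-style eval file (the file names and the output-parsing regex are my guesses, not anything the quant uploaders actually ship):

    #!/usr/bin/env python3
    # Rough sketch (not the uploader's pipeline): compare perplexity across
    # GGUF quants using llama.cpp's llama-perplexity binary. The paths and
    # the output format being parsed are assumptions on my part.
    import re
    import subprocess
    from pathlib import Path

    EVAL_TEXT = Path("wiki.test.raw")        # hypothetical eval corpus
    QUANTS = [Path("model-F16.gguf"),        # hypothetical local quant files
              Path("model-Q8_0.gguf"),
              Path("model-Q4_K_M.gguf")]

    def perplexity(model: Path) -> float:
        """Run llama-perplexity on one quant and parse its final PPL estimate."""
        out = subprocess.run(
            ["llama-perplexity", "-m", str(model), "-f", str(EVAL_TEXT)],
            capture_output=True, text=True, check=True)
        # llama.cpp prints a line like "Final estimate: PPL = 6.1234 +/- ..."
        m = re.search(r"PPL = ([0-9.]+)", out.stdout + out.stderr)
        if m is None:
            raise RuntimeError(f"could not parse PPL for {model}")
        return float(m.group(1))

    baseline = perplexity(QUANTS[0])
    print(f"{QUANTS[0].name}: PPL {baseline:.4f} (baseline)")
    for q in QUANTS[1:]:
        ppl = perplexity(q)
        delta = 100 * (ppl - baseline) / baseline
        print(f"{q.name}: PPL {ppl:.4f} ({delta:+.2f}% vs baseline)")

Perplexity isn't a full benchmark suite, but it's cheap enough to run per quant and would at least flag the ones that fall off a cliff.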