> at Q4_K_M, stock-style quantization is retaining ~99–99.8% of BF16 accuracy
That's a tall claim. By that measure, even NVIDIA's QAD, which AFAIK is currently SOTA for 4-bit quantization (albeit NVFP4 instead of INT4), would be worse than Q4_K_M RTN quantization. :D