Something I have been wondering about is regressive, layer-specific quantization based on large test sets, i.e. reducing precision very specifically for layers that don't improve general quality.
This is a thing! For example, https://arxiv.org/abs/2511.06516
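For a rough picture of what that could look like, here is a toy sketch of greedy per-layer bit-width selection. It is not the linked paper's method. The names `quantize`, `assign_bits`, `toy_loss`, and the `SENSITIVITY` table are all hypothetical, and the loss is a stand-in for running the model on a large evaluation set. Each layer is pushed to lower precision until the loss rises more than `tolerance` above the full-precision baseline.

```python
from typing import Callable, Dict, List, Optional, Sequence

Weights = Dict[str, List[float]]


def quantize(values: Sequence[float], bits: int) -> List[float]:
    """Uniform symmetric fake-quantization to `bits` bits (bits >= 2)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return list(values)
    levels = 2 ** (bits - 1) - 1  # e.g. 127 for 8 bits, 1 for 2 bits
    scale = max_abs / levels
    return [round(v / scale) * scale for v in values]


def assign_bits(
    weights: Weights,
    loss_fn: Callable[[Weights], float],
    candidate_bits: Sequence[int] = (8, 4, 2),  # tried from high to low
    tolerance: float = 0.01,
) -> Dict[str, Optional[int]]:
    """Greedily lower each layer's precision while the loss stays within budget.

    Returns the chosen bit-width per layer; None means "keep full precision".
    """
    baseline = loss_fn(weights)
    current: Weights = {name: list(w) for name, w in weights.items()}
    chosen: Dict[str, Optional[int]] = {name: None for name in weights}
    for name in weights:
        for bits in candidate_bits:
            trial = dict(current)
            trial[name] = quantize(weights[name], bits)
            # Loss is measured with all previously accepted layers quantized,
            # so the tolerance acts as a cumulative budget.
            if loss_fn(trial) - baseline <= tolerance:
                current = trial
                chosen[name] = bits
            else:
                break  # lower precision would only hurt more
    return chosen


# Toy stand-in for "evaluate on a large test set": a sensitivity-weighted
# squared error against the original weights. Purely illustrative.
REFERENCE: Weights = {
    "sensitive": [0.5, -0.25, 0.125, 1.0],
    "insensitive": [0.3, -0.7, 0.9, 0.1],
}
SENSITIVITY = {"sensitive": 100.0, "insensitive": 0.0001}


def toy_loss(weights: Weights) -> float:
    return sum(
        SENSITIVITY[name] * sum((w - r) ** 2 for w, r in zip(weights[name], REFERENCE[name]))
        for name in weights
    )


chosen = assign_bits(REFERENCE, toy_loss)
print(chosen)
```

In this toy setup the layer that matters stays at 8 bits while the one that barely affects the loss drops to 2 bits. With a real model, `loss_fn` would be an actual evaluation run, which is the expensive part.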