Take a look at ik_llama.cpp:

akawry • last Sunday at 1:23 PM

CPU performance is much better than mainline llama.cpp, and it has more quantization types available.
