Take a look at ik_llama.cpp: https://github.com/ikawrakow/ik_llama.cpp
Its CPU performance is much better than mainline llama.cpp's, and it offers additional quantization types.