For Intel CPUs, Phi-2 (2.7B) and TinyLlama (1.1B) run reasonably well using llama.cpp with 4-bit quantization. GGUF models with INT4 quantization typically need roughly 0.5-0.7 GB of RAM per billion parameters (4 bits per weight plus some overhead and KV cache), so even older machines can handle smaller models.
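If you want to script it rather than use the CLI, here's a minimal sketch using the llama-cpp-python bindings to load a 4-bit GGUF on CPU. The model filename and thread count are assumptions, swap in whatever quant you actually downloaded:

```python
# Minimal sketch, assuming llama-cpp-python is installed (pip install llama-cpp-python)
# and a 4-bit GGUF quant (e.g. Q4_K_M of Phi-2 or TinyLlama) has been downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="./phi-2.Q4_K_M.gguf",  # assumed local path to a 4-bit GGUF file
    n_ctx=2048,                        # context window; keep modest on low-RAM machines
    n_threads=8,                       # set to your number of physical cores
)

out = llm("Explain quantization in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```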
Take a look at ik_llama.cpp: https://github.com/ikawrakow/ik_llama.cpp
CPU performance is much better than mainline llama.cpp, and it has more quantization types available.