Hacker News

akawry, last Sunday at 1:23 PM

Take a look at ik_llama.cpp: https://github.com/ikawrakow/ik_llama.cpp

CPU performance is much better than mainline llama.cpp, and it offers more quantization types.