Hacker News

skybrian | yesterday at 7:39 PM

Knowing the performance is interesting. Apparently it's 1-3 tokens/second.


Replies

kgeist | yesterday at 7:58 PM

ik_llama.cpp is a fork of llama.cpp that specializes in CPU inference; some benchmarks from a year ago: https://github.com/ikawrakow/ik_llama.cpp/discussions/164