Hacker News

skybrian | yesterday at 7:39 PM

Knowing the performance is interesting. Apparently it's 1-3 tokens/second.


Replies

kgeist | yesterday at 7:58 PM

ik_llama.cpp is a fork of llama.cpp that specializes in CPU inference; some benchmarks from a year ago: https://github.com/ikawrakow/ik_llama.cpp/discussions/164