Hacker News

jodleif · 10/12/2024

How fast is this with llama.cpp? A 1B model should be a lot faster on an M2.


Replies

hobofan · 10/12/2024

Given that this relies at its core on the `rayon` and `wide` libraries, which are decently baseline-optimized but quite a bit away from what llama.cpp can do when specialized for such a specific use case, I think the speed is about what I would expect.
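For context, here's a minimal sketch of the kind of matrix-vector kernel you can build purely on `rayon` (thread-level parallelism) and `wide` (portable 8-lane SIMD). The function name and layout are my own illustration, not the project's actual code; the point is that llama.cpp instead ships hand-tuned, quantization-aware kernels per ISA, which is where most of the gap comes from.

```rust
// Hypothetical baseline kernel using only rayon + wide, not the project's code.
use rayon::prelude::*;
use wide::f32x8;

/// out[r] = dot(row r of matrix, vec); matrix is row-major, rows x cols.
fn matvec(matrix: &[f32], vec: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    assert_eq!(matrix.len(), rows * cols);
    assert_eq!(vec.len(), cols);
    (0..rows)
        .into_par_iter() // rayon: one parallel task per output row
        .map(|r| {
            let row = &matrix[r * cols..(r + 1) * cols];
            let mut acc = f32x8::ZERO;
            let chunks = cols / 8;
            for c in 0..chunks {
                let a = f32x8::from(<[f32; 8]>::try_from(&row[c * 8..c * 8 + 8]).unwrap());
                let b = f32x8::from(<[f32; 8]>::try_from(&vec[c * 8..c * 8 + 8]).unwrap());
                acc = a.mul_add(b, acc); // wide: 8 lanes of fused multiply-add
            }
            let mut sum = acc.reduce_add();
            for c in chunks * 8..cols {
                sum += row[c] * vec[c]; // scalar tail when cols % 8 != 0
            }
            sum
        })
        .collect()
}
```

This is about as good as generic safe Rust gets without specializing: no quantized weight formats, no cache-blocking, no per-CPU dispatch, all of which llama.cpp does by hand.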

So yeah, I think there is a lot of room for optimization, and the only reason to use this today is if you want a "simple" implementation that doesn't have any C/C++ dependencies, for build-tooling reasons.
