Given that this relies at its core on the `rayon` and `wide` crates, which are reasonably well optimized as general-purpose libraries but still far from what llama.cpp achieves by specializing for this exact use case, the speed is about what I would expect.
So yeah, there is a lot of room for optimization, and the main reason to use this today is if you want a "simple" implementation with no C/C++ dependencies, for build-tooling reasons.