Hacker News

hobofan · 10/12/2024 · 1 reply

Given that at its core this relies on the `rayon` and `wide` libraries, which have decent baseline optimization but fall well short of what llama.cpp can do when specialized for such a specific use case, the speed is about what I would expect.

So yeah, I think there is a lot of room for optimization, and the only reason to use this today is if you want a "simple" implementation with no C/C++ dependencies, for build-tooling reasons.


Replies

littlestymaar · 10/12/2024

Your point about rayon (I don't know much about wide) being inherently slower than custom optimization is valid, but from what I've seen I suspect rayon isn't even the bottleneck: there seems to be a decent margin of improvement (I'd expect at least double the throughput) without resorting to anything arcane.