What's the tok/s you get these days? Does it actually work well when you use more of that context?
By the way, it's been a long time since I last saw your username. You're the guy who launched Neovim! Boy, what a success. Of all the Kickstarter/Bountysource campaigns I've been a tiny part of, it definitely had the best outcome. I use it every day.
> What's the tok/s you get these days?
I ran llama-bench a couple of weeks ago when there was a big speed improvement on llama.cpp (https://github.com/ggml-org/llama.cpp/pull/20361#issuecommen...):
So it starts at 20 tps tg and 190 tps pp with an empty context and ends at 8 tps tg and 40 tps pp with a 250k prefill. I suspect there are still a lot of optimizations to be implemented for Qwen 3.5 in llama.cpp; I wouldn't be surprised to reach 25 tps in a few months.
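For a rough sense of what those numbers mean in wall-clock terms, here is a back-of-the-envelope sketch in Python. It assumes pp throughput stays somewhere between the two measured endpoints (190 and 40 tps) over the whole 250k prefill, which the benchmark doesn't confirm, and the 500-token reply length and the helper name are made up for illustration.

```python
def seconds_for(tokens: int, tokens_per_second: float) -> float:
    """Time to process `tokens` at a constant throughput (hypothetical helper)."""
    return tokens / tokens_per_second


# Figures quoted above from the llama-bench run.
PREFILL_TOKENS = 250_000
PP_EMPTY_TPS, PP_FULL_TPS = 190.0, 40.0   # prompt processing, empty vs. 250k context
TG_EMPTY_TPS, TG_FULL_TPS = 20.0, 8.0     # token generation, empty vs. 250k context

# Assumption: real pp speed lies between the two endpoints, so these are bounds,
# not a measurement of the actual prefill time.
prefill_best_min = seconds_for(PREFILL_TOKENS, PP_EMPTY_TPS) / 60
prefill_worst_min = seconds_for(PREFILL_TOKENS, PP_FULL_TPS) / 60

# Assumption: an illustrative 500-token reply.
REPLY_TOKENS = 500
reply_empty_s = seconds_for(REPLY_TOKENS, TG_EMPTY_TPS)
reply_full_s = seconds_for(REPLY_TOKENS, TG_FULL_TPS)

print(f"250k prefill: between {prefill_best_min:.0f} and {prefill_worst_min:.0f} min")
print(f"500-token reply: {reply_empty_s:.1f} s (empty) vs {reply_full_s:.1f} s (250k context)")
```

Under those assumptions, ingesting a full 250k prompt from scratch lands somewhere between about 22 and 104 minutes, and a 500-token reply goes from 25 s on an empty context to a bit over a minute on a full one.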
> You're the guy who launched Neovim!
That's me ;D
> I use it every day.
So have I, for the past 12 years! Though I admit that in the past year I've greatly reduced the amount of code I write by hand :/