There is a newer PR which will probably be merged soon: https://github.com/ggml-org/llama.cpp/pull/22673
Ohhhh geee!!! I just applied the patch to my local git copy. You need to use the model that the PR author provided in the PR; it is special because it carries the extra information that makes MTP possible. I have two AMD GPUs, and qwen3.6 27B qk6 does around 20 t/s generation across both... If I run it on only one, I get about 35 t/s.
But with this patch I saw 46 t/s with qwen3.6 27B q8... this is insane, that's about 2.3x the original 20 t/s (roughly 130% faster). There was no GPU I could upgrade to that would give me that kind of boost, amazing!
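For anyone who wants to sanity-check the speedup, here is a quick sketch of the arithmetic using only the t/s figures quoted above. The helper name `speedup` is just something made up for illustration, and the comparison is rough because the baseline runs used qk6 while the patched run used q8.

```python
def speedup(baseline_tps, new_tps):
    """Return (ratio, percent_faster) for two tokens-per-second figures."""
    ratio = new_tps / baseline_tps
    percent_faster = (ratio - 1.0) * 100.0
    return ratio, percent_faster


# Figures quoted in the post: 20 t/s on two GPUs, 35 t/s on one GPU,
# and 46 t/s with the MTP patch applied.
patched_tps = 46.0
for label, baseline_tps in [("two GPUs", 20.0), ("one GPU", 35.0)]:
    ratio, pct = speedup(baseline_tps, patched_tps)
    print(f"vs {label}: {ratio:.2f}x, {pct:.0f}% faster")
```

"X times the speed" and "X% faster" differ by one whole multiple: 2.3x the speed is 130% faster, not 230%.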
Ollama merged a PR for MTP about 2 hours ago, as well:
https://github.com/ollama/ollama/pull/15980
Edit: It seems they also have a pre-release version out with the functionality added: https://github.com/ollama/ollama/releases/tag/v0.23.1-rc0