Has anyone managed to get this to work in LM Studio? They've got a option in the UI, but it never seems to allow me to enable it.
Yes. Make sure you’re not using the Gemma sparse models since they don’t have a small model to use. Also I removed all the image models from the workspace.
Normally when LM Studio doesn't like it it's because of the presence of mmproj files in the folder. Sometimes removing them helps it show up.
They're somehow connected to vision & block speculative decode...don't ask me how/why though
For gemma specifically had more luck with speculative using the llama-server route than lm studio
I've gotten it to work with other models. They've got to be perfectly aligned usually, in terms of provider, quantization etc. Might be a bit before you can get a matched set.
It's not implemented in mlx[1] yet (or llama.cpp[2]), so it may take a while.
[1] https://github.com/ml-explore/mlx-lm/pull/990
[2] https://github.com/ggml-org/llama.cpp/pull/22673