Would you mind sharing what hardware/card(s) you're using? And is https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B... one of the ones you've tested?
Support for this model landed in llama.cpp recently, if anyone is interested in running it locally.
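For anyone who wants to try it, a minimal sketch of a local run with llama.cpp — the exact GGUF repo name here is a placeholder (check Hugging Face for the actual community conversion), and `-ngl`/context settings will depend on your card:

```shell
# Pull a GGUF quant straight from Hugging Face and chat with it.
# "someuser/model-GGUF:Q4_K_M" is a hypothetical repo/quant tag, not a real one.
llama-cli -hf someuser/model-GGUF:Q4_K_M \
  -ngl 99 \        # offload as many layers as fit on the GPU
  -c 8192          # context size; raise/lower to match your VRAM
```

`llama-server` accepts the same `-hf` flag if you'd rather expose an OpenAI-compatible endpoint instead of an interactive CLI session.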