Nemotron-3-Nano-30B-A3B[0][1] is a very impressive local model. It handles tool calling well and works great with llama.cpp, Visual Studio Code, and Roo Code for local development.
It doesn't get a ton of attention on /r/LocalLLaMA, but it is worth trying out, even if you have a relatively modest machine.
[0] https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B...
[1] https://huggingface.co/unsloth/Nemotron-3-Nano-30B-A3B-GGUF
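If you want to try it with Roo Code, a minimal llama-server invocation looks something like this (the GGUF filename, context size, and port are illustrative, so adjust for your quant and hardware; --jinja turns on the chat-template handling that tool calling relies on):

    # Serve an OpenAI-compatible API on localhost:8080.
    # -ngl 99 puts all layers on the GPU; -c sets the context window.
    llama-server -m Nemotron-3-Nano-30B-A3B-Q8_0.gguf \
      -ngl 99 -c 16384 --jinja --port 8080

Then point Roo Code at http://localhost:8080/v1 as an OpenAI-compatible provider and tool calls go through llama-server's chat endpoint.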
It was good for, like, one month. Qwen3 30B had dominated for half a year before that, and GLM-4.7 Flash 30B took the crown soon after Nemotron 3 Nano came out. There was basically no window for it to shine.
I find the Q8 quant runs a bit more than twice as fast as gpt-oss-120b, since I don't have to offload as many MoE layers to the CPU, but it is just about as capable, if not better.
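For anyone curious how that partial offload works: llama.cpp can keep the attention and shared weights on the GPU and push only the MoE expert tensors to system RAM, which is why a smaller MoE that fits mostly in VRAM pulls ahead. A rough sketch, with an illustrative filename and tensor regex (check the flags in your build; --n-cpu-moe is a newer shorthand that may not exist in older ones):

    # Offload everything to the GPU, then override the expert tensors back to CPU.
    llama-server -m Nemotron-3-Nano-30B-A3B-Q8_0.gguf -ngl 99 \
      -ot "\.ffn_(up|down|gate)_exps\.=CPU"

    # Newer builds have a shorthand that keeps the experts of the
    # first N layers on the CPU:
    llama-server -m Nemotron-3-Nano-30B-A3B-Q8_0.gguf -ngl 99 --n-cpu-moe 20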
Some of NVIDIA's models also have interesting architectures. Nemotron, for example, uses Mamba state-space layers rather than a purely transformer stack: https://developer.nvidia.com/blog/inside-nvidia-nemotron-3-t...