This is a 30B-parameter MoE with 3B active parameters, and it is the successor to their previous 7B omni model. [1]
You can expect this model to perform similarly to the non-omni version. [2]
There aren't many open-weights omni models, so I consider this a big deal. I would use this model to replace the keyboard and monitor in an application, while other tech does the heavy lifting behind the scenes. There is also a reasoning version, which might be a bit amusing in an interactive voice chat if it pronounces its thinking tokens while working toward a final answer.
1. https://huggingface.co/Qwen/Qwen2.5-Omni-7B
2. https://artificialanalysis.ai/models/qwen3-30b-a3b-instruct
Looks like it's not open source: https://www.alibabacloud.com/help/en/model-studio/qwen-omni#...
I can't find the weights for this new version anywhere. I checked ModelScope and Hugging Face. It looks like they may have extended the context window to 200K+ tokens, but I can't find the actual weights.
> There is also a reasoning version, which might be a bit amusing in an interactive voice chat if it pronounces the thinking tokens while working through to a final answer.
Last I checked (months ago), Claude used to do this.
Haha, you could hear how its mind thinks, maybe by putting a lot of reverb or some other effect on the thinking tokens…
I don't think the Flash model discussed in the article is 30B.
Their benchmark table shows it beating Qwen3-235B-A22B.
Does "Flash" in the name of a Qwen model indicate a model-as-a-service and not open weights?
> This is a 30B parameter MoE with 3B active parameters
Where are you finding that info? Not saying you're wrong; I just didn't see it specified anywhere on the linked page or on their HF page.
This is a stack of models:
- 650M Audio Encoder
- 540M Vision Encoder
- 30B-A3B LLM
- 3B-A0.3B Audio LLM
- 80M Transformer / 200M ConvNet (audio tokens to waveform)
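To put those numbers together, here is a rough tally of total vs. per-token active parameters for the stack listed above. This is a sketch under stated assumptions, not an official breakdown: each listed component is counted once, the "A3B" / "A0.3B" suffixes are read as the per-token active count of the two MoE models, the dense components are treated as fully active, and the two waveform components are both assumed to run. The component names in the dict are hypothetical labels. The "active" figure is an upper bound, since e.g. the vision encoder only runs when there is image or video input.

```python
# Rough tally of the Qwen3-Omni stack from the comment above, in billions of parameters.
# Assumptions (not from an official source): each component is counted once;
# MoE "active" counts come from the A3B / A0.3B suffixes; dense parts are fully active.
# The dictionary keys are hypothetical labels, not official module names.
COMPONENTS = {
    # name: (total_params_b, active_params_b)
    "audio_encoder": (0.65, 0.65),
    "vision_encoder": (0.54, 0.54),
    "llm_30b_a3b": (30.0, 3.0),
    "audio_llm_3b_a0_3b": (3.0, 0.3),
    "waveform_transformer": (0.08, 0.08),
    "waveform_convnet": (0.20, 0.20),
}


def totals(components):
    """Return (total, active) parameter counts in billions, rounded to 2 decimals."""
    total = sum(t for t, _ in components.values())
    active = sum(a for _, a in components.values())
    return round(total, 2), round(active, 2)


if __name__ == "__main__":
    total, active = totals(COMPONENTS)
    # Prints: total = 34.47B, active per token (upper bound) = 4.77B
    print(f"total = {total}B, active per token (upper bound) = {active}B")
```

So under these assumptions the full stack is roughly 34.5B parameters, with under 5B touched per token, which is why it is still often described as a "30B-A3B" model.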
This is a closed-source weight update to their Qwen3-Omni model. They previously had an open-weight release, Qwen/Qwen3-Omni-30B-A3B-Instruct, and a closed version, Qwen3-Omni-Flash.
You basically can't use this model right now, since none of the open-source inference frameworks have it fully implemented. It works with Transformers, but it's extremely slow.