None of the inference frameworks (vLLM/SGLang) supports the full model, let alone on non-NVIDIA hardware.
That's unfortunate but not too surprising. This type of model is very new to the local hosting space.
Makes sense; I think streaming audio->audio inference is a relatively big lift.
We actually deployed working speech-to-speech inference that uses vLLM as the backbone. The main lift was supporting the "Talker" module, which the qwen3-omni branch of vLLM currently doesn't handle.
Check it out here: https://models.hathora.dev/model/qwen3-omni
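For anyone curious what this looks like at a high level, here's a minimal sketch of the Thinker -> Talker flow, assuming vLLM runs the text ("Thinker") stage. The `TalkerModel` and `Code2WavDecoder` classes are hypothetical stand-ins for the custom Talker support, since that part isn't in upstream vLLM. Note that in the real model the Talker also conditions on the Thinker's hidden states, not just its text output, which is part of why stock vLLM can't drive it; this sketch simplifies that away:

```python
# Rough sketch of a Thinker -> Talker speech-to-speech pipeline on top of
# vLLM. LLM/SamplingParams are real vLLM APIs; TalkerModel and
# Code2WavDecoder are hypothetical placeholders for the Talker integration.
from vllm import LLM, SamplingParams

from talker import TalkerModel, Code2WavDecoder  # hypothetical module

# Stage 1 ("Thinker"): vLLM serves the text backbone as usual.
thinker = LLM(model="Qwen/Qwen3-Omni-30B-A3B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=256)

# Stages 2/3 (hypothetical): the Talker maps the reply to audio codec
# tokens, and a codec decoder turns those tokens into a waveform.
talker = TalkerModel.from_pretrained("talker-checkpoint")    # hypothetical
decoder = Code2WavDecoder.from_pretrained("code2wav-ckpt")   # hypothetical

def speech_to_speech(transcribed_prompt: str) -> bytes:
    # Text reply from the Thinker via vLLM's offline generate API.
    reply = thinker.generate([transcribed_prompt], params)[0].outputs[0].text
    # Audio codec tokens from the Talker, then decode to raw audio.
    codec_tokens = talker.generate_codec_tokens(reply)  # hypothetical
    return decoder.decode(codec_tokens)                 # hypothetical
```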