ggcr • today at 9:52 AM • 2 replies

The HuggingFace model card [1] states:

> "In particular, Qwen3.5-Plus is the hosted version corresponding to Qwen3.5-397B-A17B with more production features, e.g., 1M context length by default, official built-in tools, and adaptive tool use."

Does anyone know more about this? The OSS version seems to have a 262144-token context length, so I guess for the 1M they'll ask you to use YaRN?

[1] https://huggingface.co/Qwen/Qwen3.5-397B-A17B

Replies

NitpickLawyer • today at 10:17 AM

Yes, it's described in this section - https://huggingface.co/Qwen/Qwen3.5-397B-A17B#processing-ult...

YaRN, but with some caveats: current implementations might reduce performance on short contexts, so only enable YaRN for long-context tasks.
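
A rough sketch of that caveat, assuming a config-style setup: only turn on YaRN when the prompt will not fit in the native 262144-token window. The helper name `yarn_rope_scaling` is hypothetical, and the dict keys (`rope_type`, `factor`, `original_max_position_embeddings`) follow the convention used for earlier Qwen releases, so treat them as an assumption and check the linked model card section for the exact settings.

```python
import math

# Native context length of the open-weights model, as quoted in this thread.
NATIVE_CONTEXT = 262_144


def yarn_rope_scaling(required_tokens, native=NATIVE_CONTEXT):
    """Return a YaRN rope-scaling dict, or None if the native window is enough.

    Hypothetical helper: the key names are assumed from earlier Qwen
    model cards, not confirmed for this model.
    """
    if required_tokens <= native:
        # Short task: leave YaRN off to avoid the short-context penalty.
        return None
    # Smallest whole-number factor that covers the requested length.
    factor = float(math.ceil(required_tokens / native))
    return {
        "rope_type": "yarn",
        "factor": factor,
        "original_max_position_embeddings": native,
    }


if __name__ == "__main__":
    print(yarn_rope_scaling(100_000))    # fits natively, no scaling
    print(yarn_rope_scaling(1_000_000))  # needs scaling
```

Rounding the factor up to a whole number is a design choice in this sketch, not a requirement; a smaller factor that still covers the target length may hurt short-context quality less.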

Interesting that they're serving both on OpenRouter, and the -Plus is a bit cheaper for <256k context. So they must have more (proprietary) inference goodies packed in there.

We'll see where the third-party inference providers settle on cost.

danielhanchen • today at 10:03 AM

Unsure, but yes, most likely they use YaRN, and maybe trained a bit more on long context (or not).
