A related, interesting find on Qwen:
"Qwen's base models live in a very exam-heavy basin - distinct from other base models like llama/gemma. Shown below are the embeddings from randomly sampled rollouts from ambiguous initial words like 'The' and 'A':"
https://xcancel.com/N8Programs/status/2044408755790508113