alecco • today at 3:39 PM • 0 replies • view on HN

Related interesting find on Qwen.

"Qwen's base models live in a very exam-heavy basin - distinct from other base models like llama/gemma. Shown below are the embeddings from randomly sampled rollouts from ambiguous initial words like "The" and "A":"

https://xcancel.com/N8Programs/status/2044408755790508113
