Hacker News

alecco, today at 3:39 PM

A related, interesting find about Qwen:

"Qwen's base models live in a very exam-heavy basin - distinct from other base models like llama/gemma. Shown below are the embeddings from randomly sampled rollouts from ambiguous initial words like 'The' and 'A':"

https://xcancel.com/N8Programs/status/2044408755790508113