Curiously, this isn't always true.
For example, GLM 5.1 is more capable at pentesting than the model from which it is alleged to have been distilled [1].
Intuitively, this makes some sense: you can "distill" from multiple frontier models, and you can further post-train the distilled model. But I'm not sure exactly what happened with GLM 5.1.
[1]: https://dualuse.dev/posts/chinese-models-are-sometimes-bette...
Interesting blog post, thanks for sharing.
I'm curious how that comparison controls for Opus refusing (whether explicitly or by just deciding not to pursue a path), given the caption below the first image:
>A perfect score means the model autonomously found and exploited the vulnerability.
I'm not suggesting that it's misleading, just wondering if I'm missing something. Otherwise, I guess it seems unsurprising that you can distill a model that performs better [in specific focused areas] simply by not distilling the refusals?