I've been testing some models that score higher than Opus 4.6 on benchmarks.
They:
- hallucinate constantly
- can't follow basic instructions
- think they're Claude for some reason ;)