Hacker News

andai, today at 6:31 PM

I've been testing some models that score higher than Opus 4.6.

They:

- hallucinate constantly

- can't follow basic instructions

- think they're Claude for some reason ;)