Hacker News

tootyskooty, today at 4:52 PM

Since no one has mentioned it yet: note that the benchmarks for large are for the base model, not for the instruct model available in the API.

The most likely reason is that the instruct model underperforms the open competition (even non-reasoners like Kimi K2).