tootyskooty • today at 4:52 PM

Since no one has mentioned it yet: note that the benchmarks for large are for the base model, not for the instruct model available in the API.

The most likely reason is that the instruct model underperforms the open competition (even among non-reasoning models like Kimi K2).