Hacker News

Snuggly73, last Tuesday at 7:43 PM

OK, if it's almighty, then why aren't the benchmarks at 100%? If you look at the individual issues, they are fairly small, trivial changes to existing codebases.

https://swe-rebench.com/

(Note that if you look at individual slices, Opus is often outperformed by Sonnet.)