sanxiyn • today at 1:28 AM • 1 reply

There is a benchmark for performance work, and I don't think model vendors are optimizing for it. The latest GSO results show both Opus 4.6 and 4.7 slightly outperforming GPT 5.5, which matches my experience.

https://gso-bench.github.io/

Replies

vitorsr • today at 1:52 AM

Tasks are taken from commit histories in public Git repositories, which defeats the purpose.
