Hacker News

starspangled 02/23/2025

It significantly outperformed competitors on those benchmarks, by about as much as the deltas between some of the others, which are considered significant.


Replies

bccdee 02/23/2025

The deltas between the others are mostly not significant either. They're all about equally good. There's no categorical difference between GPT-4 and Claude 3.5.
