It significantly outperformed competitors on those benchmarks, by about as much as the deltas between some of the others, which are considered significant.
The deltas between the others are mostly not significant either. They're all about equally good. There's no categorical difference between GPT-4 and Claude 3.5.