Almost every comment here is appealing to personal experience. By contrast, OP refers to two studies that compare performance on some kind of standardised test over a range of models.
Can't speak to how good those tests are, but they can't be worse than anecdotal evidence for something as vague/subjective as LLM performance.
But the studies are in 2024 and 2025. They don’t apply to current Claude models.