Isn't it the case that OpenAI and Anthropic regularly just swap for whoever is at the top of the latest benchmarks? They're also so close in scores that it's effectively a wash anyways.
What OP is referring to is Anthropic aligning with corporate terms and conditions early, positioning themselves to be effectively resold by AWS rather than requiring orgs to procure them directly. This is huge in the enterprise world because the processes to get broad approval are generally far smaller and shorter for "just another AWS service" compared to a whole new vendor.
Isn't it an open secret that benchmarks are largely irrelevant at this point? Why else would we all have a personalized test battery for new models? That said, I've stopped testing ChatGPT entirely. It's still okay, but it's beaten by local models and thrashed by non-OAI frontier providers. I get the history, but holding up OAI outputs as equivalent is like comparing Yahoo to Google after Yahoo's collapse in search.
OAI language models are largely irrelevant at this point, IMO.
OpenAI did the same thing with Microsoft/Azure, though.