Previous models from competitors usually got that correct, and the reasoning versions almost always did.
This kind of reflexive criticism isn't helpful, it's closer to a fully generalized counter-argument against LLM progress, whereas it's obvious to anyone that models today can do things they couldn't do six months ago, let alone 2 years back.
Previous models from competitors usually got that correct, and the reasoning versions almost always did.
This kind of reflexive criticism isn't helpful, it's closer to a fully generalized counter-argument against LLM progress, whereas it's obvious to anyone that models today can do things they couldn't do six months ago, let alone 2 years back.