Almost all Chinese models are open-weight research models.
My theory is that these models are meant to be relatively easy for researchers to run and tweak, and mainly serve to demonstrate the effectiveness of new training and inference techniques, as well as the strength of the AI labs that created them.
They are not designed to be state-of-the-art commercial models.
By choosing bigger model sizes, running more training epochs, and drilling the models a bit more on benchmark questions, I'm sure the Chinese labs could close the gap, but that would delay these models and make them more expensive and harder to run, without showing any tangible research benefit.
Also, my 2c: I was perfectly happy with Sonnet 3.7 a year ago. If the Chinese have a model that is really as good as that (not just one that benchmarks as well), I'd definitely like to try it.
It is arguable that the new MiniMax M2.1 and GLM-4.7 are drastically more capable than Sonnet 3.7.