>Now they have to be lucky to be 6 months ahead of an open model with at most half the parameter count, trained on 1%-2% of the hardware US models are trained on.
Maybe there's a limit to what training can achieve, and throwing more hardware at it yields very little improvement?