GLM 5.2 has ~40B active parameters, which is what matters most for training cost.
for RL cost*
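A rough illustration of why active parameters are the number that matters: under the common rule of thumb, a forward pass costs ~2 FLOPs per parameter per token and forward+backward ~6. For an MoE model, only the parameters actually touched per token count. This is a sketch under stated assumptions. The 400B dense comparison and the 1B-token RL budget are hypothetical, and the code assumes each generated rollout token is trained on once.

```python
# Back-of-envelope compute estimate using the ~2N (forward) / ~6N (forward+backward)
# FLOPs-per-token rule of thumb, where N = parameters active per token.
# Ignores attention FLOPs, memory, and expert-routing communication.

def generate_flops(active_params: float, tokens: float) -> float:
    """Approximate FLOPs to generate `tokens` tokens (forward pass only)."""
    return 2 * active_params * tokens

def train_flops(active_params: float, tokens: float) -> float:
    """Approximate FLOPs to train on `tokens` tokens (forward + backward)."""
    return 6 * active_params * tokens

def rl_flops(active_params: float, rollout_tokens: float) -> float:
    """Assumption: each generated rollout token is also trained on once."""
    return generate_flops(active_params, rollout_tokens) + train_flops(active_params, rollout_tokens)

ACTIVE = 40e9        # ~40B active parameters, as stated in the thread
DENSE_TOTAL = 400e9  # hypothetical dense model, for comparison only
ROLLOUT_TOKENS = 1e9 # hypothetical RL budget of 1B generated tokens

moe = rl_flops(ACTIVE, ROLLOUT_TOKENS)
dense = rl_flops(DENSE_TOTAL, ROLLOUT_TOKENS)

print(f"MoE (40B active): {moe:.2e} FLOPs")
print(f"dense (400B):     {dense:.2e} FLOPs")
print(f"ratio:            {dense / moe:.0f}x")
```

Per-token compute scales with active parameters, so the MoE model's RL cost here is that of a 40B dense model regardless of its total size. The estimate says nothing about pretraining token requirements.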
Pretraining actually becomes more expensive as you make MoE models sparser (you need more pretraining tokens, and if you don't have them, you need to train for longer).
Yeah, but if its final performance comes from being trained on data from a bigger model, one can question whether this is a way to build genuinely new 40B models.