Hacker News

irthomasthomas · today at 6:53 PM · 2 replies

GLM 5.2 is ~40B active parameters, which is what matters most for training cost.
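A rough way to see why active parameters dominate: the common rule of thumb puts training compute at about 6 × N × D FLOPs (N parameters, D tokens), and for an MoE model N is usually taken as the active count, since only those weights touch each token. The sketch below is an illustration under stated assumptions only: the 400B total and 10T-token budget are hypothetical numbers, not GLM's actual figures, and router/attention overheads are ignored.

```python
# Back-of-the-envelope training compute with the ~6 * N * D rule of thumb
# (roughly 2 FLOPs forward + 4 FLOPs backward per parameter per token).
# For MoE, N is taken as the *active* parameter count. Overheads ignored.

def train_flops(active_params: float, tokens: float) -> float:
    """Approximate training FLOPs: ~6 FLOPs per active parameter per token."""
    return 6.0 * active_params * tokens

ACTIVE = 40e9    # ~40B active parameters, as stated in the comment
TOTAL = 400e9    # hypothetical total parameter count, for illustration only
TOKENS = 10e12   # hypothetical 10T-token training run

moe = train_flops(ACTIVE, TOKENS)
dense_same_total = train_flops(TOTAL, TOKENS)

print(f"MoE, 40B active:        {moe:.2e} FLOPs")               # 2.40e+24
print(f"Dense, 400B parameters: {dense_same_total:.2e} FLOPs")  # 2.40e+25
print(f"ratio: {dense_same_total / moe:.0f}x")                  # 10x
```

Under this approximation the total parameter count only matters through memory and communication, not per-token FLOPs.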


Replies

impossiblefork · today at 6:59 PM

Yeah, but if its final performance comes from being trained on data from a bigger model, one can question whether this is a way to build genuinely new 40B models.

sometimelurker · today at 7:18 PM

for RL cost*

pretraining actually becomes more expensive as you make MoE models sparser (you need more tokens in the pretrain, and if you don't have them, you need to train for longer)
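To make the token point concrete with the same ~6 × N_active × D approximation: if two MoE variants share the same active count, compute scales linearly with the number of training tokens (or epochs), so any extra data the sparser variant needs shows up directly as extra cost. The 1.5× token multiplier below is a purely hypothetical assumption for illustration, not a measured figure.

```python
# Same ~6 * N_active * D approximation; both variants have 40B active params,
# but the sparser one (more total experts) is *assumed* to need more tokens.

def train_flops(active_params: float, tokens: float) -> float:
    """Approximate training FLOPs: ~6 FLOPs per active parameter per token."""
    return 6.0 * active_params * tokens

ACTIVE = 40e9
denser_tokens = 10e12                  # hypothetical token budget
sparser_tokens = 1.5 * denser_tokens   # hypothetical: sparser model needs 1.5x

ratio = train_flops(ACTIVE, sparser_tokens) / train_flops(ACTIVE, denser_tokens)
print(f"sparser / denser pretraining compute: {ratio:.2f}x")  # 1.50x
```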