GLM 5.2 has ~40B active parameters, which is what matters most for training cost.

irthomasthomas • today at 6:53 PM • 2 replies

Replies

impossiblefork • today at 6:59 PM

Yeah, but if its final performance comes from being trained on data from a bigger model, one can question whether this is a way to build genuinely new 40B models.

sometimelurker • today at 7:18 PM

for RL cost*

pretraining actually becomes more expensive as you make MoE models sparser (you need more tokens in the pretrain, and if you don't have them then you need to train for longer)
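
A back-of-the-envelope sketch of the cost argument, assuming the common rule of thumb of roughly 6 FLOPs per active parameter per training token. The token budget and the extra-token factor below are made-up placeholders, not reported figures for any model, and the estimate ignores attention FLOPs, router overhead, and the memory and communication cost of the inactive experts.

```python
def approx_training_flops(active_params: float, tokens: float) -> float:
    """Rough estimate: ~6 FLOPs per active parameter per token (forward + backward)."""
    return 6.0 * active_params * tokens


ACTIVE_PARAMS = 40e9       # ~40B active parameters, as stated in the thread
BASELINE_TOKENS = 10e12    # hypothetical token budget, for illustration only
EXTRA_TOKEN_FACTOR = 1.5   # hypothetical: a sparser model needing 1.5x the tokens

baseline = approx_training_flops(ACTIVE_PARAMS, BASELINE_TOKENS)
sparser = approx_training_flops(ACTIVE_PARAMS, BASELINE_TOKENS * EXTRA_TOKEN_FACTOR)

print(f"baseline: {baseline:.2e} FLOPs")
print(f"sparser:  {sparser:.2e} FLOPs ({sparser / baseline:.2f}x)")
```

Under this approximation, cost per token is fixed by the active parameter count, so any extra tokens a sparser model needs show up as a proportional increase in pretraining compute.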
