> In the case of gpt-oss 120B, that would mean sqrt(5 × 120) ≈ 24B.
That's actually in line with what I had (unscientifically) expected. Claude Sonnet 4 seems to agree:
> The most accurate approach for your specific 120B MoE (5.1B active) would be to test it empirically against dense models in the 10-30B range.
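For reference, a minimal sketch of that geometric-mean rule of thumb; the exact parameter counts are just the figures quoted above, not official numbers:

```python
import math

def dense_equivalent(total_params_b: float, active_params_b: float) -> float:
    """Geometric-mean rule of thumb: estimate the dense-model size an MoE
    roughly corresponds to as sqrt(total * active), in billions of params."""
    return math.sqrt(total_params_b * active_params_b)

# gpt-oss 120B with ~5.1B active parameters (figures from the quotes above)
print(f"~{dense_equivalent(120, 5.1):.1f}B")  # ~24.7B, squarely in the 10-30B range
```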