I wish it were open-weights so we could discuss the architectural changes. This model is about twice as fast as 4.1: ~60 t/s vs. ~30 t/s. Is it half the parameters, or a new INT4 linear sparse-MoE architecture?
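
For what it's worth, here is a back-of-envelope sketch of why both guesses would fit a 2x speedup. It assumes decoding is memory-bandwidth bound, so each token costs roughly one read of every active weight. The bandwidth, parameter count, and byte widths are made-up illustrative values, not known facts about either model, and `decode_tokens_per_second` is just a helper name for this sketch.

```python
def decode_tokens_per_second(active_params, bytes_per_param, bandwidth_bytes_per_s):
    """Rough upper bound on decode speed, assuming each generated token
    requires reading every active weight from memory exactly once."""
    return bandwidth_bytes_per_s / (active_params * bytes_per_param)


# Hypothetical values, chosen only to illustrate the ratios.
BANDWIDTH = 3.0e12    # bytes/s of memory bandwidth (assumed)
BASE_PARAMS = 100e9   # active parameters per token (assumed)

# Baseline: 1 byte per weight (e.g. 8-bit weights, assumed).
baseline = decode_tokens_per_second(BASE_PARAMS, 1.0, BANDWIDTH)

# Guess 1: half the active parameters, same precision.
half_params = decode_tokens_per_second(BASE_PARAMS / 2, 1.0, BANDWIDTH)

# Guess 2: same parameter count, 4-bit weights (0.5 bytes each).
int4_weights = decode_tokens_per_second(BASE_PARAMS, 0.5, BANDWIDTH)

print(half_params / baseline, int4_weights / baseline)
```

Under that assumption the two cases give the same speedup, so throughput alone can't tell them apart. A sparse MoE would land in the first bucket, since only the active experts get read per token.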