irthomasthomas • today at 9:16 PM

I wish it were open-weights so we could discuss the architectural changes. This model is about twice as fast as 4.1: ~60 t/s vs. ~30 t/s. Is it half the parameters, or a new INT4 linear sparse-MoE architecture?
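
For what it's worth, here is a rough back-of-the-envelope sketch of why either explanation could produce a 2x speedup. It assumes decoding is memory-bandwidth-bound, so each generated token requires reading every active weight once. The function name, parameter counts, and bandwidth figure are all hypothetical, picked only to make the arithmetic easy to follow; nothing here is a known fact about either model.

```python
def decode_tokens_per_second(active_params, bytes_per_param, bandwidth_bytes_per_s):
    """Rough upper bound on decode speed, assuming every active weight
    is read from memory once per generated token (bandwidth-bound)."""
    return bandwidth_bytes_per_s / (active_params * bytes_per_param)


# Hypothetical numbers for illustration only, not real model specs.
BANDWIDTH = 3.0e12  # assumed aggregate memory bandwidth, bytes/s

# Assumed baseline: 100B active parameters stored at 1 byte each (8-bit).
baseline = decode_tokens_per_second(100e9, 1.0, BANDWIDTH)

# Option A: half the active parameters, same precision.
half_params = decode_tokens_per_second(50e9, 1.0, BANDWIDTH)

# Option B: same active parameters, 4-bit weights (0.5 bytes each).
int4 = decode_tokens_per_second(100e9, 0.5, BANDWIDTH)

print(f"baseline:    {baseline:.0f} t/s")
print(f"half params: {half_params:.0f} t/s ({half_params / baseline:.1f}x)")
print(f"INT4:        {int4:.0f} t/s ({int4 / baseline:.1f}x)")
```

Under this simplified model, halving the active parameters and halving the bytes per weight are indistinguishable from throughput alone, which is why the speed numbers can't settle the question without the weights.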
