syntaxing • yesterday at 5:00 PM • 2 replies

This is a very interesting strategy that might pay off. This model is a very good option for enterprise self-hosting. I would argue a lot of companies are VRAM-constrained rather than compute-constrained. You could fit 4-5 running instances on one H100 cluster, where you could only fit 1-2 instances of Kimi K2 or GLM5.

Replies

2001zhaozhao • yesterday at 5:47 PM

This is 128B dense though. The K/V cache on long context is going to be massive.
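
For a rough sense of scale, here is a back-of-the-envelope sketch of the standard K/V cache size formula for a transformer with multi-head or grouped-query attention. The layer count, KV head count, and head dimension below are hypothetical placeholders, not the actual config of the model discussed in this thread; plug in the real values from the model card.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Estimate K/V cache size in bytes for standard (MHA/GQA) attention.

    The leading 2 accounts for storing both keys and values.
    bytes_per_elem=2 assumes an fp16/bf16 cache.
    """
    return (2 * n_layers * n_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


GIB = 1024 ** 3

# Hypothetical config for illustration only (NOT the real model's numbers).
cfg = dict(n_layers=88, n_kv_heads=8, head_dim=128)

for seq_len in (8_192, 131_072):
    size = kv_cache_bytes(seq_len=seq_len, **cfg)
    print(f"{seq_len:>7} tokens: {size / GIB:.2f} GiB per sequence")
```

The cache grows linearly with context length and with the number of concurrent sequences, so under these assumed numbers long-context serving can eat a large share of VRAM on top of the weights.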

sayYayToLife • yesterday at 6:23 PM

[dead]
