Hacker News

syntaxing, yesterday at 5:00 PM (2 replies)

This is a very interesting strategy that might pay off. This model is a very good option for enterprise self-hosting. I would argue a lot of companies are VRAM-constrained rather than compute-constrained. You could fit 4-5 running instances on one H100 cluster, where you could only fit 1-2 instances of Kimi K2 or GLM5.


Replies

2001zhaozhao, yesterday at 5:47 PM

This is 128B dense, though. The K/V cache on long context is going to be massive.
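To put a rough number on "massive", here is a back-of-the-envelope sketch using the standard K/V cache size formula for a decoder-only transformer. The thread does not give the model's layer count, KV head count, or head dimension, so the values in `HYPOTHETICAL_CONFIG` are made up for illustration and are not this model's real architecture.

```python
GIB = 1024 ** 3


def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Estimate K/V cache size in bytes for a decoder-only transformer.

    Each layer stores one K and one V tensor of shape
    (batch_size, num_kv_heads, seq_len, head_dim).
    bytes_per_elem=2 corresponds to fp16/bf16 cache entries.
    """
    # Factor of 2: one tensor for keys, one for values.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# ASSUMPTION: invented values, NOT the real config of the model discussed.
HYPOTHETICAL_CONFIG = dict(num_layers=88, num_kv_heads=8, head_dim=128)

for seq_len in (8_192, 32_768, 131_072):
    size = kv_cache_bytes(seq_len=seq_len, **HYPOTHETICAL_CONFIG)
    print(f"{seq_len:>7} tokens: {size / GIB:.2f} GiB per sequence")
```

With these assumed numbers, a single 131,072-token sequence in fp16 takes 44 GiB of cache, on top of the weights. The cache grows linearly with context length, batch size, and KV head count, so a model without grouped-query attention (more KV heads) would be several times worse.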

sayYayToLife, yesterday at 6:23 PM

[dead]