> We still need good value hardware to run Kimi/GLM in-house
If you stream weights in from SSD storage and freely use swap to extend your KV cache, it will be really slow (multiple seconds per token!), but it will run on basically anything. And that's still really good for stuff that can be computed overnight, perhaps even by batching many requests simultaneously. It gets progressively better as you add more compute, of course.
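Rough arithmetic behind the "multiple seconds per token" figure, as a sketch: every number here (active parameter count, quantization, SSD bandwidth) is an assumption for illustration, not a measurement.

```python
# Back-of-envelope: a big MoE model only activates a fraction of its weights
# per token, but if they don't fit in RAM those bytes still have to come off
# the SSD on every decoding step.
active_params = 32e9        # assumed active parameters per token (MoE)
bytes_per_param = 0.5       # assumed ~4-bit quantization
ssd_read_bps = 6e9          # assumed sustained NVMe read, bytes/s

bytes_per_token = active_params * bytes_per_param
seconds_per_token = bytes_per_token / ssd_read_bps
print(f"~{seconds_per_token:.1f} s/token")  # ~2.7 s/token -> "multiple seconds"
```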
At a certain point the energy starts to cost more than renting some GPUs.
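A sketch of where that crossover sits; every figure here is an assumption (local power draw, electricity price, and a hypothetical rented setup that fits the whole model in VRAM), so treat it as the shape of the argument rather than real prices.

```python
# Electricity cost of slow local generation vs. renting GPUs (assumed figures).
watts = 300                     # assumed draw of the local box while generating
seconds_per_token = 3.0         # from the SSD-streaming estimate above
usd_per_kwh = 0.30              # assumed electricity price

kwh_per_million_tokens = watts * seconds_per_token * 1e6 / 3_600_000
local_usd_per_million = kwh_per_million_tokens * usd_per_kwh          # ~$75

rental_usd_per_hour = 15.0      # assumed multi-GPU node price
rented_tokens_per_second = 80   # assumed speed once the weights fit in VRAM
rented_usd_per_million = rental_usd_per_hour / (rented_tokens_per_second * 3600) * 1e6  # ~$52

print(f"local electricity: ~${local_usd_per_million:.0f} / Mtok")
print(f"rented GPUs:       ~${rented_usd_per_million:.0f} / Mtok")
```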
> it will be really slow (multiple seconds per token!)
This is fun for proving that it can be done, but that's 100X slower than hosted models and 1000X slower than GPT-Codex-Spark.
That's like going from a real-time conversation to e-mailing someone who only checks their inbox twice a day, if you're lucky.
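To put the analogy in wall-clock terms, a sketch with assumed speeds: the task size, hosted-API rate, and latency-optimized rate below are illustrative guesses, not benchmarks of any particular service.

```python
# How long one decent-sized coding task takes at each speed (assumed figures).
task_tokens = 20_000             # assumed output tokens for the task

local_s_per_token = 3.0          # SSD-streaming estimate from above
hosted_tokens_per_s = 40         # assumed typical hosted-model speed (~100X faster)
fast_tokens_per_s = 400          # assumed latency-optimized endpoint (~1000X faster)

print(f"local : {task_tokens * local_s_per_token / 3600:.1f} h")    # ~16.7 h
print(f"hosted: {task_tokens / hosted_tokens_per_s / 60:.0f} min")  # ~8 min
print(f"fast  : {task_tokens / fast_tokens_per_s:.0f} s")           # ~50 s
```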