These large MoE models can work quite well on consumer or prosumer platforms, they'll just be s...

zozbot234 • today at 5:58 AM

These large MoE models can work quite well on consumer or prosumer platforms; they'll just be slow, and you have to offset that by running them unattended around the clock. (That's something you can't really do with large SOTA models without spending way too much on tokens.) This actually works quite well for the DeepSeek V4 series, which has comparatively tiny KV-cache sizes, so even a consumer platform can run big batches in parallel.
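To make the KV-cache point concrete, here's a rough back-of-the-envelope sketch of how per-token cache size limits the number of sequences you can batch in a fixed memory budget. Every number in it (layer count, per-layer cache width, 16 GiB budget, 32K context) is a made-up placeholder for illustration, not a real DeepSeek V4 figure, and the function names are hypothetical.

```python
GIB = 1024 ** 3


def kv_bytes_per_token(num_layers, kv_values_per_layer, bytes_per_value=2):
    """Bytes of KV cache one token occupies across all layers.

    kv_values_per_layer is the number of cached values per token per layer
    (keys and values combined); bytes_per_value=2 assumes 16-bit storage.
    """
    return num_layers * kv_values_per_layer * bytes_per_value


def max_parallel_sequences(memory_budget_bytes, context_tokens, bytes_per_token):
    """How many full-length sequences fit in the memory left over for KV cache."""
    return memory_budget_bytes // (context_tokens * bytes_per_token)


# Hypothetical setup: 16 GiB left for KV cache after weights, 32K-token contexts.
budget = 16 * GIB
context = 32_768

# Hypothetical "conventional" cache: 2048 cached values per token per layer.
conventional = kv_bytes_per_token(num_layers=60, kv_values_per_layer=2048)

# Hypothetical "compressed" cache: 512 cached values per token per layer.
compressed = kv_bytes_per_token(num_layers=60, kv_values_per_layer=512)

print(max_parallel_sequences(budget, context, conventional))  # 2
print(max_parallel_sequences(budget, context, compressed))    # 8
```

With these placeholder numbers, shrinking the per-token cache by 4x takes you from 2 to 8 concurrent full-length sequences in the same memory, and batch size is what lets a slow box still push a decent number of total tokens overnight.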
