Hacker News

maleldil, yesterday at 6:50 PM

Assuming 1 token per second and "overnight" being 12 hours, that's 43 200 tokens. I'm not sure what you can meaningfully achieve with that.
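
For anyone who wants to plug in their own numbers, the arithmetic above can be sketched as follows. The 1 token/s rate and the 12-hour window are the assumptions stated in the comment, not measurements, and `overnight_tokens` is just an illustrative helper name.

```python
SECONDS_PER_HOUR = 3600


def overnight_tokens(tokens_per_second: float, hours: float) -> float:
    """Total tokens generated at a constant rate over the given number of hours."""
    return tokens_per_second * hours * SECONDS_PER_HOUR


# Assumed inputs from the comment: 1 token per second, 12 hours "overnight".
total = overnight_tokens(1, 12)
print(f"{total:,} tokens")
```

Doubling either the rate or the window doubles the budget, so the estimate is only as good as the assumed throughput.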


Replies

zozbot234, yesterday at 9:02 PM

Sure, but if long-term throughput is a real limitation, there are plenty of ways to address that without keeping anywhere close to all of the model weights in RAM (keeping them all in RAM is still the conventional approach with MoE). So the gain from a smaller memory footprint is quite real.