master_crab • 01/04/2026 • 1 reply

This requires hardware in the tens of thousands of dollars (if we want the tokens spit out at a reasonable pace).

Maybe in 3-5 years this will work on consumer hardware at speed, but not in the immediate term.

Replies

vntok • 01/04/2026

$2000 will get you 30-50 tokens/s at perfectly usable quantization levels (Q4-Q5) with any of the top 5 open-weights MoE models. That's not half bad and will only get better!
