Hacker News

Galanwe, today at 7:06 PM (1 reply)

The 5090 is crap for inference, unless you like dummy models; sure, those will run at light speed. All the rage nowadays is MoE with 500B-1T parameters.


Replies

zozbot234, today at 8:40 PM

MoE is fine. You can put the shared weights on the 5090 (they will fit handily even for the largest models) and the expert weights on the CPU, possibly with weight offload from storage.
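
To put rough numbers on that split, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the `MoEConfig` and `placement` names are made up, the 1T / 970B parameter split is a hypothetical model rather than a real one, and 4-bit quantization (0.5 bytes per parameter) is assumed. The only hard figure is the 5090's 32 GB of VRAM. KV cache and activations are ignored, so real headroom would be smaller.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class MoEConfig:
    total_params: float     # every parameter in the model
    expert_params: float    # parameters that live inside routed experts
    bytes_per_param: float  # e.g. 0.5 for 4-bit quantization


def placement(cfg: MoEConfig, vram_gib: float = 32.0) -> dict:
    """Estimate memory use when shared weights go to the GPU and experts stay in CPU RAM."""
    shared_params = cfg.total_params - cfg.expert_params
    gpu_gib = shared_params * cfg.bytes_per_param / GIB
    cpu_gib = cfg.expert_params * cfg.bytes_per_param / GIB
    return {
        "gpu_gib": gpu_gib,
        "cpu_gib": cpu_gib,
        # Weights only: KV cache and activations also need VRAM.
        "fits_on_gpu": gpu_gib <= vram_gib,
    }


if __name__ == "__main__":
    # Hypothetical 1T-parameter MoE with 970B parameters in experts, 4-bit weights.
    plan = placement(MoEConfig(total_params=1e12, expert_params=970e9, bytes_per_param=0.5))
    print(f"GPU (shared weights): {plan['gpu_gib']:.1f} GiB, fits: {plan['fits_on_gpu']}")
    print(f"CPU (expert weights): {plan['cpu_gib']:.1f} GiB")
```

Under those assumed numbers the shared weights come in well under 32 GB, while the experts need several hundred GiB of system RAM, which is where offload from storage would come in.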