Hacker News

redrove, yesterday at 7:42 PM (2 replies)

I wouldn’t say runs. More of a gentle stroll.


Replies

storus, yesterday at 8:03 PM

I run it all the time; token generation is pretty good. Only large contexts are slow, but you can hook up a DGX Spark via the Exo Labs stack and offload token prefill to it. The upcoming M5 Ultra should also be faster than the Spark at token prefill.

hasperdi, yesterday at 8:30 PM

With quantization, converting it to an MoE model... it can be a fast walk.