I wouldn’t say runs. More of a gentle stroll.

redrove • yesterday at 7:42 PM • 2 replies

Replies

I run it all the time; token generation is pretty good. Only large contexts are slow, but you can hook up a DGX Spark via the Exo Labs stack and offload prompt prefill to it. The upcoming M5 Ultra should be faster than the Spark at prefill as well.


hasperdi • yesterday at 8:30 PM

With quantization and converting it to an MoE model... it can be a fast walk.
