storus • yesterday at 8:03 PM

I run it all the time, and token generation is pretty good. Only large contexts are slow, but you can hook up a DGX Spark via the Exo Labs stack and offload token prefill to it. The upcoming M5 Ultra should also be faster than the Spark at token prefill.

Replies

embedding-shape • yesterday at 9:10 PM

> I run it all the time, token generation is pretty good.

I feel like you aren't really giving the whole picture here, because you didn't mention prompt processing speed or tokens/s. What are the prompt processing tok/s and the generation tok/s actually like?
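
To make it concrete why both numbers matter, here is a rough sketch of how the two rates combine into total response time. It assumes prefill and generation run back to back at constant rates, which is a simplification (prefill usually slows as the context grows). The function name and every number are hypothetical placeholders, not measurements from any machine.

```python
def response_time_s(prompt_tokens, output_tokens, prefill_tok_s, gen_tok_s):
    """Estimate end-to-end latency as prefill time plus generation time.

    Simplifying assumption: both phases run sequentially at constant rates.
    """
    if prefill_tok_s <= 0 or gen_tok_s <= 0:
        raise ValueError("throughput values must be positive")
    prefill_s = prompt_tokens / prefill_tok_s    # time to process the prompt
    generation_s = output_tokens / gen_tok_s     # time to emit the answer
    return prefill_s + generation_s


# Placeholder rates (NOT benchmarks): 200 tok/s prefill, 25 tok/s generation.
short_ctx = response_time_s(1_000, 500, prefill_tok_s=200.0, gen_tok_s=25.0)
long_ctx = response_time_s(50_000, 500, prefill_tok_s=200.0, gen_tok_s=25.0)

print(f"short context: {short_ctx:.1f} s")
print(f"long context:  {long_ctx:.1f} s")
```

With the same generation rate, the long-context case is dominated by prefill, so "generation is pretty good" alone doesn't say how usable it is with big prompts.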
