
embedding-shape • yesterday at 9:10 PM

> I run it all the time, token generation is pretty good.

I feel like, because you didn't actually mention prompt processing speed or any tokens/s figures, you aren't really giving the whole picture here. What are the prompt processing tok/s and the generation tok/s actually like?

Replies

storus • yesterday at 9:21 PM

I addressed both points: I mentioned you can offload prompt prefill (the slow part, 9 t/s) to a DGX Spark. Token generation runs at 6 t/s, which is acceptable.
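To see why both numbers matter, here is a back-of-the-envelope sketch that plugs the quoted 9 t/s prefill and 6 t/s generation rates into a simple latency estimate. It assumes both rates stay constant for the whole request, which real setups rarely do. The 2,000-token prompt and 500-token reply are hypothetical sizes chosen for illustration, and `estimate_latency_s` is a made-up helper, not part of any tool.

```python
# Rough per-request latency estimate. Assumption: throughput is constant over
# the whole prompt and response (real rates usually vary with context length).

PREFILL_TOK_PER_S = 9.0     # prompt processing rate quoted in the thread
GENERATION_TOK_PER_S = 6.0  # token generation rate quoted in the thread


def estimate_latency_s(prompt_tokens: int, output_tokens: int,
                       prefill_rate: float = PREFILL_TOK_PER_S,
                       gen_rate: float = GENERATION_TOK_PER_S) -> tuple:
    """Return (approx. time to first token, generation time, total) in seconds."""
    if prefill_rate <= 0 or gen_rate <= 0:
        raise ValueError("rates must be positive")
    ttft = prompt_tokens / prefill_rate  # time spent processing the prompt
    gen = output_tokens / gen_rate       # time spent emitting the reply
    return ttft, gen, ttft + gen


if __name__ == "__main__":
    # Hypothetical request: 2,000-token prompt, 500-token answer.
    ttft, gen, total = estimate_latency_s(2000, 500)
    print(f"prefill ~{ttft:.0f} s, generation ~{gen:.0f} s, total ~{total:.0f} s")
```

Under these assumptions the prompt accounts for roughly 222 s of the wait versus roughly 83 s for the reply, which is why a prefill figure is worth reporting alongside the generation speed.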
