Hacker News

storus · yesterday at 9:21 PM

I addressed both points: I mentioned that you can offload token prefill (prompt processing, the slow part at 9 t/s) to a DGX Spark. Token generation runs at 6 t/s, which is acceptable.
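One way to read why prefill is "the slow part" even at a higher per-token rate: prompts are usually much longer than replies. The back-of-the-envelope sketch below uses the 9 t/s and 6 t/s figures from the comment. The prompt and output lengths are hypothetical. It also assumes constant throughput, no overlap between the two phases, and no transfer overhead between machines.

```python
# Rough wall-clock estimate for one request, using the quoted rates.
# Assumption: throughput is constant for the whole request. Real rates
# vary with context length, batch size, and any cross-machine transfer.
PREFILL_TPS = 9.0   # prompt tokens processed per second (prefill)
GENERATE_TPS = 6.0  # output tokens generated per second (decode)


def request_seconds(prompt_tokens: int, output_tokens: int,
                    prefill_tps: float = PREFILL_TPS,
                    generate_tps: float = GENERATE_TPS) -> float:
    """Return estimated seconds to prefill the prompt plus generate the reply."""
    return prompt_tokens / prefill_tps + output_tokens / generate_tps


# Hypothetical request: a 9,000-token prompt and a 600-token answer.
prefill_s = 9000 / PREFILL_TPS   # 1000 s spent on prefill
generate_s = 600 / GENERATE_TPS  # 100 s spent on generation
print(f"prefill {prefill_s:.0f} s, generate {generate_s:.0f} s, "
      f"total {request_seconds(9000, 600):.0f} s")
# prints "prefill 1000 s, generate 100 s, total 1100 s"
```

With these assumed lengths, prefill accounts for roughly 90% of the wall-clock time. That is why moving it to faster hardware matters more than the decode rate.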