storus • yesterday at 9:21 PM

I addressed both points: I mentioned that you can offload token prefill (the slow part, at 9 t/s) to a DGX Spark. Token generation runs at 6 t/s, which is acceptable.
