I addressed both points - I mentioned you can offload token prefill (the slow part, 9t/s) to DGX Spark. Token generation is at 6t/s which is acceptable.