echion • today at 12:59 AM

> you can combine Spark with M3U, the former streaming the compute, lowering TTFT, the latter doing the token generation part

Are you doing this with vLLM, or some other model-running library/setup?

coder543 • today at 1:02 AM

They're probably referencing this article: https://blog.exolabs.net/nvidia-dgx-spark/
