storus • today at 2:38 PM

The "4x faster" figure is about token prefill, i.e. the time to first token. There it should be on par with the DGX Spark, while being slightly faster than the M4 for token generation. In other words, with a long context you wait about 4 minutes instead of 15.
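As a rough sketch of the arithmetic behind that: time to first token is approximately prompt length divided by prefill throughput, so a 4x prefill speedup turns 15 minutes into 3.75. The prompt size and baseline rate below are made-up values picked only so the baseline comes out at 15 minutes; they are not measurements of any of the machines mentioned.

```python
def time_to_first_token_s(prompt_tokens: int, prefill_tokens_per_s: float) -> float:
    """Approximate prefill time: the whole prompt is processed before the first output token."""
    return prompt_tokens / prefill_tokens_per_s


# Hypothetical numbers, chosen only so the baseline lands on 15 minutes.
prompt_tokens = 90_000
baseline_rate = 100.0             # tokens/s, assumed
faster_rate = 4 * baseline_rate   # the "4x faster" prefill

baseline_min = time_to_first_token_s(prompt_tokens, baseline_rate) / 60
faster_min = time_to_first_token_s(prompt_tokens, faster_rate) / 60

print(f"{baseline_min:.2f} min -> {faster_min:.2f} min")  # 15.00 min -> 3.75 min
```

Token generation speed is a separate number and is not affected by this calculation.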
