banana_giraffe • today at 1:52 AM

A quick benchmark of float32 cuda->cuda copies in torch, comparing a few random machines:

    Raptor Lake + 5080: 380.63 GB/s
    Raptor Lake (CPU for reference): 20.41 GB/s
    GB10 (DGX Spark): 116.14 GB/s
    GH200: 1697.39 GB/s

This is an "eh, it works" benchmark, but it should give you a feel for the relative performance of the different systems.
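For anyone who wants to try something similar, here is a rough sketch of how a copy benchmark like this could be written. It is not the exact script behind the numbers above: the function names, buffer size, and iteration counts are my own picks, and I'm assuming bandwidth is reported as bytes copied per second (each byte counted once, not read plus write).

```python
import time


def copy_bandwidth_gbs(bytes_copied, seconds, count_read_and_write=False):
    """Turn a timed copy into GB/s (1 GB = 1e9 bytes).

    Assumption: by default each copied byte is counted once. Pass
    count_read_and_write=True to count total memory traffic instead.
    """
    moved = bytes_copied * (2 if count_read_and_write else 1)
    return moved / seconds / 1e9


def bench_torch_copy(device="cuda", n_floats=64 * 1024 * 1024, iters=20, warmup=3):
    """Time same-device float32 tensor copies and return GB/s.

    Hypothetical helper; sizes and iteration counts are arbitrary.
    """
    import torch  # lazy import so the helper above works without torch

    src = torch.ones(n_floats, dtype=torch.float32, device=device)
    dst = torch.empty_like(src)

    # CUDA kernels launch asynchronously, so wait for them before reading the clock.
    on_cuda = torch.device(device).type == "cuda"
    sync = torch.cuda.synchronize if on_cuda else (lambda: None)

    for _ in range(warmup):
        dst.copy_(src)
    sync()

    start = time.perf_counter()
    for _ in range(iters):
        dst.copy_(src)
    sync()
    elapsed = time.perf_counter() - start

    bytes_copied = src.numel() * src.element_size() * iters
    return copy_bandwidth_gbs(bytes_copied, elapsed)
```

Calling `bench_torch_copy("cuda")` gives the GPU figure and `bench_torch_copy("cpu")` a CPU reference. Results will move around with buffer size and whatever else the machine is doing, so treat them as ballpark.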

In practice, this means I can get something like 55 tokens a sec running a larger model like gpt-oss-120b-Q8_0 on the DGX Spark.

ekropotin • today at 2:20 AM

Nice! Thanks for that.

55 t/s is much better than I expected.