As you allude to, prompt processing speed is a killer advantage of the Spark, one that even two Strix Halo boxes would not match.
Prompt processing is 3x to 4x faster on GPT-OSS-120B once you are a little way into your context window, and the Spark is similarly much faster for image generation and other AI tasks.
Plus the Nvidia ecosystem, as others have mentioned.
One discussion with benchmarks: https://www.reddit.com/r/LocalLLaMA/comments/1oonomc/comment...
If all you care about is token generation with a tiny context window, then they are very close, but that's basically the only case where they are. I studied this problem extensively before deciding what to buy, and I wish Strix Halo had been the better option.
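If you want intuition for why the two diverge, here's a back-of-envelope roofline sketch in Python. The bandwidth and TOPS figures are rough public specs, not my measurements, and the per-token byte count assumes ~5.1B active params at ~4 bits/weight:

    # Rough roofline: why token generation (tg) lands close on both boxes
    # while prompt processing (pp) does not. All figures are approximate
    # public specs, not measurements.
    machines = {
        "DGX Spark":  {"bw_gbs": 273, "tops": 500},  # ~1 PFLOP sparse FP4 marketing => ~500 dense
        "Strix Halo": {"bw_gbs": 256, "tops": 125},  # rough combined GPU+NPU figure
    }

    active_bytes = 2.6e9  # ~5.1B active params (gpt-oss-120b) at ~4 bits/weight

    for name, m in machines.items():
        # tg: each generated token streams all active weights from memory once,
        # so the ceiling is bandwidth / bytes-per-token -- nearly identical here.
        tg_ceiling = m["bw_gbs"] * 1e9 / active_bytes
        # pp: prefill batches the whole prompt, amortizing weight reads,
        # so throughput tracks raw compute -- where the Spark is ~4x ahead.
        print(f"{name}: tg ceiling ~{tg_ceiling:.0f} tok/s, pp tracks ~{m['tops']} TOPS")

Real-world numbers land well below those ceilings, but the ratios line up with the benchmarks in the linked thread.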
Could I get your thoughts on the Asus GX10 vs. spending the same money on GPU compute? It seems like one could get a lot more total VRAM with better memory bandwidth, with PCIe becoming the bottleneck instead, especially if you already have a motherboard with spare slots.
I'm trying to better understand the trade-offs, or whether it depends on the workload.
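To make the question concrete, here's the rough math I'm working from (all numbers assumed; happy to be corrected):

    # Sketch: does PCIe actually bottleneck multi-GPU token generation?
    # Assumes a layer-split (pipeline) arrangement and approximate numbers.
    pcie4_x16_gbs = 32               # ~32 GB/s per direction
    hidden_size   = 2880             # gpt-oss-120b hidden dim, approximately
    act_bytes     = hidden_size * 2  # fp16 activations for one token

    # With layers split across cards, only one token's activations cross
    # PCIe at each GPU boundary -- a few KB per hop.
    hops_ceiling = pcie4_x16_gbs * 1e9 / act_bytes
    print(f"PCIe ceiling: ~{hops_ceiling/1e6:.1f}M boundary crossings/s")

    # If that's right, PCIe bandwidth is nowhere near the limit for
    # layer-split inference; each card's VRAM bandwidth dominates, and PCIe
    # mainly hurts tensor parallelism (all-reduces) and model load time.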
Then again, I have an RTX 5090 + 96GB of DDR5-6000 that crushes the Spark on prompt processing of gpt-oss-120b (something like 2-3x faster), while token generation is pretty close. I paid ~$3200 for the entire machine; with the currently inflated RAM prices, it would probably be closer to the Dell's price.
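For anyone wondering how token generation stays close when most of the model sits in system RAM: with MoE offload, only the experts live in RAM, and only the active ones are read per token. Rough sketch (the VRAM split fraction is a guess):

    # Why tg stays close on a 5090 + DDR5 box for gpt-oss-120b (MoE offload).
    # Approximate/assumed numbers throughout.
    ddr5_bw_gbs  = 96       # dual-channel DDR5-6000, theoretical peak
    active_bytes = 2.6e9    # ~5.1B active params/token at ~4 bits/weight
    vram_frac    = 0.25     # share of active weights kept in VRAM (a guess)

    # Per token, only the offloaded share of active weights streams from RAM.
    tg_ceiling = ddr5_bw_gbs * 1e9 / (active_bytes * (1 - vram_frac))
    print(f"tg ceiling from system RAM: ~{tg_ceiling:.0f} tok/s")

    # Prompt processing batches the prompt and amortizes weight reads,
    # so there the 5090's raw compute wins by a wide margin.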
So while I think the Strix Halo is a mostly useless machine for any kind of AI and the Spark is actually useful, I don't think pure inference is a good use case for either. The Spark probably only makes sense as a dev kit for larger cloud hardware.