I didn't see this in the article but elsewhere I've seen the memory bandwidth quoted as 600GB/s [1]. For comparison:
- 5090/6000 Pro: 1792GB/s
- 5080: 960GB/s
- 5070Ti: 896GB/s
- M3 Ultra: 819GB/s
- DGX Spark: 273GB/s (less than an M5 Pro at 307GB/s)
Memory bandwidth isn't everything, but it will cap inference rate pretty heavily. Also, the M3 Ultra figure is for an almost 2-year-old Mac Studio. It's widely expected to be refreshed in Q3, likely with an M4 or M5 Ultra at >1000GB/s. I really hope Apple realizes what a market opportunity it has here.
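To put rough numbers on "bandwidth caps inference rate": for single-stream decoding, every weight has to be read from memory about once per generated token, so bandwidth divided by model size gives a ceiling on tokens/s. Here's a back-of-envelope sketch using some of the figures above. The 8B-parameter, 8-bit model is a hypothetical example I picked because it fits on all of these, and the function name is just for illustration. Real throughput will be lower (KV cache reads, compute, overhead).

```python
# Bandwidth figures in GB/s, as quoted in the list above.
BANDWIDTH_GBPS = {
    "5090/6000 Pro": 1792,
    "5080": 960,
    "M3 Ultra": 819,
    "DGX Spark": 273,
}


def max_tokens_per_sec(bandwidth_gbps, params_billions, bytes_per_param):
    """Rough ceiling on decode rate for one stream.

    Assumes all weights are read from memory once per token and that
    nothing else (KV cache, activations, compute) limits throughput.
    """
    model_gb = params_billions * bytes_per_param  # weight footprint in GB
    return bandwidth_gbps / model_gb


if __name__ == "__main__":
    # Hypothetical model: 8B parameters at 1 byte/param (8-bit) = 8 GB.
    for name, bw in BANDWIDTH_GBPS.items():
        ceiling = max_tokens_per_sec(bw, params_billions=8, bytes_per_param=1)
        print(f"{name}: at most ~{ceiling:.0f} tokens/s")
```

The ratios between devices are the useful part: whatever the model, the DGX Spark's ceiling is about 1/3 of the M3 Ultra's and well under 1/6 of the 5090's.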
The above shows just how good a value the 5090 really is. It's basically a stripped-down RTX 6000 Pro, which is a ~$10k card, for 20-30% of the price. This also demonstrates how NVIDIA uses VRAM for market segmentation. As an aside, the true data center cards (e.g. B100, H100) use HBM at ~3.2TB/s.
[1]: https://wccftech.com/nvidia-enters-pc-space-with-rtx-spark/
Yeah, and the quoted 1 PF is only for sparse models (only half that for dense, if that), and the DGX had serious hardware issues: https://x.com/ID_AA_Carmack/status/1982831774850748825