The Unified Memory pool is what will continue to be the “game changer” in systems architecture, especially outside of data centers.
The reality is that even cutting-edge games and consumer workloads don’t actually make full use of the GPU’s PCIe bandwidth or the bandwidth of its GDDR memory. Even local AI use cases don’t meaningfully benefit from faster memory, at least for average consumers.
A unified memory pool does two things:
1) Lets systems optimize utilization based on need, rather than being confined to specific pools
2) Reduces overall memory cost, by letting system builders purchase a single type of memory in bulk instead of having to figure out GDDR vs DDR memory placement (important for SFF/portable machines)
So at a time when memory is expensive, unified pools make more sense. Even when memory becomes cheap and plentiful again, it’s just practical at this point to allocate a larger overall pool instead of managing discrete sets.
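A minimal sketch of point 1, assuming a toy model where a workload is just a CPU-side and a GPU-side memory demand in GB. The function names, the workloads, and the 64 GB capacity are all hypothetical and chosen only for illustration:

```python
def fits_split(cpu_need_gb, gpu_need_gb, cpu_pool_gb, gpu_pool_gb):
    """Discrete pools: each side must fit in its own fixed pool."""
    return cpu_need_gb <= cpu_pool_gb and gpu_need_gb <= gpu_pool_gb


def fits_unified(cpu_need_gb, gpu_need_gb, total_pool_gb):
    """Unified pool: only the combined demand has to fit."""
    return cpu_need_gb + gpu_need_gb <= total_pool_gb


# Hypothetical machine: 64 GB total, either split 32/32 or unified.
workloads = {
    "balanced": (24, 24),
    "gpu_heavy": (8, 48),   # e.g. a large local model
    "cpu_heavy": (50, 4),   # e.g. a big in-memory dataset
}

for name, (cpu, gpu) in workloads.items():
    split_ok = fits_split(cpu, gpu, 32, 32)
    unified_ok = fits_unified(cpu, gpu, 64)
    print(f"{name}: split={split_ok} unified={unified_ok}")
```

With the same total capacity, only the unified layout accommodates the lopsided workloads in this toy setup.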
The one big drawback is security. A shared memory pool means side-channel attacks against memory from the GPU or CPU could potentially compromise the other as well, meaning memory-safe designs are going to be critical to security going forward (which is good for Rust adherents, I figure).
And here I am with a 128GB Strix Halo, longingly eyeing the Blackwell cards that spit out tokens at 10-20x the speed.
The question, I suppose, is the ultimate shape of knowledge compression and bandwidth optimization that we arrive at.
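One rough way to reason about the bandwidth side of this is the common rule of thumb that token generation is memory-bound, so the ceiling on tokens per second is about memory bandwidth divided by the bytes read per token. The sketch below assumes every token reads all model weights once. The 20 GB model size is hypothetical, and the 256 GB/s Strix Halo figure is an assumption that does not come from this thread:

```python
def decode_tokens_per_sec(bandwidth_gb_s, model_size_gb):
    """Upper bound when every generated token reads all weights once."""
    return bandwidth_gb_s / model_size_gb


MODEL_GB = 20.0          # hypothetical quantized model size
BLACKWELL_GB_S = 1800.0  # bandwidth figure quoted elsewhere in this thread
STRIX_HALO_GB_S = 256.0  # assumption, not from this thread

fast = decode_tokens_per_sec(BLACKWELL_GB_S, MODEL_GB)
slow = decode_tokens_per_sec(STRIX_HALO_GB_S, MODEL_GB)
print(f"{fast:.1f} vs {slow:.1f} tokens/s, ratio {fast / slow:.1f}x")
```

This only bounds generation speed. Prompt processing is typically compute-bound, so real-world gaps can be larger than the bandwidth ratio alone suggests.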
Memory safety is orthogonal to side-channels, and hardware-enforced isolation (e.g. IOMMU) is more powerful than compiler-enforced isolation (but both are good!)
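To illustrate the "orthogonal" point with a software analogy (not the GPU/CPU shared-memory scenario itself): the comparison below never touches memory out of bounds, yet how much work it does depends on the secret. The step counter is a deterministic stand-in for timing, and `leaky_equal` and `SECRET` are hypothetical names for this sketch:

```python
import hmac


def leaky_equal(secret: bytes, guess: bytes):
    """Early-exit comparison. Returns (equal, bytes_compared)."""
    steps = 0
    if len(secret) != len(guess):
        return False, steps
    for a, b in zip(secret, guess):
        steps += 1
        if a != b:
            return False, steps
    return True, steps


SECRET = b"hunter2!"  # hypothetical secret

# Fully memory-safe, but the work done reveals the matching prefix length.
_, steps_bad = leaky_equal(SECRET, b"xxxxxxxx")   # wrong first byte
_, steps_near = leaky_equal(SECRET, b"huntexxx")  # 5 correct leading bytes
print(steps_bad, steps_near)

# The standard library's compare_digest is designed to avoid this
# content-dependent early exit.
print(hmac.compare_digest(SECRET, b"huntexxx"))
```

An attacker who can observe the step count (or the time it implies) learns how many leading bytes were right, and no bounds check or borrow checker is involved in that leak.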
If this thing only has as much GPU bandwidth as the Spark, it’s kinda pointless.
Unified memory is only a feature because NVidia so aggressively uses VRAM for market segmentation.
The 5090 ($2k MSRP but realistically $3-3.5k) is almost the same as the RTX 6000 Pro (~$10k). Same memory bandwidth (1800GB/s). Slightly different CUDA cores (21k vs 24k). Big difference? VRAM (32GB vs 96GB).
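A quick bit of arithmetic on the figures quoted above makes the segmentation easier to see. This sketch uses only the approximate MSRP, VRAM, and core counts from this comment, with hypothetical dictionary names:

```python
# Approximate figures as quoted in the comment above.
rtx_5090 = {"price_usd": 2000, "vram_gb": 32, "cuda_cores": 21_000}
rtx_6000_pro = {"price_usd": 10_000, "vram_gb": 96, "cuda_cores": 24_000}

# Ratio of the workstation card to the consumer card for each spec.
ratios = {key: rtx_6000_pro[key] / rtx_5090[key] for key in rtx_5090}

for key, ratio in ratios.items():
    print(f"{key}: {ratio:.2f}x")
```

At MSRP that works out to roughly 5x the price for 3x the VRAM and about 1.14x the CUDA cores.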
NVidia ultimately doesn't want to upset this segmentation so the RTX Spark will never undermine their other offerings. This is why I think Apple has a real market opportunity if they choose to embrace it.
Intel was doing UMA with their i740 graphics in the late 90s. Codename TIMNA was cancelled, but they pioneered it and used it on their GPU/CPU chips as well as their breakthrough 810 chipset, which dominated the graphics market for a decade. It was despised because it was ubiquitous and a low-performing graphics engine, but games had to accommodate it.
Funny that it is getting credit only now.
Yeah, you only see double-digit performance degradation going from PCIe 5 to PCIe 3 with a 5090 (at x16); with everything else it’s in the single digits.
> (which is good for Rust adherents, I figure).
As a Rust adherent, please do not put words in our mouths or set up unrealistic expectations for other people by linking together concepts at a very shallow level.
Language-level memory safety has no answer for hardware security flaws, which is what side-channel attacks are. No programming language can provide memory privacy if another chip in your machine can read your memory, just as no programming language can protect your application from a vulnerability in the kernel it’s running on.
What is the difference between unified memory and shared memory?
Shared memory has existed since the first CPU with an embedded GPU came to market, when you could set in the BIOS how much memory went to each component.
I do have an opinion about how unified memory could be different, but I want a proper explanation.