Hacker News

fc417fc802 last Wednesday at 10:47 AM

With 2 IO dies aren't there effectively 2 meta NUMA nodes with 4 leaf nodes each? Or am I off base there?

The above doesn't even consider the possibility of multi-CPU systems. I suspect the existing programming models are quickly going to become insufficient for modeling these systems.

I also find myself wondering how atomic instruction performance will fare on these. GPU ISA and memory model on CPU when?


Replies

DiabloD3 last Wednesday at 11:58 AM

If you query the NUMA layout tree, you have two sibling hw threads per core, then a cluster of 8 or 12 actual cores per die (up to 4 or 8 dies per socket), then the individual sockets (up to 2 sockets per machine).

Before the move to 8 cores per die (introduced in Zen 3, and retained in 4, 5 and 6), on the Zen 1/+ and 2 series this would have been two sets of four cores instead of one set of eight (and a split L3 instead of a unified one). I can't remember whether the split CCXs had their own NUMA layer in the tree, or whether they were just iterated in pairs.
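
For anyone who wants to look at the layout tree described above on their own machine, here is a minimal sketch for Linux that reads it from sysfs. It assumes the standard `/sys/devices/system/node/node*/cpulist` and `/sys/devices/system/cpu/cpu*/topology/thread_siblings_list` files are present; the helper names (`parse_cpulist`, `read_numa_nodes`, `read_smt_siblings`) are made up for this example. What shows up as a "node" depends on firmware and kernel settings, so the output may not line up one-to-one with dies or sockets.

```python
import glob
import os
import re


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-3,8-11' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. a memory-only node
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def read_numa_nodes(sysfs_root="/sys"):
    """Return {node id: [logical CPU ids]} from node*/cpulist (hypothetical helper)."""
    nodes = {}
    pattern = os.path.join(sysfs_root, "devices/system/node/node[0-9]*")
    for path in glob.glob(pattern):
        node_id = int(re.search(r"node(\d+)$", path).group(1))
        with open(os.path.join(path, "cpulist")) as f:
            nodes[node_id] = parse_cpulist(f.read())
    return nodes


def read_smt_siblings(cpu, sysfs_root="/sys"):
    """Return the hw threads sharing a core with `cpu` (hypothetical helper)."""
    path = os.path.join(
        sysfs_root, f"devices/system/cpu/cpu{cpu}/topology/thread_siblings_list"
    )
    with open(path) as f:
        return parse_cpulist(f.read())


if __name__ == "__main__":
    for node, cpus in sorted(read_numa_nodes().items()):
        print(f"node {node}: {len(cpus)} logical CPUs")
        if cpus:
            print(f"  siblings of cpu {cpus[0]}: {read_smt_siblings(cpus[0])}")
```

The `sysfs_root` parameter is only there so the functions can be pointed at a copied or mocked directory tree. Tools such as `lstopo` from hwloc give a fuller picture, including cache levels, which this sketch does not attempt.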

fweimer last Wednesday at 6:18 PM

There should be plenty of existing programming models that can be reused because HPC used single-image multi-hop NUMA systems a lot before the Beowulf clusters took over.

Even today, I think very large enterprise systems (where a single kernel runs on a single system that spans multiple racks) are built like this, too.