toast0 • last Sunday at 5:43 PM

> think a machine that exposes each subset of cores as a NUMA node and doesn’t try to flatten memory across the entire set of cores might be a much more workable approach. Otherwise the interconnect becomes the scaling limit quickly (all cores being able to access all memory at speed).

EPYC has a mode that exposes 4 NUMA nodes per socket, IIRC. That seems like a good fit if your software is NUMA-aware or at least NUMA-friendly.

But most desktop-class hardware has all the cores sharing a single memory controller anyway, so if you exposed separate NUMA nodes, it wouldn't reflect reality.

Reducing cross-core communication (NUMA or not) is the key to getting high-performance parallelism. Erlang helps because any cross-process communication is explicit, so there's no hidden communication of the kind that can sneak in with languages that share memory between threads. (Yes, ets is shared, but it's also explicit communication in my book.)
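
To make the "explicit communication" point concrete, here is a minimal sketch in Python (not Erlang, and not a benchmark). Each worker keeps its state private and only talks to the outside world through queues, which roughly mimics Erlang-style message passing. The names `worker` and `parallel_sum` are made up for illustration, and Python threads do technically share memory, so this shows the discipline rather than any real isolation or speedup.

```python
import queue
import threading


def worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Sum the numbers received on inbox, keeping all state local."""
    total = 0  # private to this worker; no other thread reads or writes it
    while True:
        msg = inbox.get()
        if msg is None:  # sentinel message: no more work is coming
            break
        total += msg
    outbox.put(total)  # the only place data leaves the worker


def parallel_sum(values, n_workers=4):
    """Hypothetical helper: distribute values round-robin, then collect partial sums."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(inbox, results))
        for inbox in inboxes
    ]
    for t in threads:
        t.start()
    # Every cross-worker interaction is an explicit message send.
    for i, value in enumerate(values):
        inboxes[i % n_workers].put(value)
    for inbox in inboxes:
        inbox.put(None)
    for t in threads:
        t.join()
    return sum(results.get() for _ in range(n_workers))
```

Calling `parallel_sum(range(100))` hands each worker its own slice of the input and merges the partial sums once at the end. The point is that every place where workers communicate is visible in the code as a `put` or `get`, so the communication cost is easy to spot and to reduce.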