logoalt Hacker News

sargunlast Sunday at 4:20 PM5 repliesview on HN

I really like the manycores approach, but we haven’t seen it come to fruition — at least not on general purpose machines. I think a machine that exposes each subset of cores as a NUMA node and doesn’t try to flatten memory across the entire set of cores might be a much more workable approach. Otherwise the interconnect becomes the scaling limit quickly (all cores being able to access all memory at speed).

Erlang, at least the programming model, lends itself well to this, where each process has a local heap. If that can stay resident to a subsection of the CPU, that might lend itself better to a reasonably priced many core architecture.


Replies

toast0last Sunday at 5:43 PM

> think a machine that exposes each subset of cores as a NUMA node and doesn’t try to flatten memory across the entire set of cores might be a much more workable approach. Otherwise the interconnect becomes the scaling limit quickly (all cores being able to access all memory at speed).

Epyc has a mode where it does 4 numa nodes per socket, IIRC. It seems like that should be good if your software is NUMA aware or NUMA friendly.

But most of the desktop class hardware has all the cores sharing a single memory controller anyway, so if you had separate NUMA nodes, it wouldn't reflect reality.

Reducing cross core communication (NUMA or not) is the key to getting high performance parallelism. Erlang helps because any cross process communication is explicit, so there's no hidden communication as can sometimes happen in languages with shared memory between threads. (Yes, ets is shared, but it's also explicit communication in my book)

zozbot234last Sunday at 4:53 PM

> Erlang, at least the programming model, lends itself well to this, where each process has a local heap.

That loosely describes plenty of multithreaded workloads, perhaps even most of them. A thread that doesn't keep its memory writes "local" to itself as much as possible will run into heavy contention with other threads and performance will suffer a lot. It's usual to try and write multithreaded workloads in a way that tries to minimize the chance of contention, even though this may not involve a literal "one local heap per core".

show 1 reply
felixgallolast Sunday at 5:14 PM

Paraphrasing the late great Joe Armstrong, the great thing about Erlang as opposed to just about any other language is that every year the same program gets twice as fast as last year.

Manycores hasn't succeeded because frankly the programming model of essentially every other language is stuck in 1950. I, the program, am the entire and sole thing running on this computer, and must manually manage resources to match its capabilities. Hence async/await, mutable memory, race checkers, function coloring, all that nonsense. If half the effort spent straining to get the ghost PDP-11 ruling all the programming languages had been spent on cleaning up the (several) warts in the actor model and its few implementations, we'd all be driving Waymos on Jupiter by now.

show 2 replies
to11mtmlast Sunday at 9:30 PM

> Erlang, at least the programming model, lends itself well to this, where each process has a local heap. If that can stay resident to a subsection of the CPU, that might lend itself better to a reasonably priced many core architecture.

I tend to agree.

Where it gets -really- interesting to think about, are concepts like 'core parking' actors of a given type on specific cores; e.x. 'somebusinessprocess' actor code all happens on a specific fixed set of cores and 'account' actors run on a different fixed set of cores, versus having all the cores going back and forth between both.

Could theoretically get a benefit due to instruction cache being very consistent per core, giving benefits due to the mechanical sympathy (I think Disruptors also take advantage of this).

On the other hand, it may not be as big a benefit, in the sense that cross process writes are cross core writes and those tend to lead to their own issues...

fun to think about.

show 1 reply
leoclast Sunday at 6:01 PM

Who knows what will really happen, but there have been rumours of significant core-count bumps in Ryzen 6, which would edge the mainstream significantly closer to manycore.