menaerus • 10/14/2024

A cluster of 10 machines with 40 vCPUs in total would equate to 4 vCPUs per machine. I am not familiar with Spark internals but in the realm of distributed databases such a setup would generally make no sense at all (to me). So I think you're correct that most of the overhead was caused by machine-to-machine byte juggling. 4 vCPUs is nothing.

I suspect you would be able to cut down the 2.5hr runtime dramatically, even with Spark, if you just deployed it as a single instance on that very same 32 vCPU machine.
