Hacker News

CraigJPerry 10/12/2024

Is this because of NUMA or is it L2 cache or something entirely different?

I worked on high perf around 10 years ago and at that point I would pin the OS and interrupt handling to a specific core so I’d always lose one core. Testing led me to disable hyperthreading in our particular use case, so that was “cores” (really threads) halved.

A colleague had a nifty trick built on top of Solarflare zero copy but at that time it required fairly intrusive kernel changes, which never totally sat well with me, but again I’d lose a 2nd core to some bookkeeping code that orchestrated that.

I’d then taskset the app to the other cores.
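
A rough sketch of that kind of pinning, assuming Linux and Python's `os.sched_setaffinity` (a programmatic counterpart to the `taskset` utility). The helper names and the reserved core numbers (0 for OS/interrupts, 1 for bookkeeping) are illustrative assumptions, not the original setup:

```python
import os


def app_cores(all_cores, reserved):
    """Return the sorted cores left for the app after removing reserved ones."""
    return sorted(set(all_cores) - set(reserved))


def pin_current_process(cores):
    """Pin the calling process to `cores`; returns the new affinity set.

    Returns None on platforms without sched_setaffinity (it is Linux-only).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, cores)  # pid 0 means the calling process
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # Hypothetical layout: core 0 for OS/IRQs, core 1 for bookkeeping.
    reserved = {0, 1}
    available = os.sched_getaffinity(0) if hasattr(os, "sched_getaffinity") else set()
    cores = app_cores(available, reserved)
    if cores:
        print("pinned to:", pin_current_process(cores))
```

This only restricts where the process may run. Keeping the kernel and interrupts off those cores is a separate step (for example boot-time core isolation and IRQ affinity settings), which the sketch does not attempt.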

NUMA was a thing by then so it really wasn’t straightforward to eke out maximum performance. It became somewhat of a competition to see who could get the highest throughput but usually those configurations were unusable due to unacceptable p99 latencies.


Replies

afr0ck 10/12/2024

NUMA gives you more bandwidth at the expense of higher latency (if not managed properly).