>Therefore if parallelising code reduces the runtime of that code, it is almost always more energy efficient to do so
Only if it leads to better utilisation. But in the scenario that the parent comment suggests, it does not lead to better utilisation as all cores are constantly busy processing requests.
Throughput and total CPU time across cores remain largely the same regardless of whether or not you parallelise individual programs/requests.
That's true, which is why I added the caveat that this only holds if parallelising reduces the overall runtime, i.e. if you can get through more requests per second by parallelising. And the flip side of that is that if you're able to perfectly utilise all cores, then you're already running everything in parallel.
That said, I suspect it's a rare case where you really do have perfect core utilisation.
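To put rough numbers on the two cases, here is a toy model. It is only a sketch: the wattages and core count are made up, it assumes ideal linear speedup with no synchronisation overhead, and `energy_per_request` is a hypothetical helper, not anything measured.

```python
def energy_per_request(busy_cores, core_seconds_per_request,
                       baseline_watts, watts_per_busy_core):
    """Joules per completed request, assuming ideal linear scaling."""
    # Requests completed per second when `busy_cores` cores do useful work.
    throughput = busy_cores / core_seconds_per_request
    # Whole-machine power draw: fixed baseline plus a cost per busy core.
    power = baseline_watts + busy_cores * watts_per_busy_core
    return power / throughput


# All figures below are illustrative assumptions.
CORES = 8
WORK = 1.0        # core-seconds of work per request
BASELINE = 40.0   # watts drawn regardless of load
PER_CORE = 10.0   # extra watts per busy core

# Saturated server: all 8 cores are busy either way, whether they run
# 8 serial requests side by side or one request split 8 ways.
saturated = energy_per_request(CORES, WORK, BASELINE, PER_CORE)

# Underutilised machine: a single stream of requests, run serially on
# one core versus split across all 8 cores.
one_serial = energy_per_request(1, WORK, BASELINE, PER_CORE)
one_parallel = energy_per_request(CORES, WORK, BASELINE, PER_CORE)

print(saturated, one_serial, one_parallel)
```

With these numbers the saturated server costs 15 J per request whether or not individual requests are parallelised, while the underutilised machine drops from 50 J to 15 J per request when the work is spread over all cores, because the baseline power is amortised over more throughput.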