If the focus is performance, why use a separate process and have to deal with data serialization overhead?
Why not a typical shared library that can be loaded in python, R, Julia, etc., and run on large data sets without even a memory copy?
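For concreteness, the zero-copy shared-library approach might look roughly like this in Python via ctypes — a minimal sketch using libc's `memchr` as a stand-in for a real analytics library (the library scans the NumPy buffer in place; nothing is serialized or copied):

```python
import ctypes
import numpy as np

# On POSIX, CDLL(None) exposes the process's global symbols, including libc.
libc = ctypes.CDLL(None)
# memchr(const void *s, int c, size_t n) scans the buffer in place.
libc.memchr.restype = ctypes.c_void_p
libc.memchr.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.c_size_t]

a = np.zeros(1_000_000, dtype=np.uint8)
a[123_456] = 7

# Pass the raw address of the NumPy buffer: no serialization, no memory copy.
hit = libc.memchr(a.ctypes.data, 7, a.size)
offset = hit - a.ctypes.data  # byte offset where libc found the value
```

A real engine shipped as a shared library works the same way: the host language hands over a pointer and the library computes directly on that memory.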
Perhaps because the performance is good enough, and this approach is much simpler and more portable across platforms than shipping shared libraries.
This way you don't even need Python, R, Julia, etc., but can connect directly to your backend systems, which are presumably written in a fast language. If Python is in your call stack, you already don't care about absolute performance.