You do not need any OS changes; you just need a print library that does buffering correctly.
Buffering should basically always be "work or time" based: flush either when you have buffered enough data or when enough time has passed since the oldest unflushed write. The reason to buffer at all is that the fixed per-write cost (per-element latency) starts bottlenecking your throughput.
If you have so little data that your throughput is not being limited, then you should just be flushing.
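To make that concrete, here is a rough sketch of what a "work or time" buffer could look like in Python. Everything in it is made up for illustration: the `WorkOrTimeBuffer` name, the `sink` callback, and the `max_bytes` / `max_delay` defaults are assumptions, not any real print library's API. It also only checks the time limit when `write()` or `poll()` is called, so a real library would have to drive `poll()` from a timer or event loop so that a lone small write still gets out on time.

```python
import time
from typing import Callable, List


class WorkOrTimeBuffer:
    """Buffers writes; flushes when enough bytes OR enough time has accumulated."""

    def __init__(self, sink: Callable[[bytes], None], max_bytes: int = 4096,
                 max_delay: float = 0.05,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._sink = sink            # hypothetical: whatever performs the real write
        self._max_bytes = max_bytes  # "work" threshold (illustrative default)
        self._max_delay = max_delay  # "time" threshold in seconds (illustrative default)
        self._clock = clock          # injectable so the logic can be tested
        self._chunks: List[bytes] = []
        self._size = 0
        self._oldest = 0.0           # arrival time of the oldest buffered chunk

    def write(self, data: bytes) -> None:
        if not data:
            return
        if not self._chunks:
            # Start the timer when the buffer goes from empty to non-empty.
            self._oldest = self._clock()
        self._chunks.append(data)
        self._size += len(data)
        self.poll()

    def poll(self) -> None:
        """Flush if either threshold is met; meant to be called periodically too."""
        if not self._chunks:
            return
        enough_work = self._size >= self._max_bytes
        enough_time = self._clock() - self._oldest >= self._max_delay
        if enough_work or enough_time:
            self.flush()

    def flush(self) -> None:
        if self._chunks:
            # One sink call for the whole batch amortizes the per-write cost.
            self._sink(b"".join(self._chunks))
            self._chunks.clear()
            self._size = 0
```

With something like this, a program that prints a lot gets batched by the byte threshold, while a program that prints one line every few seconds gets each line flushed once the delay expires, which is effectively the "just flush" case.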