How would a modern OS implement this?
You do not need any OS changes; you just need a print library that buffers correctly.
Buffering should basically always be “work or time” based: flush either when you have buffered enough or when enough time has passed. This is because buffering only pays off once per-element latency starts bottlenecking your throughput.
If you have so little data that your throughput is not getting limited, then you should be flushing.
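To make the "work or time" rule concrete, here is a minimal sketch in Python. The class name `WorkOrTimeWriter`, the `sink` callable, and the `max_bytes` / `max_delay` / `clock` parameters are all made up for illustration, not taken from any real print library. It only checks the clock when `write` is called, so the tail of a burst would still need a timer or an explicit `flush()` in a real implementation.

```python
import time


class WorkOrTimeWriter:
    """Hypothetical buffered writer: flush on enough bytes OR enough elapsed time."""

    def __init__(self, sink, max_bytes=4096, max_delay=0.05, clock=time.monotonic):
        self._sink = sink              # assumed: a callable taking one bytes object
        self._max_bytes = max_bytes    # "work" threshold
        self._max_delay = max_delay    # "time" threshold, in seconds
        self._clock = clock            # injectable so the behaviour can be tested
        self._chunks = []
        self._size = 0
        # -inf means the very first write counts as "enough time has passed".
        self._last_flush = float("-inf")

    def write(self, data: bytes) -> None:
        self._chunks.append(data)
        self._size += len(data)
        enough_work = self._size >= self._max_bytes
        enough_time = self._clock() - self._last_flush >= self._max_delay
        if enough_work or enough_time:
            self.flush()

    def flush(self) -> None:
        if self._chunks:
            # One call to the sink for everything buffered so far.
            self._sink(b"".join(self._chunks))
            self._chunks.clear()
            self._size = 0
        self._last_flush = self._clock()
```

With this shape, a slow writer (writes spaced further apart than `max_delay`) gets every write flushed immediately, while a fast writer gets batched until either threshold trips.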
Probably by not assuming terminals and byte streams any more. Terminal-by-default is a 20th-century-ism. Now you have screens with pixels. Without stdout, no need to know if stdout is a terminal.
> How would a modern OS implement this?
fwrite only buffers because write is slow.
Make it so write isn't slow and you don't need userspace buffering!