A long time ago I worked with someone who read 1 byte at a time from a socket because they insisted data was cached so the kernel was going to batch it magically somehow. It took me days to convince them to measure it.
That's different: you're talking about the application code, like OP.
But I think the parent comment's point is that the issue is in the implementation of fread itself in the standard library. It's perfectly reasonable for an application to pass it 1, 65536 (i.e. one byte, up to 65536 times) and expect it not to issue 65536 separate OS calls.
I used to make it a general rule to start all my optimisation of any network code by running strace and look for excessive read's and write's, because you'd be shocked how many did stuff like that if they didn't know the length of a string, or to read the length first, instead of reading into a buffer.
I had to convince people with benchmarks regularly that, yes, you could write the handful of lines to do proper user-space buffering and trivially run rings around any code that did extra context switches, because a lot of people didn't realise the cost difference between system calls and calling their own functions.
This included, by the way, the MySQL client library, at one point, which would do small read for length fields instead of larger non-blocking reads into a buffer all the time