That's different: you're talking about the application code, like OP.
But I think the parent comment's point is that the issue is in the implementation of fread itself in the standard library. It's perfectly reasonable for an application to pass it 1, 65536 (i.e. one byte, up to 65536 times) and expect it not to issue 65536 separate OS calls.
Is it? I get what you're saying, but asking for 1 byte 65536 times, is indeed different than asking for 65536 bytes, 1 time. There may be reasons, such as when you pull off the end of a buffer, it shifts. And the buffer size is 1 byte. Or 10. Or whatever.
No, I'm not saying that's why. I'm simply saying there is a difference between asking for 1 byte or 65k bytes of something. Even dd runs the same under Linux.
dd bs=10k count=1 is faster than bs=1 count=10k
I remember trying to recover some data from a spinning disk, and trying to slowly creep up on the data. So I wanted 1 byte per, I wanted it to nibble, until it hit whatever the errored part was. If I just grabbed the lot, it'd error out from the whole read.