I’m not sure why, but just like with scp, I’ve achieved significant speed-ups by tarring the directory first (optionally compressing it), transferring it, and then unpacking it on the other end. Maybe because it lets the tar-and-send on one side and the receive-and-untar/uncompress on the other happen in different threads?
It's typically a disk-latency thing: just stat-ing and opening the many files in a directory can add significant latency (especially on spinning HDDs) versus opening a single file (the tar) and read()-ing that one file into memory before writing it to the network.
If copying a folder with many files is slower than tarring that folder and then moving the tar (not counting the untar), then disk latency is your bottleneck.
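A rough way to check, as a sketch (the host and paths here are placeholders):

    time scp -r ./manyfiles user@remotehost:/tmp/test-copy/   # per-file recursive copy
    time tar -cf /tmp/manyfiles.tar ./manyfiles               # pack locally
    time scp /tmp/manyfiles.tar user@remotehost:/tmp/         # one big transfer

If the tar-then-copy pair together finishes well before the recursive copy, the per-file stat/open overhead is what is costing you.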
One of my "goto" tools is copying files over a "tar pipe". This avoids the temporary tar file. Something like: