I wonder if at least part of the reason for the speedup isn't the multi-threading, but rather that rclone may not compress transferred data by default. rsync does compress when using SSH, so for already compressed data (videos, for example), disabling SSH compression when invoking rsync speeds it up significantly:
rsync -e "ssh -o Compression=no" ...
Compression is off by default in OpenSSH, at least that's what `man 5 ssh_config` says:
> Specifies whether to use compression. The argument must be yes or no (the default).
So I'm surprised you see speedups with your invocation.
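It might be worth checking what your client actually ends up using, since a `Compression yes` in `~/.ssh/config` or `/etc/ssh/ssh_config` would override the default. A rough sketch, assuming an OpenSSH new enough to support `ssh -G` (6.8 or later, as far as I know); the function name `effective_compression` is made up for illustration:

```shell
# Print the effective Compression setting ssh would use for a host.
# `ssh -G` dumps the resolved client configuration without connecting;
# keywords are printed in lowercase, one "keyword value" pair per line.
effective_compression() {
    ssh -G "$1" | awk '$1 == "compression" { print $2 }'
}
```

Calling `effective_compression example.com` (with your real host in place of the placeholder) should print `no` unless something in your config turns compression on.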