Of course you still need one binary per CPU architecture. But when you rely on dynamic linking, you need to build on the same architecture as the target system, and at that point cross-compiling stops being reliable.
Is it some tooling issue? Why is it an issue to cross-compile programs with dynamic linking?
I happily and reliably cross-build Go code that uses CGO, generating static binaries on amd64 for arm64.
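For reference, a minimal sketch of the kind of invocation being described: cross-compiling a CGO-using program on amd64 for linux/arm64 with a static result. The cross-compiler name assumes a Debian-style toolchain package; adjust for your distro.

```shell
# Cross-compile a CGO-using Go program on amd64 for linux/arm64,
# linking statically so the binary needs no libraries on the target.
# Assumes an aarch64 cross toolchain is installed (e.g. the
# gcc-aarch64-linux-gnu package on Debian/Ubuntu).
CGO_ENABLED=1 \
GOOS=linux \
GOARCH=arm64 \
CC=aarch64-linux-gnu-gcc \
go build -ldflags '-linkmode external -extldflags "-static"' -o myprog .
```

`CGO_ENABLED=1` forces cgo on (it is disabled by default when cross-compiling), `CC` points the Go toolchain at the foreign C compiler, and the external linker flags produce the fully static binary. Running `file myprog` afterwards should report an ARM aarch64 statically linked executable.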
I am complaining about the language (phrasing) used: a Python, TypeScript, or Java program might be truly portable across architectures too.
Since architectures were only brought up in relation to dynamic libraries, it implied Go is otherwise as portable as those languages.
With that out of the way, this seems like a small thing for the Go build system, given that it already does cross-compilation (and thus understands foreign architectures and executable formats). I am guessing it just hasn't been done and isn't a big lift, so perhaps look into it yourself?