I worked in parallel computing in the late 80s and early 90s, when parallel languages were really a thing. in HPC applications memory bandwidth is certainly a concern, although usually the global communication bandwidth (assuming they're different) is the roofline. by saying c++ you're implying that MPI is really sufficient, and while it's certainly possible to prop up parallel codes with MPI, it's really quite tiresome, and it's hard to get at the really interesting problem, which is the mapping of the domain state across the entire machine.
other hugely important problems that c++ doesn't address are latency hiding, which avoids stalling out your entire core waiting on a distributed message, and the related technique of interleaving computation and communication.
another related problem is that a lot of the very interesting hardware that might exist to do things like RDMA, in-network collective operations, or even memory-controller-based rich atomics isn't part of the compiler's view, and thus usually ends up behind library implementations or really hacky inlines.
is there a good turnkey parallel language? no. is there sufficient commonality in architecture, or even much ongoing investment in the interesting ideas that were abandoned because of cost? no. but there remains huge potential to exploit parallel hardware with implicit abstractions, and I think saying 'just use c++' misses almost all of the picture here.
addendum: even if you are working on a single-die multicore machine, if you don't account for locality, it doesn't matter how good your code generator is: you will saturate the memory network. so locality is important, and languages like Chapel are explicitly trying to provide useful abstractions for you to manage it.