I have a theory for this, but I don't know how I'd test for it (and I don't work in the field).
We have a time window within which audio stimuli are interpreted as being "the same sound". When you hear two impulses outside that window, they seem to be different sounds. You can play with this by looping two similar sounds, or two copies of one sound, and then varying their offset a few ms either way. They'll move in and out of seeming like the same and different sounds, and around the crossover point you get ringing effects (especially if there are more than two copies, such as with tight echoes).
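If you want to try the offset experiment without a DAW, here's a rough sketch in plain Python. Everything in it is my own assumption for illustration: the function names, the 44.1 kHz sample rate, the 1 kHz decaying click standing in for "a sound", and the list of offsets to try. It only builds the sample data; you'd still have to write it out (e.g. with the standard `wave` module) and listen to find where your own crossover sits.

```python
import math

SAMPLE_RATE = 44100  # samples per second (assumed, typical for audio)

def make_click(duration_ms=5.0, freq_hz=1000.0, sample_rate=SAMPLE_RATE):
    """Short decaying sine burst, a stand-in for 'a sound'."""
    n = int(sample_rate * duration_ms / 1000.0)
    return [
        math.sin(2 * math.pi * freq_hz * i / sample_rate) * math.exp(-5.0 * i / n)
        for i in range(n)
    ]

def mix_with_delayed_copy(signal, offset_ms, sample_rate=SAMPLE_RATE):
    """Mix a signal with a copy of itself delayed by offset_ms."""
    delay = int(round(sample_rate * offset_ms / 1000.0))
    out = [0.0] * (len(signal) + delay)
    for i, s in enumerate(signal):
        out[i] += 0.5 * s          # original copy
        out[i + delay] += 0.5 * s  # delayed copy
    return out

click = make_click()
# Offsets to audition, in ms (arbitrary picks, not measured thresholds).
mixes = {ms: mix_with_delayed_copy(click, ms) for ms in (1, 5, 20, 40, 80)}
```

Loop each entry of `mixes` and step through the offsets; the interesting part is wherever the two clicks stop fusing for you.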
To me, this seems like a fundamental part of music interpretation. Not the core, but very significant.
Also, different species have different time perceptions. (I mean, I'm kinda guessing, but they all have different heart rate ranges, attention spans, brain wave frequency ranges etc., all of which imply to me a variance in time perception.) Our music makes sense against our time perception; we're quite sensitive to it... raise the bpm of a track by just 2 or 3 and it feels quite different. Change from 50% swing to 53% (or 52 if you're really sensitive) and your sense of the groove changes meaningfully. Pass all that through a "different perception of time" and it's easy to imagine our music means nothing to other species.
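To put rough numbers on how small those changes are, here's a quick bit of arithmetic in Python. The assumptions are mine: a base tempo of 120 bpm, and "swing percent" meaning the point within the beat where the off-beat note lands (50% is dead centre), which is how a lot of drum machines and DAWs define it. The helper names are made up for this example.

```python
def beat_ms(bpm):
    """Length of one beat in milliseconds."""
    return 60000.0 / bpm

def swung_offbeat_ms(bpm, swing_percent):
    """Time from the beat to its off-beat note, given a swing percentage."""
    return beat_ms(bpm) * swing_percent / 100.0

base_bpm = 120  # assumed tempo for illustration

# Raising the tempo by 2 bpm shortens each beat by this many ms.
tempo_shift_ms = beat_ms(base_bpm) - beat_ms(base_bpm + 2)

# Going from 50% to 53% swing moves the off-beat note by this many ms.
swing_shift_ms = swung_offbeat_ms(base_bpm, 53) - swung_offbeat_ms(base_bpm, 50)

print(f"+2 bpm: each beat is {tempo_shift_ms:.1f} ms shorter")
print(f"50% -> 53% swing: off-beat lands {swing_shift_ms:.1f} ms later")
```

Under those assumptions both shifts come out in the range of a few ms to a couple of tens of ms, which is the same scale as the offsets in the looping experiment.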
It also seems likely to me that:
a) most species have different sized windows
b) they perceive blends of frequencies quite differently depending on the window length
So, what seems like coherent, organised sound with a "story" or "meaning" or "structure" to us, probably becomes mush to most other species.
Then add the different frequency ranges in which animals hear, the different ways their ears focus sound... etc. We humans are creating organised sound around the biology of our auditory system; the perception of organisation is likely very different for most other species.
Just the difference between boom and bap, boom... bap... boom... bap..., tells us "something", but it's gonna tell you something different if you hear it as ttppssssss daaaaapppp ttpppssssss ddaaappppp.
Music is a hack.