In the same way that Shazam can identify songs despite the audio source being terrible over a phone, mixed with background noise. It doesn't capture the audio as a WAV and then scan its database for an exact matching WAV segment.
I'm sure it is way more complex than this, but shazam does some kind of small windowed FFT and distills it to the dominant few frequencies. It can then find "rhythms" of these frequency patterns, all boiled down to a time stream of signature data. There is some database which can look up these fingerprints. One given fingerprint might match multiple songs, but since they have dozens of fingerprints spread across time, if most of them point to the same musical source, that is what gets ID'd.