It's not a hash, though? It's an inverse Fourier transform system that matches the sound, similar to the filter that filtered out the vuvuzelas?
https://www.dechicchis.com/assets/Joseph-DeChicchis-Music-Id...
Like playing a distinctive click impulse and getting the cathedral's sound back from that.
Ctrl-F in that document for 'hashing'. That step reduces the audio information to a sparse collection of key points, one for each of four frequency ranges per time segment. I would assume that everything up to that step is done on the phone and only the key points are sent to the server.
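To make the "sparse key points" idea concrete, here is a rough sketch of that step. It is not the code from the linked document. The band edges in `BANDS`, the non-overlapping segments, the naive DFT (used instead of a real FFT so the sketch needs only the standard library) and the `hash_point` packing scheme are all my own hypothetical choices. For each time segment it keeps only the strongest frequency bin in each of four bands, and then folds those four numbers into one integer.

```python
import cmath
import math

# Hypothetical band edges in Hz; the linked document may use different ranges.
BANDS = [(40, 80), (80, 120), (120, 180), (180, 300)]


def dft_magnitudes(segment):
    """Magnitude of each DFT bin from 0 up to Nyquist (naive O(N^2) DFT, not an FFT)."""
    n = len(segment)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(segment))
        mags.append(abs(acc))
    return mags


def key_points(samples, sample_rate, segment_len, bands=BANDS):
    """For each non-overlapping segment, keep only the strongest bin index in each band."""
    hz_per_bin = sample_rate / segment_len
    points = []
    for start in range(0, len(samples) - segment_len + 1, segment_len):
        mags = dft_magnitudes(samples[start:start + segment_len])
        peaks = []
        for lo, hi in bands:
            lo_bin = int(lo / hz_per_bin)
            hi_bin = min(int(hi / hz_per_bin), len(mags) - 1)
            # Everything except this one peak per band is thrown away.
            peaks.append(max(range(lo_bin, hi_bin + 1), key=lambda b: mags[b]))
        points.append(tuple(peaks))
    return points


def hash_point(peaks, fuzz=2):
    """Hypothetical hash: round each bin down to a multiple of `fuzz`, then pack
    the bins into one integer (assumes every bin index is below 1000)."""
    h = 0
    for p in peaks:
        h = h * 1000 + (p - p % fuzz)
    return h


# Usage: four pure tones at 60, 100, 148 and 240 Hz, sampled at 1024 Hz.
sr = 1024
signal = [sum(math.sin(2 * math.pi * f * t / sr) for f in (60, 100, 148, 240))
          for t in range(512)]
pts = key_points(signal, sr, segment_len=256)
print(pts)                 # one 4-tuple of bin indices per 256-sample segment
print(hash_point(pts[0]))
```

This shows why only the key points would need to leave the phone. Each segment of 256 raw samples shrinks to four small bin indices, or to a single integer after hashing. The `fuzz` rounding is one way to make slightly noisy recordings land on the same hash.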