Hacker News

gorbachev, today at 9:46 AM (2 replies)

Quoting from their page:

--------------

This is by far the largest music metadata database that is publicly available. For comparison, we have 256 million tracks, while others have 50-150 million. Our data is well-annotated: MusicBrainz has 5 million unique ISRCs, while our database has 186 million.

--------------

If they truly are on a mission to protect the world's information from disappearing, they should work with MusicBrainz to get this data into it.

Alternatively, it would be amazing if they built a MusicBrainz-like service around it.

In either case, to make the data truly useful, they'd need to solve the problem of how to match the metadata to a fingerprint used to identify the music tracks, assuming that data is not part of the metadata they collected.


Replies

aerozol, today at 7:32 PM

It would be reasonably trivial to set up a bot that mass-imports metadata from Spotify to MusicBrainz (note that MB rules do not allow this; community cleanup after a single user did this with another source, years ago, is still ongoing).

The value that MusicBrainz adds is the community editor who spent a few hours going through YouTube videos and Wayback Machine social links to figure out that Fog (Wellington, NZ, punk/post-punk) and Fog (Auckland, NZ, post-punk) are different bands, even if they share a Spotify profile. It is the editor who hunted down and listened to 5 compilations that had mixed up a radio edit and an original mix of a track, to find out which is which, separate them in MB, and make notes. [These are made-up examples.]

That's not to imply that these two projects are 'competing', or that the ISRC figure comparison isn't useful and correct. But community database + scraped data is apples and oranges. And a mixed fruit bowl is wonderful.

47282847, today at 12:10 PM

> In either case, to make the data truly useful, they'd need to solve the problem on how to match the metadata to a fingerprint used to identify the music tracks

How is that a problem?

    for each track in collection do extract_fingerprint
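
For illustration, the one-liner above could be fleshed out roughly as follows. This is only a sketch under assumptions that the thread does not confirm: that the audio files are available locally alongside the metadata, that each track is a hypothetical `(isrc, path)` pair, and that Chromaprint's `fpcalc` tool is installed and supports `-json` output. The names `fpcalc_fingerprint` and `build_index` are made up for this example.

```python
import json
import subprocess
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical track shape: (ISRC, path to a local audio file).
Track = Tuple[str, str]


def fpcalc_fingerprint(path: str) -> str:
    """Shell out to Chromaprint's `fpcalc` (assumed installed) for one file."""
    result = subprocess.run(
        ["fpcalc", "-json", path],
        check=True,
        capture_output=True,
        text=True,
    )
    # Assumes the JSON output carries a "fingerprint" field.
    return json.loads(result.stdout)["fingerprint"]


def build_index(
    tracks: Iterable[Track],
    extract: Callable[[str], str] = fpcalc_fingerprint,
) -> Dict[str, List[str]]:
    """Map each fingerprint string to the ISRCs of the tracks that produced it."""
    index: Dict[str, List[str]] = {}
    for isrc, path in tracks:
        index.setdefault(extract(path), []).append(isrc)
    return index
```

The extractor is passed in as a parameter so the loop can be tried without any audio, for example `build_index([("ISRC1", "a.flac")], extract=lambda p: "FP1")`. Keying on exact fingerprint strings is a simplification: real acoustic matching is fuzzy, so two encodings of the same recording will usually not produce identical strings.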