This is insane.
I definitely was not aware Spotify DRM had been cracked to enable downloading at scale like this.
The thing is, this doesn't even seem particularly useful for average consumers/listeners, since Spotify itself is so convenient, and trying to locate individual tracks in massive torrent files of presumably 10,000's of tracks each sounds horrible.
But this does seem like it will be a godsend for researchers working on things like music classification and generation. The only thing is, you can't really publicly admit exactly what dataset you trained/tested on...?
Definitely wondering if this was in response to desire from AI researchers/companies who wanted this stuff. Or do the major record labels already license their entire catalogs for training cheaply enough that this really is intended solely as a preservation effort?
Flippant response: If it's OK for Meta to use this commercially, why not for researchers doing legitimate research?
More serious response: research is explicitly included in fair use protections in US copyright law. News organizations regularly use leaked / stolen copyrighted material in investigative journalism.
Arguably, the metadata is more useful than the music files themselves.
> The thing is, this doesn't even seem particularly useful for average consumers/listeners, since Spotify itself is so convenient, and trying to locate individual tracks in massive torrent files of presumably 10,000's of tracks each sounds horrible.
Are you aware Anna's Archive already solved the exact same problem with books?
> this doesn't even seem particularly useful for average consumers/listeners
I can imagine this making it wayyy easier to build something like Lidarr but for individual tracks instead of albums.
>The thing is, this doesn't even seem particularly useful for average consumer
It's an archive to defend against Spotify going away. Remember when Netflix had everything, and then that eroded, and now you can only rely on what Netflix produced itself?
The average consumer will flock to it when Spotify ultimately enshittifies.
>I definitely was not aware Spotify DRM had been cracked to enable downloading at scale like this.
What's stopping someone from sticking a microphone next to their speaker?
Slow, but effective.
The first users of this dataset will be Big Tech corps. Meta, Alphabet, OpenAI, Microsoft, Apple will all be happy to use this dataset for training their LLMs.
For them, 300 TB is cheap.
This leak will also be really useful to bad actors who will resell the music from this list without paying royalties to the artists.
> Spotify itself is so convenient, and trying to locate individual tracks in massive torrent files of presumably 10,000's of tracks each sounds horrible.
Download the lot to a big NAS and get Claude to write a little frontend with song search and automatic playlist recommendations?
Dunno. If they publish something like a 10 TB torrent of the most popular music, I can see people making their own music services. A 10 TB hard disk is easily affordable, and that's about 3 million songs, which is way more than anyone could listen to in a lifetime, even if you reduce that by 100x to account for taste.
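A quick back-of-the-envelope check of the "10 TB is about 3 million songs" figure. This is a sketch under assumptions not taken from the dataset: decimal terabytes (as drive vendors count), an average track length of about 3.5 minutes, and constant-bitrate audio at a few common bitrates.

```python
# Back-of-the-envelope check: how many tracks fit on a 10 TB disk?
# Assumptions (illustrative, not from the dataset): decimal units,
# ~3.5-minute average track, constant-bitrate audio.

DISK_BYTES = 10 * 10**12          # 10 TB in decimal bytes
AVG_TRACK_SECONDS = 3.5 * 60      # assumed average track length (210 s)


def tracks_that_fit(bitrate_kbps: int) -> int:
    """Number of whole AVG_TRACK_SECONDS-long tracks at the given bitrate."""
    bytes_per_track = bitrate_kbps * 1000 / 8 * AVG_TRACK_SECONDS
    return int(DISK_BYTES // bytes_per_track)


for kbps in (96, 128, 160, 320):
    n = tracks_that_fit(kbps)
    print(f"{kbps:>3} kbps: ~{n / 1e6:.1f} million tracks, "
          f"~{n // 100:,} after the 100x 'taste' reduction")
```

At around 128 kbps the estimate lands right at about 3 million tracks; at 320 kbps it drops to roughly 1.2 million, so the figure depends heavily on the encoding.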
It's probably going to make the AI music generation problem worse anyway...
Just cite Facebook getting busted training its AI on torrents proven to contain unlicensed material lol
DRM aside, Spotify clearly should have logic that throttles your account based on requests (there are only so many minutes in a day), making it entirely impractical to download the whole catalog unless you have millions of accounts.
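One way to express "only so many minutes in a day" is a token bucket measured in seconds of audio, refilling at real-time speed. This is a hypothetical sketch, not how Spotify actually rate-limits: the class name `PlaybackBudget`, the refill `rate`, and the 30-minute `burst` allowance for prefetching are all assumptions for illustration.

```python
class PlaybackBudget:
    """Hypothetical per-account token bucket, measured in seconds of audio.

    Tokens refill at `rate` seconds of audio per wall-clock second
    (1.0 = real time, i.e. one person listening). `burst` is an assumed
    allowance for prefetching/buffering ahead of playback.
    """

    def __init__(self, rate: float = 1.0, burst: float = 30 * 60.0,
                 start: float = 0.0) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst      # start with a full bucket
        self.last = start        # wall-clock time of the previous request

    def allow(self, track_seconds: float, now: float) -> bool:
        """Grant a track fetch if enough audio budget has accumulated.

        `now` is passed in explicitly (e.g. time.monotonic() in real use)
        so the behaviour is deterministic and easy to test.
        """
        elapsed = max(0.0, now - self.last)
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= track_seconds:
            self.tokens -= track_seconds
            return True
        return False


# A scraper requesting 4-minute tracks as fast as possible (no time passes):
scraper = PlaybackBudget()
granted = sum(scraper.allow(240, now=0.0) for _ in range(100))
print("scraper granted:", granted)

# A listener who requests the next track only when the previous one ends:
listener = PlaybackBudget()
ok = all(listener.allow(240, now=i * 240.0) for i in range(1000))
print("listener never throttled:", ok)
```

The scraper gets 7 of its 100 requests before the bucket runs dry, while the normal listener is never throttled. At `rate=1.0` a single account can fetch at most 1,440 minutes of audio per day plus the burst, which is the commenter's point about needing an enormous number of accounts to pull everything.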
> I definitely was not aware Spotify DRM had been cracked to enable downloading at scale like this.
Do they have DRM at all? YouTube and Pandora don't.
>> But this does seem like it will be a godsend for researchers working on things like music classification and generation. The only thing is, you can't really publicly admit exactly what dataset you trained/tested on...?
Didn't Meta already publicly admit they trained their current models on pirated content? They're too big to fail. I look forward to my music slop.
Just like with anything digital you (and Spotify) are fully at the mercy of the rights holders. When (not if) they pull their stuff, or replace their stuff, or change their stuff, you can never get the original back unless you preserve it.
The biggest example: a lot of Russian music is not available on Spotify because of the Russia-Ukraine war and Spotify pulling out of Russia. Spotify doesn't have the licenses to a lot of that music, because the rights belong to companies operating within Russia.
I'd be stunned if we didn't find out Anna's Archive is a front for a handful of shadier VCs who are into AI, even if AA themselves don't know it and just take the cash.
Thank god we are taking care of the “researchers working on things like music classification and generation”! As long as we can convince ourselves we have a sound analysis of it, there's no need to support and defend people making actual art, right? So much has already been made, who needs more?
This is not to defend Spotify (death to it), but to say that opening up all of this data for even MORE garbage generation is a step in the wrong direction. The right direction would be to heavily legislate around / regulate companies like Spotify so they more fairly compensate the musicians whose works they train their slop generators on.
> The thing is, this doesn't even seem particularly useful for average consumers/listeners
Yeah. To me it's not really relevant. I wasn't using Spotify anyway, and if I need songs I use yt-dlp for YouTube, but even that is becoming increasingly rare. Today's music just doesn't interest me as much, and I already have the songs I listen to regularly. I do, however, also listen to music on YouTube in the background; in fact, that is now my primary use case for YouTube, even surpassing watching movies or anything else. (I do use YouTube for some news too, though; it is so sad that Google controls this.)
> The thing is, this doesn't even seem particularly useful for average consumers/listeners, since Spotify itself is so convenient, and trying to locate individual tracks in massive torrent files of presumably 10,000's of tracks each sounds horrible.
I wouldn’t be so sure. There are already tools that locate and stream pirated TV and movie content automatically and on demand. They’re so common that I had non-technical family members bragging at Thanksgiving about how they bought a box at their local Best Buy with an app that plays any movie or TV show they want on demand without paying anything. They didn’t understand what was happening, but they said it worked great.
> Definitely wondering if this was in response to desire from AI researchers/companies who wanted this stuff.
The Anna’s Archive group is ideologically motivated. They’re definitely not doing this for AI companies.