We backed up Spotify (metadata and music files). It’s distributed in bulk torrents (~300TB). It’s the world’s first “preservation archive” for music which is fully open (meaning it can easily be mirrored by anyone with enough disk space), with 86 million music files, representing around 99.6% of listens.
I’m not sure how they would do that at scale without also getting some false positives and removing human-made music too.
You could cut off your search around the time AI tracks started to appear. Not sure when that was, maybe 2023. You’d miss a lot of recent stuff, but you’d filter out a lot of spam too.
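For what it's worth, a date cutoff like that is about the simplest filter you could write. Here's a rough sketch in Python. The `release_date` field, the sample records, and the exact cutoff are all assumptions for illustration; I don't know what the actual metadata schema looks like, and 2023 is only a guess.

```python
from datetime import date

# Assumed cutoff: the "maybe 2023" guess above, not an established date.
CUTOFF = date(2023, 1, 1)

def released_before(tracks, cutoff=CUTOFF):
    """Keep only tracks whose release date is strictly before the cutoff."""
    return [t for t in tracks if t["release_date"] < cutoff]

# Made-up sample records with a hypothetical "release_date" field.
tracks = [
    {"title": "Older track", "release_date": date(2015, 6, 1)},
    {"title": "Recent track", "release_date": date(2024, 3, 9)},
]

kept = released_before(tracks)
```

This drops everything on or after the cutoff, human or not, so it trades recall on recent music for fewer spam results, which is the tradeoff described above.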