• 0 Posts
  • 9 Comments
Joined 3 days ago
Cake day: February 2nd, 2026



  • We didn’t have trouble getting Datasets 10, 11, or 12. I think Dataset 9 was probably delivered fine on Friday, so the NYTimes was able to grab a complete copy. Then the NYTimes started reporting the abusive material, which prompted the DOJ to yoink the ZIP, and it’s been screwy ever since.

    I saw a post from a random Redditor confirming that they found abusive material, if that’s the concern. I doubt that the reports are fabricated, but I also agree that the reports are a great excuse for the DOJ to remove legitimate files.




  • It’s “great” that the DOJ removed CSAM at the same time as they were removing perfectly legitimate files that are in the public interest. That’s just really smart. Puts us all in a hell of a bind.

    I can’t speak for others, but I’ll plan to preserve the 87GB Set 9, the 90GB Set 9, and Set 10, until we can get an updated “complete” (current) Set 9 that can be presumed to be free of CSAM. After that, we can try to identify the legitimate files that are missing from the “complete” Set 9, and preserve those while purging the CSAM.


  • Amazing - Once you have the 180GB Set 9 downloaded, I’ll seed.

    At this point, my working assumption is that the version you’re downloading is free of CSAM, but we can’t know for sure until we check it. I also assume that legitimate files were removed from the version you’re downloading, but those files are preserved in the archives we already have (along with, tragically, the CSAM).

    I think that after you download the 180GB set, we should compare it to our existing files to identify files that were removed. Then, we can identify which of the removed files were CSAM, and which of the removed files were legitimate. Going to be a hell of a task…