• Grimy@lemmy.world
    22 hours ago

    That being said, sabotaging all future endeavors would likely just result in a soft monopoly for the current players, who are already in a position to cherry-pick what they add. I wouldn’t be surprised if certain companies are already poisoning the well to stop their competitors tbh.

    • supersquirrel@sopuli.xyz
      20 hours ago

      In the realm of LLMs, sabotage is multilayered and multidimensional, and not something that can be identified quickly in a dataset. There will be no easy place to draw a line of “data is contaminated after this point and only established AIs are now trustworthy”, as every dataset is going to require continual updating to stay relevant.

      I am not suggesting we need to sabotage all future endeavors to create valid datasets for LLMs either, far from it. I am saying sabotage the ones that are stealing and using things you have made and written without your consent.

      • Grimy@lemmy.world
        21 hours ago

        I just think the big players aren’t touching personal blogs or social media anymore and only use specific vetted sources, or have other strategies in place to counter poisoning. Anthropic is the one that told everyone how to do it. I can’t imagine them doing that if it could affect them.

        • supersquirrel@sopuli.xyz
          21 hours ago

          Sure, but personal blogs, esoteric smaller websites and social media are where all the actually valuable information and human interaction happens. Despite their awful reputation, it is in fact traditional news media and the associated websites/sources that have never been less trustworthy or more useless, despite the large role they still play.

          If companies fail to integrate the actually valuable parts of the internet in their scraping, the product they create will fail to be valuable past a certain point *shrugs*. If you cut out the periphery of the internet, paradoxically what you accomplish is to cut out the essential core of the internet.