• InputZero@lemmy.world
      arrow-up
      2
      arrow-down
      2
      ·
      1 day ago

      Discord’s complete lack of indexing. Although it’s definitely not impossible to scrape data from Discord, it would take more resources than, say, Reddit.

      • plyth@feddit.org
        English
        arrow-up
        19
        ·
        1 day ago

        If an AI company pays Discord they won’t scrape but get the data directly.

      • RepleteLocum@lemmy.blahaj.zone
        English
        arrow-up
        17
        ·
        1 day ago

        But they index everything. Just request your data and you’ll get a neat package of all your messages with timestamps and all.

          • kadu@lemmy.world
            arrow-up
            10
            ·
            1 day ago

            So what? You can still sell it to AI companies without assigning a user to each message. They don’t care about who wrote it when stealing the content.

  • vane@lemmy.world
    arrow-up
    24
    arrow-down
    1
    ·
    1 day ago

    You forgot about influencers who will read your knowledge and present it as theirs in their videos.

  • Uriel238 [all pronouns]@lemmy.blahaj.zone
    English
    arrow-up
    5
    ·
    1 day ago

    Let them scrape. AI as it currently is, is still autocomplete with extra steps, and still prone to hallucination. As it is it will be usable to make cheap, passable content, but not hit those moments of inspiration of human art (yet – there are real AI groups looking to make AGI)

    It is a bubble which will pop and AI will be seen as a tool (a resource-costly tool) that requires its own set of experts independent from the experts that use ACAD or write editorial copy or do investigative work. Id est, it’s not the replacement of employees that boards of directors want it to be.

    And AGI is centuries from being efficient enough that you can make Rosie the Robot who cleans your house and makes a good upside-down pineapple cake.

    • Limonene@lemmy.world
      arrow-up
      17
      ·
      2 days ago

      Even if Discord wasn’t doing it, public Discord guilds are known to be scraped by a number of different bots. Previously, it was for spies, cops, and private investigators who wanted to search for messages by username. If those bots could do it before, AI bots will be doing it aggressively today.

    • skisnow@lemmy.ca
      English
      arrow-up
      31
      ·
      2 days ago

      Yeah. The vinegar is rich in hydrocarbons, which improve the fuel/air ratio during combustion whilst also keeping the engine smelling nice.

  • dual_sport_dork 🐧🗡️@lemmy.world
    English
    arrow-up
    104
    arrow-down
    1
    ·
    2 days ago

    Counter offer: Be a huge nerd and hang out on Lemmy instead.

    You’ll probably be scraped by AI bots anyway, but we have penguins and Star Trek memes. And knives.

      • Chozo@fedia.io
        arrow-up
        33
        arrow-down
        1
        ·
        2 days ago

        @[email protected] does a weekly-ish post in [email protected] called Weird Knife Wednesday, where he talks about a weird knife from his collection. His reviews are often hilarious, sometimes heartwarming, and always entertaining. Even people who aren’t knife nerds pop into his posts each week. Definitely worth reading them! He’s posting some of the best original content on Lemmy right now, IMO.

        • AnarchistArtificer@slrpnk.net
          English
          arrow-up
          13
          ·
          2 days ago

          See, this is why I love being here — random, delightful stuff like this makes me feel more connected to strangers who I will never meet, which genuinely helps to fuel my overall sense of purpose in fighting for a better world (and in many cases, in just fighting to continue existing throughout grimness). Thanks for the recommendation

          Another person who comes to mind in this vein is the wonderful person who posts lots of cool owl content on the superbowl community (their username starts with anon, I think. Someone who knows how to tag users on Lemmy, feel free to tag them if you know who I mean)

          • Vupware@lemmy.zip
            arrow-up
            7
            ·
            2 days ago

            What I really appreciate about Lemmy is that broadly there is an unspoken rule that constructive dialogue is the only option.

            You can say something stupid or misinformed, and instead of ripping you to shreds or vilifying you, the fellow strangers that choose to respond will usually do so in a polite, constructive way. They will put effort into their argument to make sure it’s understood and sound.

            Once that unwritten rule is no longer abided by, the ship has already left the port and there’s no recovering. I hope it stays that way for the foreseeable future.

  • Rooty@lemmy.world
    arrow-up
    32
    arrow-down
    3
    ·
    edit-2
    2 days ago

    IDGAF about LLM bots scraping public forums, they are public and available to anyone. I do mind them scraping shadow libraries and training on copyrighted material, which they should not do.

    • Wawe@lemmy.world
      arrow-up
      14
      ·
      2 days ago

      LLM bots are scraping so much that it increases the cost of maintaining forums, and sometimes they even DDoS them, for example Codeberg.

    • mushroomman_toad@lemmy.dbzer0.com
      arrow-up
      5
      ·
      2 days ago

      This discussion is a creative work and the copyright is collectively owned by the text contributors.

      Please reach out to the authors individually for a license before using it to train your AI sex bot.

      • LousyCornMuffins@lemmy.world
        English
        arrow-up
        5
        ·
        1 day ago

        I hereby and in perpetuity grant an exclusive, non-geographically-limited license to my comments to F.I.S.T.O. and only F.I.S.T.O.

        not the makers of F.I.S.T.O., let’s be clear

        • mushroommunk@lemmy.today
          arrow-up
          3
          ·
          1 day ago

          That’s currently being argued in the courts. There’s a lot that goes into it, from the right to distribution to proving that the AI bot can reproduce the content even though it normally doesn’t. [A very real example of reproducibility](https://arstechnica.com/features/2025/06/study-metas-llama-3-1-can-recall-42-percent-of-the-first-harry-potter-book/)

          There are also arguments about how they accessed large amounts of content. The law doesn’t just recognize whether you can access something or not, but what you access it for. There are laws about accessing things with the sole purpose of using them to develop a commercial product. All of it is a tangled mess with no clear answer at the moment (legally; morally I think there is one, but that’s very opinionated).

  • Rentlar@lemmy.ca
    arrow-up
    24
    ·
    2 days ago

    If I’m going to share my information and knowledge publicly on an Internet site, I’d like everyone to have fair and open access to it, not at the whims of a multinational corp to gatekeep for me. So the fact that AI can access it too doesn’t discourage me.

    You have information from me because I choose to share it, not because a site has demanded I give it up without a clear benefit to me in return.

    • mushroommunk@lemmy.today
      arrow-up
      4
      ·
      edit-2
      1 day ago

      I think there are a lot of solid arguments against letting AI steal everything, but with the scraping there’s an even more immediate problem. They don’t rate limit or do it in an intelligent way. It becomes a full-blown DDoS that has taken down entire sites and slowed many more to the point of near uselessness.

      They’re in a very literal sense crashing large chunks of the Internet and causing havoc which costs very real money to fix, either by upping server resources or installing AI scraping mitigation so that everyone still has access to the free information you mention.

      • Rentlar@lemmy.ca
        arrow-up
        1
        ·
        1 day ago

        That is definitely a problem that needs to be dealt with, since AI scrapers hogging bandwidth or making sites inaccessible means it is hampering equal access to everyone. Ignoring conventions and not rate limiting itself are harmful to the open internet.

        So yes, those kinds of AI scraping behaviours should be mitigated, but on the principle of AI ingesting my public data, I’m not against it, if it can access it reasonably and fairly like anyone else.

    • Tar_Alcaran@sh.itjust.works
      arrow-up
      9
      ·
      2 days ago

      My problem with it is that in Ye Olde Times before 2022, if you needed some info on, I dunno, amethyst cutting blades, you joined the crystal geode cutting forum and maybe became a contributing member of the group.

      Now, you ask ChatGPT, and contribute nothing.

  • Delphia@lemmy.world
    arrow-up
    38
    arrow-down
    3
    ·
    2 days ago

    As someone who is enthusiastic about old cars the amount of knowledge that disappeared when forums got killed by fb is immeasurable. At least A.I might preserve some knowledge.

    When people die they take their knowledge with them if nobody writes it down and maintains it.

    • skisnow@lemmy.ca
      English
      arrow-up
      29
      arrow-down
      2
      ·
      2 days ago

      At least A.I might preserve some knowledge.

      Big oof when you realize that literally nothing an AI tells you can be trusted, and you still have to find a proper source for it.

        • Tar_Alcaran@sh.itjust.works
          arrow-up
          10
          ·
          2 days ago

          No.

          Or at least, not always. I’m in plenty of online groups with people who have shown their trustworthiness and expertise. They are people with a reputation.

        • Delphia@lemmy.world
          arrow-up
          8
          arrow-down
          2
          ·
          2 days ago

          IKR. People these days don’t realise that confidently incorrect people predate Facebook.

          If you blindly do what ChatGPT says you deserve what happens to you.

          • BarrelAgedBoredom@lemmy.zip
            arrow-up
            6
            ·
            1 day ago

            Yeah but we automated the confidently incorrect idiot and every massive corporation is pushing the robo-idiot as a friend, confidant, tutor, assistant, and trustworthy source of accurate information. I’d rather have the confidently incorrect human than the lifeless simulacrum of one

          • Tar_Alcaran@sh.itjust.works
            arrow-up
            10
            ·
            2 days ago

            People these days don’t realise that confidently incorrect people predate Facebook.

            It’s different though.

            If you were a flat earther in 1982, you probably would have a weird self published “newspaper” by someone 4 times a year, and two or three books and no platform beyond literally shouting on the street at people who all considered you a moron.

            Nowadays, if you’re a crackpot, you can instantly find 17.000 other crackpots who will happily not just confirm your idiocy, but make up fake stories to support your bullshit ideas. They will also drag you along by pure crank magnetism into other bullshit. You can spread your bullshit far and wide, and since people are automatically served with similar content, you’re even likely to find other idiots like you “in the wild”, which is actually an algorithmic bubble.

            Before, nobody you met in real life would agree with you. Nowadays, everyone you “meet” online agrees with you.

            So yes, confidently incorrect people have always been there, but not in these numbers, and rarely to this level of confidence. That’s why people react so vehemently: they rarely ever reach outside their bubble. Your idea that the world is round isn’t the general consensus to them; they hear from flat earthers every single hour of the day.

            • merc@sh.itjust.works
              arrow-up
              2
              ·
              1 day ago

              Nowadays, if you’re a crackpot, you can instantly find 17.000 other crackpots who will happily not just confirm your idiocy, but make up fake stories to support your bullshit ideas.

              And because crackpots like this are very engaged in their crackpottery, it’s a great place to put ads. That means that the big Internet ad companies all want to be the ones to host those bullshit ideas.

              Back in the day, the reason crackpot newspapers had to be self-published is that the big publishers didn’t want to have anything to do with the crackpots. But in the modern world, Google / Meta can find someone who wants to run an ad next to your crackpottery, so you get the same treatment as a big media publisher. In fact, you might get better treatment because crackpottery may be stickier than, say, the Boston Globe, so Google / Meta might prefer to work with you because it allows them to show more ads.

        • skisnow@lemmy.ca
          English
          arrow-up
          3
          arrow-down
          1
          ·
          2 days ago

          Sure, but (and this goes to the other person who replied with much the same thing) there’s an order of magnitude of difference going on there, plus when someone says something wrong on a forum, others usually show up to correct them.

          AI responses have so far been very clearly a step down in reliability, so don’t be treating it as a binary.

            • mushroommunk@lemmy.today
              arrow-up
              2
              ·
              1 day ago

              A lot of the forums I’m seeing talked about were the more technical or objective kinds. Like in a car forum there’d be repair manuals or parts lists, and fountain pen forums would have loads of images comparing inks side by side for different shades and hues. Those are the sorts of knowledge centers being discussed and reminisced about a lot here.

              • Tja@programming.dev
                arrow-up
                1
                ·
                16 hours ago

                Yeah, survivorship bias, we only remember the good ones, but there were plenty of shitpools out there.

      • Delphia@lemmy.world
        arrow-up
        6
        arrow-down
        2
        ·
        edit-2
        2 days ago

        Yeah, bigger oof when you realise that damn near nothing anyone tells anyone can be trusted.

        Do you know how many times I’ve been handed the wrong part by “professionals” whose full-time job is “parts interpreter” and whose job description is to look up and order parts for customers? Or had a mechanic be “certain” about the cause of the same problem for the 3rd fucking time? The fact is that when I want to know which is the correct ECU pin for the crank angle sensor on an ’83 Cordia Turbo, that’s some esoteric as fuck knowledge that’s probably buried on a forum somewhere. If ChatGPT thinks it knows, I don’t just wire shit up and send it. I get out the multimeter and I check that wire first.

        Don’t get me wrong, if Google’s search wasn’t rubbish these days A.I wouldn’t be as useful as it is. I had to find out who made the rear diff for a car to see if we could pull the gears out from a different make/model to get better ratios for the strip. An hour of googling just turned up every result for people selling diffs, selling diff seals, selling diffs for other cars, workshops that specialise in diffs, diff seals for other cars… ChatGPT just fucking knew it was an Aisin unit and what its part number was, and then I asked “What cars is ‘part number’ used in?” and it spat out a list. It’s only good because Google is shit. If Google was still great, it would merely be a novelty.

    • LousyCornMuffins@lemmy.world
      English
      arrow-up
      3
      ·
      1 day ago

      i remember the databases we made of tablature on the guitar forums. i already had the songs we were playing, but… dammit i can’t even remember the name of the forum anymore it’s been like 30 years.

    • CXORA@aussie.zone
      English
      arrow-up
      3
      arrow-down
      1
      ·
      2 days ago

      The problem is that AI strips all provenance. The most accurate information is presented exactly the same as absolute nonsense.

      It makes it exceedingly difficult to sift truth from fiction, without the context clues we could otherwise use online.