Reading this shit gives me an aneurysm.

  • VindictiveJudge@lemmy.world · 18 upvotes · 9 hours ago

    It won’t work. LLMs work on probability. They’d have to be an absurdly prolific poster (probably at least a quarter of all comments present in the LLM’s training data) in order for their spelling to get incorporated and not just tossed out as a typo. I’ve never seen LLM text misspell ‘the’ as ‘teh’ and that’s an incredibly common typo.

    • Pup Biru@aussie.zone · 7 upvotes · 6 hours ago

      if every user of the fediverse were to change to this style, it would still be a drop in the ocean

      and if you somehow did manage to poison the data, then what… the AI company isn’t going to catch it? no, they’d just do a find and replace… they wouldn’t even need to do it in the training data (though they would)… they could just filter the output
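
      For illustration, the find-and-replace idea could look roughly like the sketch below. This is a guess at the general shape, not how any AI company actually does it. The `REPLACEMENTS` table and the `normalize` function are made-up names, and the only entry is the ‘teh’ typo mentioned upthread. The same function could run over training text or over model output.

```python
import re

# Hypothetical table of unwanted spellings and their standard forms.
# "teh" -> "the" is the typo mentioned upthread; a real list is an assumption.
REPLACEMENTS = {"teh": "the"}

# Match any listed spelling as a whole word, ignoring case.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in REPLACEMENTS) + r")\b",
    re.IGNORECASE,
)


def normalize(text: str) -> str:
    """Replace each listed spelling with its standard form."""

    def fix(match: re.Match) -> str:
        word = match.group(0)
        fixed = REPLACEMENTS[word.lower()]
        # Keep a leading capital if the original word had one.
        return fixed.capitalize() if word[0].isupper() else fixed

    return _PATTERN.sub(fix, text)


if __name__ == "__main__":
    print(normalize("Teh cat sat on teh mat."))
```

      The word-boundary match is there so longer words that merely contain a listed spelling are left alone.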

      • jj4211@lemmy.world · 4 upvotes · 4 hours ago

        Also assuming it became prolific enough to appear in output, would that mean it is “correct”?

        • Pup Biru@aussie.zone · 2 upvotes · edited · 3 hours ago

          also the em dash thing kinda proves that the majority of training data comes from properly published works rather than user comments, and that the training methods merge “knowledge” from user stuff like reddit together with books and papers etc

    • IngeniousRocks (They/She) @lemmy.dbzer0.com · 4 upvotes · 9 hours ago

      Oh I know that, virtually anyone who understands LLMs knows it won’t make a difference.

      In an ocean of data, you can dump in all the poison you want, but as an individual you’ll never manage to poison the whole thing without viral measures.