Does a license like this exist?

  • lobut@lemmy.ca · 11 hours ago

    Some authors have typed the first few sentences of their book into an LLM and had it spit out the rest.

    • FaceDeer@fedia.io · 9 hours ago

      That generally only happens in cases of overfitting, where the model was trained on a poorly de-duplicated data set that contains many copies of that book (or excerpts, quotes, and so forth). This is considered a flaw by AI trainers and a lot of work goes into sanitizing the training data to prevent it.
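      A minimal sketch of what the exact-match part of that de-duplication can look like (illustrative only; real training pipelines also use fuzzy methods such as MinHash to catch near-duplicates, and the function name here is hypothetical):

      ```python
      import hashlib

      def dedupe_exact(docs):
          """Drop exact duplicate documents by content hash.

          Illustrative sketch only: production data-sanitization pipelines
          also remove near-duplicates (e.g. via MinHash/LSH), which this
          exact-match pass cannot catch.
          """
          seen = set()
          unique = []
          for doc in docs:
              # Hash the normalized text so identical copies collapse to one key.
              digest = hashlib.sha256(doc.strip().encode("utf-8")).hexdigest()
              if digest not in seen:
                  seen.add(digest)
                  unique.append(doc)
          return unique

      corpus = [
          "Call me Ishmael.",
          "It was a dark and stormy night.",
          "Call me Ishmael.",  # duplicate excerpt: would be dropped
      ]
      print(dedupe_exact(corpus))
      ```

      Removing repeated copies of the same passage reduces the chance the model memorizes it verbatim, which is exactly the overfitting failure described above.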

        • FaceDeer@fedia.io · 8 hours ago

          You went digging through my Reddit comments to find a two-month-old thread; that must have taken a lot of effort. But I’m afraid I don’t see its relevance, beyond a general “it’s about AI.” The bulk of the comments I wrote there were about water usage.

          I’m genuinely puzzled. Are you saying that deduplicating data is “hiding unethical behaviour”? It’s actually meant to improve the model’s performance: a model that spits out exact copies of its training data is just a hugely expensive and wasteful re-implementation of copy-and-paste rather than a generative AI. The whole point of generative AI is to produce novel outputs.