Does a license like this exist?

  • FaceDeer@fedia.io
    9 hours ago

    That generally only happens with overfitting, where the model was trained on a poorly deduplicated dataset containing many copies of that book (or excerpts, quotes, and so forth). AI trainers consider this a flaw, and a lot of work goes into sanitizing training data to prevent it.
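
    To make "deduplicating" concrete, here is a minimal sketch of the idea, not any particular lab's pipeline. It drops exact copies by hashing normalized text and drops near-copies by comparing word shingles with Jaccard similarity. The function names, the 5-word shingle size, and the 0.8 threshold are all assumptions picked for illustration; real pipelines typically use scalable approximations instead of this pairwise comparison.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def shingles(text: str, n: int = 5) -> set[str]:
    """Return the set of n-word shingles of the normalized text."""
    words = normalize(text).split()
    if len(words) < n:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(docs: list[str], threshold: float = 0.8) -> list[str]:
    """Keep the first copy of each document, dropping exact and near duplicates.

    The threshold is a hypothetical value chosen for this sketch.
    """
    kept: list[str] = []
    kept_shingles: list[set[str]] = []
    seen_hashes: set[str] = set()
    for doc in docs:
        # Exact-duplicate check on the normalized text.
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue
        # Near-duplicate check against everything kept so far (quadratic).
        sh = shingles(doc)
        if any(jaccard(sh, other) >= threshold for other in kept_shingles):
            continue
        seen_hashes.add(digest)
        kept.append(doc)
        kept_shingles.append(sh)
    return kept
```

    Run over a corpus where the same book excerpt shows up many times with small variations, this would keep one copy and discard the rest, which is the point: the model then sees the passage once instead of hundreds of times.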

      • FaceDeer@fedia.io
        8 hours ago

        You went digging through my Reddit comments to find a two-month-old thread; that must have taken a lot of effort. But I’m afraid I don’t see how it’s relevant, aside from a general “it’s about AI”. The bulk of the comments I wrote there were about water usage.

        I’m genuinely puzzled. Are you saying that deduplicating data is “hiding unethical behaviour”? It’s actually intended to improve the model’s performance. Having a model spit out exact copies of its training data means you’ve produced a hugely expensive and wasteful re-implementation of copy-and-paste rather than a generative AI. The whole point of generative AI is to produce novel outputs.