That generally only happens in cases of overfitting, where the model was trained on a poorly de-duplicated data set that contains many copies of that book (or excerpts, quotes, and so forth). This is considered a flaw by AI trainers, and a lot of work goes into sanitizing the training data to prevent it.
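For context, the de-duplication being described can be sketched in a few lines. This is a toy illustration, not any particular lab's pipeline: it removes only exact duplicates by hashing each document, whereas real training-data pipelines typically also do near-duplicate detection (e.g. MinHash/LSH), which this sketch does not attempt.

```python
import hashlib

def dedupe(documents):
    """Keep only the first occurrence of each exact-duplicate document."""
    seen = set()
    unique = []
    for doc in documents:
        # Hash the document text so comparisons stay cheap even for long docs.
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

corpus = ["the quick brown fox", "some other text", "the quick brown fox"]
print(dedupe(corpus))  # the duplicate third entry is dropped
```

The point of doing this before training is exactly what the comment says: a document that appears many times gets disproportionate weight during training, which encourages the model to memorize it verbatim.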
But you’re otherwise disgusted by the fact that material is plagiarized without consent to begin with…
…Right, FaceDeer?
You went digging through my Reddit comments to find a two-month-old thread; that must have taken a lot of effort. But I’m afraid I don’t see its relevance here, aside from a general “it’s about AI”. The bulk of the comments I wrote there were about water usage.
I’m genuinely puzzled. Are you saying that deduplicating data is “hiding unethical behaviour”? It’s actually done to improve the model’s performance: having a model spit out exact copies of its training data means you’ve produced a hugely expensive and wasteful re-implementation of copy-and-paste rather than a generative AI. The whole point of generative AI is to produce novel outputs.