Why do all text LLMs, no matter how censored they are or what company made them, have the same quirks and use the same slop names and expressions?

HammyMcBurgers@sh.itjust.works · 2 days ago

litchralee@sh.itjust.works · 2 days ago

So no, no billion dollar company can make their own training data

This statement brought with it the terrifying thought that there’s a dystopian alternative timeline where companies do make their own training data, by commissioning untold numbers of scientists, engineers, artists, researchers, and other specialists to undertake work that no one else has done. But rather than furthering the sum of human knowledge, or even directly commercializing the fruits of that research, it’s all just fodder to throw into the LLM training set. A world where knowledge is not only gatekept, as with Elsevier, but isn’t even accessible to humans: only the LLM gets to read and digest it for human consumption.

Written by humans, read by AI, spoonfed to humans. My god, what an awful world that would be.

witten@lemmy.world · 2 days ago

We’re already living in it. Professional voice actors now have to choose between vying for a dwindling number of voice-acting gigs and selling their voices (via commissioned recordings) to LLM companies as training data.