I assume they all crib from the same training sets, but surely one of the billion dollar companies behind them can make their own?

  • litchralee@sh.itjust.works · English · 17 upvotes · 2 days ago

    So no, no billion dollar company can make their own training data

    This statement brought along with it the terrifying thought that there’s a dystopian alternative timeline where companies do make their own training data, by commissioning untold numbers of scientists, engineers, artists, researchers, and other specialists to undertake work that no one else has done. But rather than furthering the sum of human knowledge, or even directly commercializing the fruits of that research, it’s all just fodder to throw into the LLM training set. A world where knowledge is not only gatekept, as with Elsevier, but isn’t even accessible to humans: only the LLM gets to read it and digest it for human consumption.

    Written by humans, read by AI, spoonfed to humans. My god, what an awful world that would be.

    • witten@lemmy.world · 3 upvotes · edited 2 days ago

      We’re already living in it. Professional voice actors now have the choice between vying for the dwindling number of voice acting gigs or selling their voice (via commissioned recordings) to LLM companies as training data.