• ramble81@lemmy.zip
    English
    arrow-up
    12
    ·
    2 days ago

    Serious question here. LLMs were trained on data from SO (Stack Overflow). Developers now ask LLMs for solutions instead of SO. A new technology comes out that LLMs haven’t been trained on. Where will LLMs get training data for new technologies? You can’t exactly feed it a manual and expect it to extrapolate or understand (for that matter, “what manual?”).

    • dantheclamman@lemmy.world
      English
      arrow-up
      1
      ·
      1 day ago

      I am worried, because there are more and more cases where open source docs go offline because they can’t cover the bandwidth costs of the big LLM bots recrawling hundreds of times per day. Wikipedia is also getting hammered. There is so much waste, with diminishing returns.

    • Prox@lemmy.world
      English
      arrow-up
      8
      ·
      edit-2
      2 days ago

      Yes, that is the major problem with LLMs in general. There is no solution aside from “train on a different source (like Reddit)”, but then we rinse & repeat.

        • Prox@lemmy.world
          English
          arrow-up
          2
          ·
          2 days ago

          I guess, though I’m pretty ignorant as to how RLVR would fix the issue that arises with new programming languages or even new major versions. I’m not sure how LLMs would ever get to a correct answer if they don’t have good reference material to start from.

          • General_Effort@lemmy.world
            English
            arrow-up
            1
            ·
            2 days ago

            The assumption seems to be that an LLM can’t figure out a manual or source code. If it can’t, then you have to pay people. But that’s not a universally valid assumption.

    • General_Effort@lemmy.world
      English
      arrow-up
      2
      ·
      2 days ago

      > You can’t exactly feed it a manual and expect it to extrapolate or understand (for that matter, “what manual?”).

      You can do that to a degree with RLVR (reinforcement learning with verifiable rewards). They are also paying human experts. But that’s the situation now. Who knows how it will be in a couple more years. Maybe training AIs will be like writing a library, framework, …
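
      For anyone wondering what the “verifiable” part of RLVR could look like for code, here is a rough sketch, not how any particular lab does it. The idea is that the reward comes from running the model’s output against tests rather than from comparing it to existing Q&A posts. The names `verifiable_reward`, `candidate_code`, and `test_code` are made up for illustration, and a real setup would need proper sandboxing, which is left out here.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def verifiable_reward(candidate_code: str, test_code: str, timeout_s: float = 5.0) -> float:
    """Return 1.0 if the candidate code passes the tests, else 0.0.

    Hypothetical helper: the candidate and the tests are written to one
    script and run in a separate Python process. No sandboxing is done.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "check.py"
        script.write_text(candidate_code + "\n\n" + test_code + "\n")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # A hang or infinite loop counts as a failure.
            return 0.0
    # Exit code 0 means no assertion failed and nothing crashed.
    return 1.0 if result.returncode == 0 else 0.0


# Two made-up model outputs for the same task, scored by the same tests.
good = "def add(a, b):\n    return a + b"
bad = "def add(a, b):\n    return a - b"
tests = "assert add(2, 3) == 5"

print(verifiable_reward(good, tests))  # 1.0
print(verifiable_reward(bad, tests))   # 0.0
```

      The point is that the signal comes from a compiler, interpreter, or test suite, so in principle it is available for a new language or a new major version as soon as the tooling exists, even before anyone has written forum answers about it.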