• khepri@lemmy.world · 9 hours ago

    It’s why I trust my random unauditable Chinese matrix soup over my random unauditable American matrix soup, frankly.

      • brucethemoose@lemmy.world · edited · 2 hours ago

        Most aren’t really running DeepSeek locally. What ollama advertises (and basically lies about) are the now-obsolete Qwen 2.5 distillations.

        …I mean, some are, but it’s exclusively lunatics with EPYC homelab servers, heh. And they are not using ollama.

        • DandomRude@lemmy.world · 47 minutes ago

          Thx for clarifying.

          I once tried a distilled community version from Hugging Face, which worked quite well even on modest hardware. But that was a while ago. Unfortunately, I haven’t had much time to look into this stuff lately, but I want to check it out again at some point.

      • khepri@lemmy.world · 8 hours ago

        naw, I mean more that the kind of people who would uncritically take everything a chatbot says at face value are probably better off being in ChatGPT’s little curated garden anyway. Cause people like that are going to immediately get grifted by whatever comes along first no matter what, and a lot of those grifts are a lot more dangerous to the rest of us than a bot that won’t talk great replacement with you.

        • DandomRude@lemmy.world · 8 hours ago

          Ahh, thank you. I had misunderstood that, since DeepSeek is (more or less) an open-source LLM from China that can also be run and fine-tuned on your own hardware.

          • ranzispa@mander.xyz · 7 hours ago

            Do you have a cluster with 10 A100s lying around? Because that’s what it takes to run DeepSeek. It is open source, but it is far from accessible to run on your own hardware.

            • DandomRude@lemmy.world · 6 hours ago

              Yes, that’s true. It is resource-intensive, but unlike with other capable LLMs, self-hosting is at least possible: not for most private individuals, given the requirements, but for companies with the necessary budget.

              • FauxLiving@lemmy.world · 6 hours ago

                They’re overestimating the costs. 4x H100 and 512 GB of DDR4 will run the full DeepSeek-R1 model; that’s about $100k of GPUs and $7k of RAM. It’s not something you’re going to have in your homelab (for a few years at least), but it’s well within the budget of a hobbyist group or a moderately sized local business.
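
                For a rough sense of where those numbers come from, here’s some napkin math (just the weights for roughly 671B parameters; KV cache and other overhead come on top):

                    # Back-of-the-envelope only: weight memory for a ~671B-parameter model.
                    # Ignores KV cache, activations, and runtime overhead.
                    def weights_gib(params_billion: float, bytes_per_param: float) -> float:
                        return params_billion * 1e9 * bytes_per_param / 1024**3

                    for label, bpp in [("FP16", 2.0), ("FP8 (native)", 1.0), ("~4-bit quant", 0.5)]:
                        print(f"{label:>14}: ~{weights_gib(671, bpp):,.0f} GiB")
                    # FP16 ~1,250 GiB, FP8 ~625 GiB, ~4-bit ~312 GiB. 4x H100 is 320 GiB of VRAM,
                    # so the 512 GiB of system RAM holds whatever doesn't fit on the GPUs.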

                Since it’s an open-weights model, people have created quantized (and distilled) versions of it. The resulting models use fewer parameters and/or fewer bits per weight, which makes their RAM requirements a lot lower.
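
                If you want to grab one of those community quants to try, it’s usually something like this (the repo and file names below are just placeholders showing the naming pattern; check what’s actually published on Hugging Face):

                    # Hypothetical example: download a community GGUF quant from Hugging Face.
                    # Repo and file names are placeholders for whatever quant you actually pick.
                    from huggingface_hub import hf_hub_download

                    path = hf_hub_download(
                        repo_id="someuser/DeepSeek-R1-0528-Qwen3-8B-GGUF",   # placeholder repo
                        filename="DeepSeek-R1-0528-Qwen3-8B-Q4_K_M.gguf",    # ~4-bit, ballpark 5 GB
                    )
                    print(path)  # local cache path of the downloaded file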

                You can run quantized versions of DeepSeek-R1 locally. I’m running deepseek-r1-0528-qwen3-8b on a machine with an NVIDIA 3080 12GB and 64GB RAM. Unless you pay for an AI service and are using their flagship models, it’s pretty indistinguishable from the full model.

                If you’re coding or doing other tasks that push the AI, it’ll stumble more often, but for a ‘ChatGPT’-style interaction you couldn’t tell the difference between it and ChatGPT.
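
                For reference, running that kind of 8B quant locally is a few lines with llama-cpp-python. A minimal sketch, assuming you’ve already downloaded a GGUF file (the filename here is a placeholder) and built llama-cpp-python with GPU support:

                    # Minimal sketch: chat with a local GGUF quant via llama-cpp-python.
                    from llama_cpp import Llama

                    llm = Llama(
                        model_path="DeepSeek-R1-0528-Qwen3-8B-Q4_K_M.gguf",  # placeholder filename
                        n_gpu_layers=-1,  # offload as much as fits into the 3080's 12 GB
                        n_ctx=8192,       # context window; bigger means more RAM/VRAM
                    )

                    out = llm.create_chat_completion(
                        messages=[{"role": "user", "content": "Explain quantization in two sentences."}]
                    )
                    print(out["choices"][0]["message"]["content"])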

                • brucethemoose@lemmy.world · edited · 2 hours ago

                  You should be running hybrid inference of GLM Air with a setup like that. Qwen 8B is kinda obsolete.

                  I dunno what kind of speeds you absolutely need, but I bet you could get at least 12 tokens/s.
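
                  Something like this is the general shape of hybrid CPU+GPU inference with llama-cpp-python; the model file and layer split are placeholders you’d tune to your VRAM (if I remember right, newer llama.cpp builds can also pin just the MoE expert tensors to CPU, which works even better for these models, but a plain layer split already gets you going):

                      # Rough sketch of hybrid CPU+GPU inference; filename and layer count are
                      # placeholders, not a tested config for GLM Air specifically.
                      from llama_cpp import Llama

                      llm = Llama(
                          model_path="GLM-4.5-Air-Q4_K_M.gguf",  # placeholder GGUF quant
                          n_gpu_layers=20,   # keep what fits in VRAM; the rest runs from system RAM
                          n_ctx=4096,
                          n_threads=16,      # CPU threads handle the layers left in RAM
                      )

                      print(llm("Say hi in one word.", max_tokens=8)["choices"][0]["text"])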