I’m personally not and I never will be, but I keep seeing on the news that a lot of people are actually becoming friends with their AI bots, trying to use them as substitutes to replace real human interaction. Kinda scary, kinda absurd. What is your take on this? And are you friends with a bot?

  • brucethemoose@lemmy.world
    link
    fedilink
    arrow-up
    1
    ·
    edit-2
    4 hours ago

    I’m in a 24GB 3090 + 128GB RAM.

    With full 300B GLM 4.6, I typically run 12K-28K context with different settings. I could do more than 28K, but the higher quantization starts to become a problem (as 128GB is right on the edge of fitting an IQ3_KT). And I get 5-6 tokens/s text-generation doing that.

    With GLM Air? I can get a lot more, closer to 64K.

    With smaller models that’s no issue.

    I only get 3-5 questions in before I run out of tokens.

    IDK how you’re prompting it, but you should clear the thinking block after every question, and that should leave plenty of tokens.

    What model are you running, and what are your inference server settings?