Taalas HC1: 17,000 tokens/sec on Llama 3.1 8B vs Nvidia H200’s 233 tokens/sec. 73x faster at one-tenth the power. Each chip runs ONE model, hardwired into the transistors.
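The 73x figure follows directly from the two throughput numbers quoted in the headline; a quick back-of-the-envelope check (using only those claimed figures, not independent benchmarks):

```python
# Sanity check of the claimed speedup, using only the
# throughput numbers quoted in the headline (claimed, not verified).
hc1_tps = 17_000   # Taalas HC1, Llama 3.1 8B tokens/sec
h200_tps = 233     # Nvidia H200, same model, tokens/sec

speedup = hc1_tps / h200_tps
print(f"Speedup: {speedup:.1f}x")  # prints "Speedup: 73.0x"
```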

  • ImperialStout@beehaw.org · 14 hours ago

    This sounds great to me. Anything that increases the supply of AI processing could ease demand on the GPU supply. I want to be able to upgrade my gaming computer again someday!

    • danhab99@programming.dev · 2 hours ago

      AI really needs dedicated hardware. I feel like if there were more chip manufacturing in the West, we might have more diverse chips.

      Frankly, I’m really confused as to why this LLM demand for RAM isn’t encouraging new companies to manufacture RAM. If this is a bubble, then we all just wait it out; if it’s not a bubble, then someone else would swoop in to take up the market.

    • Appoxo@lemmy.dbzer0.com · 12 hours ago

      Every chip that is produced takes away fab capacity that could have been used for consumer products.

      So yeah… not great.