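You’re looking for tokens. A prompt is broken down into tokens, and the model then generates its response one token at a time. Each token is represented by an integer ID. The common throughput metric is tokens/second, and on a well-utilized setup the GPU should sit at or near 100% usage while generating. Measure how many tokens per second it’s generating and how many tokens your request uses. Watts are already joules per second, so dividing the GPU’s power draw in watts by tokens/second gives the energy per token in joules, and multiplying that by the token count gives the total energy. That’s all you need.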
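To make the arithmetic concrete, here is a minimal Python sketch. The function names and the example figures (300 W, 50 tokens/second, 500 tokens) are made up for illustration and are not measurements. It assumes the power draw is roughly constant while generating.

```python
def energy_per_token_joules(avg_power_watts: float, tokens_per_second: float) -> float:
    """Energy per generated token in joules.

    Watts are joules per second, so W / (tokens/s) = J/token.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return avg_power_watts / tokens_per_second


def request_energy_wh(avg_power_watts: float, tokens_per_second: float, tokens: int) -> float:
    """Total energy for a request in watt-hours (1 Wh = 3600 J)."""
    joules = energy_per_token_joules(avg_power_watts, tokens_per_second) * tokens
    return joules / 3600.0


if __name__ == "__main__":
    # Hypothetical numbers, not measurements: a GPU drawing 300 W at 50 tokens/s.
    print(energy_per_token_joules(300, 50))   # 6.0 J per token
    print(request_energy_wh(300, 50, 500))    # roughly 0.83 Wh for 500 tokens
```

Substitute your own measured power draw and throughput. The result only covers the GPU, not the rest of the machine.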