Is there currently an accurate way to say how much power per prompt LLMs use?

SnausagesinaBlanket@lemmy.world · 9 hours ago

Is there currently an accurate way to say how much power per prompt LLMs use?

empireOfLove2@lemmy.dbzer0.com · 9 hours ago

There’s no way that I know of to see the per-prompt usage for commercially available models. They obviously hide that. I admit I don’t research them much, but I am assuming each chip is processing prompts one at a time.

It’s pretty simple arithmetic: if it’s running exclusively on a single-GPU system, and a prompt takes X seconds to generate on said GPU, then you take the GPU’s power draw multiplied by X seconds (which gives you energy), plus whatever fraction of the datacenter overhead power that GPU accounts for. For locally run models on your own hardware this is also trivial to calculate.
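To make that arithmetic concrete, here is a rough Python sketch. The function name, the example wattage, and the idea of folding datacenter overhead into a single multiplier (similar in spirit to a PUE figure) are my own assumptions for illustration, not numbers from any provider.

```python
def energy_per_prompt_wh(gpu_power_watts, seconds, overhead_factor=1.0):
    """Estimate energy for one prompt in watt-hours.

    gpu_power_watts: average GPU power draw while generating (assumed known).
    seconds: how long the prompt took to generate on that GPU.
    overhead_factor: hypothetical multiplier for datacenter overhead
        (1.0 means no overhead, e.g. a local machine where you ignore it).
    """
    if gpu_power_watts < 0 or seconds < 0 or overhead_factor < 1.0:
        raise ValueError("power and time must be >= 0, overhead_factor >= 1.0")
    joules = gpu_power_watts * seconds * overhead_factor  # watts * seconds = joules
    return joules / 3600.0  # 3600 joules per watt-hour


# Made-up example: a 700 W GPU busy for 10 s, with 20% overhead on top.
print(f"{energy_per_prompt_wh(700, 10, 1.2):.2f} Wh")  # prints "2.33 Wh"
```

This only holds under the one-prompt-per-GPU assumption above; if the hardware serves several prompts at once, the per-prompt share would be smaller.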

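Alternatively, GPUs run at a certain number of “tokens” per second, and each prompt is a certain number of tokens being fed into the model, generally scaling with the length of the prompt. Divide the token count by the tokens-per-second rate and you get the generation time, which you can then multiply by the GPU’s power draw as before.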
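The token-based route can be sketched the same way. Again, the function name and all the numbers are hypothetical; real throughput depends heavily on the model, the hardware, and how requests are batched.

```python
def energy_from_tokens_wh(num_tokens, tokens_per_second, gpu_power_watts):
    """Estimate energy in watt-hours from token count and throughput.

    num_tokens: tokens processed for the prompt (assumed known).
    tokens_per_second: throughput of the GPU for this model (assumed known).
    gpu_power_watts: average GPU power draw while generating.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    seconds = num_tokens / tokens_per_second  # time spent on this prompt
    return gpu_power_watts * seconds / 3600.0  # joules -> watt-hours


# Made-up example: 500 tokens at 50 tokens/s on a 300 W GPU.
print(f"{energy_from_tokens_wh(500, 50, 300):.2f} Wh")  # prints "0.83 Wh"
```

This ignores datacenter overhead entirely, so treat it as a lower bound for anything not running on your own machine.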