Ok, you have a moderately complex math problem you need to solve. You give the problem to six LLMs, all paid versions. All six return the same numbers. Would you trust the answer?
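
For concreteness, the setup is something like the sketch below; ask_model is a hypothetical stand-in, since every provider has its own API:

    from collections import Counter

    def ask_model(name: str, problem: str) -> str:
        # hypothetical stand-in: in practice, call the provider's API here
        return "42"

    models = ["model_a", "model_b", "model_c", "model_d", "model_e", "model_f"]
    problem = "..."  # the math problem goes here

    answers = Counter(ask_model(m, problem) for m in models)
    top_answer, votes = answers.most_common(1)[0]
    print(f"{votes}/{len(models)} models agree on: {top_answer}")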

  • Rhaedas@fedia.io · 2 days ago

    How trustworthy the answer is depends on knowing where the answers come from, which is unknowable. If the probability of the answers being generated from the original problem is high because the problem occurred in many different places in the training data, then maybe it’s correct. Or maybe everyone who came up with the answer is wrong in the same way, and that’s why there is so much correlation. Or perhaps the probability match is simply because lots of math problems tend towards similar answers.

    The core issue is that the LLM is not thinking or reasoning about the problem itself, so trusting it with anything amounts to assuming the likelihood of it being right rather than wrong is high. In some areas that’s a safe assumption to make; in others it’s a terrible one.
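
    To put rough numbers on that: if six models erred independently, all six landing on the same wrong answer would be vanishingly unlikely; shared training data is what makes their errors correlated. A back-of-envelope sketch, with a made-up per-model error rate:

        p_wrong = 0.2   # assumed per-model error rate, purely illustrative
        n_models = 6

        # If errors were independent, all six being wrong at once is already rare,
        # and all six agreeing on the *same* wrong value is rarer still:
        print(f"independent errors, all six wrong: {p_wrong ** n_models:.6f}")  # 0.000064

        # If the models inherited one flawed solution from overlapping training
        # data, their errors are correlated and unanimity proves very little:
        print(f"shared failure mode: {p_wrong:.2f}")  # 0.20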

    • Farmdude@lemmy.world (OP) · 2 days ago

      I’m a little confused after listening to a podcast with… damn, I can’t remember his name. He’s English. They call him the godfather of AI. A pioneer.

      Well, he believes that GPT-2 through GPT-4 were major breakthroughs in artificial intelligence. He specifically said ChatGPT is intelligent, that some type of reasoning is taking place, and that the end of humanity could come anywhere from a year to 50 years away. This is the fellow who imagined a neural net mapped on the human brain, and this man says it is doing much more. Who should I listen to? He didn’t say some hidden AI. HE SAID CHATGPT. HONESTLY, NO OFFENSE, I JUST DON’T UNDERSTAND THIS EPIC SCENARIO ON ONE SIDE AND TOTALLY NOTHING ON THE OTHER.

      • groet@feddit.org · 2 days ago

        Anyone with a stake in the development of AI is lying to you about how good models are and how soon they will be able to do X.

        They have to be lying, because the truth is that LLMs are terrible. They can’t reason at all. When they perform well on benchmarks, it’s because every benchmark contains questions that are in the LLM’s training data. If you burn trillions of dollars and have nothing to show for it, you lie so people keep giving you money.

        https://arxiv.org/html/2502.14318

        However, the extent of this progress is frequently exaggerated based on appeals to rapid increases in performance on various benchmarks. I have argued that these benchmarks are of limited value for measuring LLM progress because of problems of models being over-fit to the benchmarks, lack of real-world relevance of test items, and inadequate validation for whether the benchmarks predict general cognitive performance. Conversely, evidence from adversarial tasks and interpretability research indicates that LLMs consistently fail to learn the underlying structure of the tasks they are trained on, instead relying on complex statistical associations and heuristics which enable good performance on test benchmarks but generalise poorly to many real-world tasks.
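
        The over-fitting claim is testable in principle. Here is a toy sketch of a contamination check that flags benchmark items whose n-grams already appear verbatim in a training corpus; the data is made up and real checks are far more involved:

            def ngrams(text: str, n: int = 8) -> set:
                toks = text.lower().split()
                return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

            def overlap_ratio(question: str, corpus: str, n: int = 8) -> float:
                # share of the question's n-grams found verbatim in the corpus
                q = ngrams(question, n)
                return len(q & ngrams(corpus, n)) / len(q) if q else 0.0

            corpus = "the integral of x squared from 0 to 3 equals 9 as shown"  # stand-in corpus
            question = "what is the integral of x squared from 0 to 3"
            print(overlap_ratio(question, corpus))  # 0.5 here; high overlap means the
                                                    # benchmark measures recall, not reasoning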

      • Rhaedas@fedia.io · 2 days ago

        One step might be to try to understand the basic principles behind what makes an LLM function. The YouTube channel 3blue1brown has at least one good video on transformers and how they work, and perhaps that will help you understand that “reasoning” is a very broad term that doesn’t necessarily mean thinking. What goes on inside an LLM is fascinating, and it’s amazing what does manage to come out that’s useful, but like any tool it can’t be used well for everything, if at all.
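
        For a taste of what those videos cover: the core operation of a transformer is the arithmetic below, which mixes token vectors by similarity. This is a minimal sketch of standard scaled dot-product attention on random data, with no training involved:

            import numpy as np

            def attention(Q, K, V):
                # similarity of each query to each key, scaled for numerical stability
                scores = Q @ K.T / np.sqrt(K.shape[-1])
                # softmax over keys turns similarities into mixing weights
                w = np.exp(scores - scores.max(axis=-1, keepdims=True))
                w /= w.sum(axis=-1, keepdims=True)
                return w @ V  # each output is a weighted blend of the value vectors

            rng = np.random.default_rng(0)
            Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
            print(attention(Q, K, V).shape)  # (4, 8)

        Everything downstream is stacks of this plus learned weights; there is no symbolic math engine hiding inside.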

          • Rhaedas@fedia.io · 2 days ago

            Funny, but also not a bad idea, since you can ask it to clarify things as you go. I just referenced that YT channel because he has a great ability to show things visually in a way that helps them make sense.