Ok, say you have a moderately complex math problem you need to solve. You give the problem to 6 LLMs, all paid versions, and all 6 return the same numbers. Would you trust the answer?

  • groet@feddit.org
    2 days ago

    Anyone with a stake in the development of AI is lying to you about how good models are and how soon they will be able to do X.

    They have to be lying, because the truth is that LLMs are terrible. They can’t reason at all. When they perform well on benchmarks, it’s because every benchmark contains questions that are already in the LLM’s training data. If you burn trillions of dollars and have nothing to show for it, you lie so people keep giving you money.

    https://arxiv.org/html/2502.14318

    However, the extent of this progress is frequently exaggerated based on appeals to rapid increases in performance on various benchmarks. I have argued that these benchmarks are of limited value for measuring LLM progress because of problems of models being over-fit to the benchmarks, the lack of real-world relevance of test items, and inadequate validation for whether the benchmarks predict general cognitive performance. Conversely, evidence from adversarial tasks and interpretability research indicates that LLMs consistently fail to learn the underlying structure of the tasks they are trained on, instead relying on complex statistical associations and heuristics which enable good performance on test benchmarks but generalise poorly to many real-world tasks.
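The shared-training-data point is the crux of the original question: unanimous agreement among 6 models is only strong evidence if their errors are independent. A toy Monte Carlo sketch (hypothetical error rate, assuming a fully shared failure mode in the correlated case) shows how much the independence assumption matters:

```python
import random

random.seed(0)
TRIALS = 100_000
N_MODELS = 6
P_WRONG = 0.3  # hypothetical per-model error rate on a hard problem

def all_wrong_rate(correlated: bool) -> float:
    """Fraction of trials in which all six models are wrong together."""
    hits = 0
    for _ in range(TRIALS):
        if correlated:
            # Shared training data: one latent draw decides for every model.
            wrong = random.random() < P_WRONG
            answers_wrong = [wrong] * N_MODELS
        else:
            # Independent errors: each model fails on its own coin flip.
            answers_wrong = [random.random() < P_WRONG for _ in range(N_MODELS)]
        if all(answers_wrong):
            hits += 1
    return hits / TRIALS

print(all_wrong_rate(correlated=False))  # ~0.3**6, i.e. well under 0.1%
print(all_wrong_rate(correlated=True))   # ~0.3: agreement tells you almost nothing
```

If the models really failed independently, six matching wrong answers would be a roughly 1-in-1400 event; if their errors are driven by the same training data, unanimous agreement is barely more reassuring than asking one model. The real situation sits somewhere between these two extremes.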