• v_krishna@lemmy.ml
    15 hours ago

    You’ve worked in ML since 2012 but don’t think transformers have had an absolutely insane impact, for example in NLP and machine translation? (I have worked in those fields longer than that, and while I don’t think AGI or anything like that is coming from transformers and deep neural nets, I think you are full of it if you don’t admit they have revolutionized a large number of [highly technical] fields).

    • TropicalDingdong@lemmy.world
      4 hours ago

      Tldr at the bottom

      I’m literally submitting a transformers paper for publication this week. They’re truly incredible. They’re a huge step forward from where we were. But so were YOLO, and U-Net, and LSTMs (kinda, they were a bit meh).

      But there is a secondary claim about LLMs, chatbot/agentic LLMs specifically, that they’re doing things they simply aren’t. And I do pay for higher tier access, so I at least think I’m using some of the state of the art of these things.

      I think you are full of it if you don’t admit they have revolutionized a large number of highly technical fields

      I’m specifically saying they haven’t, or at least that if you are using Claude or ChatGPT to do those things, you aren’t doing what you think you are doing. Domain experts who use these tools recognize their limitations, and limitation is a soft way of putting it. They just get shit fundamentally wrong. And sometimes, when you are working on a complex problem, if you don’t have the knowledge or experience to know when something is wrong, you’ll believe these machines are doing far more than they are.

      Look, I use them regularly. I can support up to 128 GB models locally. I understand the claim that these things have utility. But after several years of working with them, I genuinely don’t think they are actually capable of supporting the claims businesses are making about them.

      For one, while they can help you solve some problems faster, often, they just make the situation far, far worse, and you spend an inordinate amount of time trying to get the thing to do something a specific way, but it just won’t. I think this is related to the half glass of wine issue, which I’ll come back to.

      Second, they, as far as I’ve been able to use them, are utter dogshit at returning to a codebase. If you are trying to get them to have some kind of long term comprehension of what’s happening in a project, good fucking luck. You end up with a codebase of constant refactors and stupid useless “sanity” checks that create the appearance of good practices, but it’s all smoke and mirrors. They seem to work ok for single shot demos, but you could never run a business or build a program that’s worth keeping around where the LLM is central to managing the process. And there is more to say on this, because when you are building up a codebase, the most fundamental thing you are really building up is a vision of how it all fits together. When you outsource this to LLMs, you don’t get the vision, and frankly they don’t either. What you end up with is maybe functional at first, but inevitably unstable and unsustainable.

      Third, and maybe this is me, but I’ve never actually seen an LLM come up with a clever solution to anything. Like not once have I seen it come up with a truly elegant, efficient solution. It’s almost always the most banal solution, and more often than not, it’s not even a solution, but a workaround that avoids the problem entirely while creating the impression of a solution.

      And to be clear, I’m not talking about mundane hello world statements. I’m talking about things that undergrads and graduate students miss all the time. I’m talking about gotchas and problems where you sometimes need a decade or more of experience to know that the fundamental assumptions are flawed. There is something more inherent to the issues they create.

      I think the half glass of wine issue has been papered over and remains the core limitation with LLMs, and represents a fundamental issue with either transformers, or maybe gradient descent, and I don’t think this current architecture is going to get us past it. You are probably familiar with the issue; it got traction a while back, but they hotfixed the phenomenon and it lost media attention. However, if you know what you are looking for, you’ll find non-image-based examples of this all the time when using LLMs. They’ll constantly insist they’ve done something they haven’t. And there will be no obvious way to get them to recognize they haven’t done or aren’t doing the thing. I don’t believe any of the philosophical explanations given in the YouTube coverage of the issue. I think the problem is likely more core, more central to machine learning than it is being given credit for.

      The concern I have is that this is something more fundamental, and we’re only noticing it because image generation and natural language are things humans can comprehend and notice the issue in. But what about when it becomes something incomprehensible to humans, like a sequence of weather data or output from a sensor? We would have no ability to notice if an ML model is doing the same thing that an LLM is doing, effectively lying about its output.

      Long rant over shortly.

      Tldr

      I don’t dispute the massive advances transformers represent as an architecture. But there is clearly something rotten or missing at their core which makes them practically self-destructive to rely upon beyond superficial, well-solved issues. I think the rot is in the math or the approach to training, and I don’t think there is any amount of engineering that can unbake the sawdust out of the cake.

      • juanito_the_great@sh.itjust.works
        2 hours ago

        Thank you for putting in the time to try to explain this. I’ve also been working with ML for a long time and have the intuition that there is a fundamental limit to what can be achieved with Neural Networks and probably any other Machine Learning technique.

        Your cake analogy is probably better, but let me try another one. It’s like having a machine that is trying to solve a gigantic puzzle with random pieces from different puzzles. Yeah, if you have enough puzzle pieces and the machine is fast enough it can give you suboptimal solutions but the puzzle will not ever be solved as it is supposed to be solved.

    • Passerby6497@lemmy.world
      7 hours ago

      I think you read way more into their comment than was written. They said nothing about transformers, only that these assistants are shit. Which, let’s admit, they are.

      The underlying technology is cool, but current implementations are trash and have no long-term economic path to viability unless things radically change quickly.