I just…
Am I wrong here? Like, look, shame me. I’ve worked in machine learning since 2012. I don’t do any of the LLM shit. I do things like predicting wildfire risk from satellite imagery, or biomass in the Amazon, soil carbon, shit like that.
I’ve tried all the code assistants. They’re fucking crap. There’s no building an economy around these things. You’ll just get dogshit. There’s no building institutions around these things.
Can I subscribe to your AI posts?
If you want a demo of how bad these AI coding agents are, build a medium-sized script with one, something with a parse -> process -> output flow that isn’t trivial. Let it do the debugging, too (tell it the error message or the unwanted behaviour).
You’ll probably get the desired output if you’re using one of the good models.
Now ask it to review the code or optimize it.
If it were a good coding AI, this step shouldn’t turn up much, as it would have been applying the same reasoning during the code writing process.
But in my experience, this isn’t what happens. For a review, it has a lot of notes. It can also find and implement optimizations. The weights are the same; the only difference is that the context of the prompt has changed from “write code” to “optimize code”, which affects the correlations involved. There is no “write optimal code” because it’s trained on everything and the kitchen sink, so you’ll get correlations from good code, newbie coders, and lesson examples of bad ways to do things (especially if they’re presented in a “discovery” format where a prof intended to talk about why this slide is bad but didn’t include that on the slide itself).
It’s funny. I see the phrase “AI doomsday scenario” and I immediately picture devastating cascading consequences caused by someone mistakenly putting too much trust in some kind of agentic AI that does things poorly and breaks a lot of big important things.
I’m just not seeing a scenario where AI causes devastating disruption based on its own ultra competence. I’m much more scared of AI incompetence.
Your job sounds really cool! How likely is Alberta to be on fire again this year?
Gimme some coordinates.
57.4228475, -113.8340952
Well for one, that area already burned pretty recently. So it’s pretty unlikely to burn again any time soon.
But as part of a larger picture:
The area does experience fire-weather conditions for some portion of the year:
Here we’re looking at HDWI (Hot-Dry-Windy Index), where a “loose” definition of fire weather is an HDWI above 200. HDWI is based on a few factors, namely how hot it is, how dry it is, and how fast the air is moving. Hot dry air moving quickly = fire weather.
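To make the threshold idea concrete, here is a minimal sketch of counting fire-weather days per year. It assumes the daily HDWI values are already computed elsewhere; the function name and the `(date, hdwi)` input format are made up for illustration, and the sample numbers are not real data for these coordinates.

```python
from datetime import date

FIRE_WEATHER_THRESHOLD = 200.0  # the "loose" cutoff mentioned above


def fire_weather_days_per_year(daily_hdwi):
    """Count days per year whose HDWI is above the threshold.

    daily_hdwi: iterable of (date, hdwi) pairs, one per day
    (a hypothetical input format, chosen for this sketch).
    """
    counts = {}
    for day, hdwi in daily_hdwi:
        counts.setdefault(day.year, 0)  # keep years with zero fire-weather days
        if hdwi > FIRE_WEATHER_THRESHOLD:
            counts[day.year] += 1
    return counts


# Made-up values purely for illustration.
sample = [
    (date(2020, 7, 1), 150.0),
    (date(2020, 7, 2), 240.0),
    (date(2021, 7, 1), 310.0),
    (date(2021, 7, 2), 205.0),
    (date(2021, 7, 3), 200.0),  # exactly 200 is not "above 200"
]
print(fire_weather_days_per_year(sample))
```

The strict `>` is a choice; whether a value of exactly 200 counts is something you’d settle with whoever defines the product.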
The number of fire weather days per year has been increasing, and in very recent years (the past decade) the rate of change has increased and become statistically significant:
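A rough sketch of how that kind of trend check could look. The comment doesn’t say which test was actually used, so this swaps in a plain least-squares slope plus a Mann-Kendall statistic (a common choice for trends in climate series, here without tie correction). The yearly counts are invented, and both function names are just for this example.

```python
import math


def ols_slope(years, counts):
    """Ordinary least-squares slope: change in fire-weather days per year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts))
    return sxy / sxx


def mann_kendall_z(values):
    """Mann-Kendall Z score (normal approximation, no tie correction)."""
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)  # sign of each pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0


# Made-up counts for illustration only: flat for a decade, then climbing.
years = list(range(2000, 2020))
counts = [5] * 10 + list(range(6, 16))

full_slope = ols_slope(years, counts)
recent_slope = ols_slope(years[-10:], counts[-10:])
recent_z = mann_kendall_z(counts[-10:])

print(f"full record: {full_slope:.2f} days/yr")
print(f"last decade: {recent_slope:.2f} days/yr, MK z = {recent_z:.2f}")
```

A |z| above roughly 1.96 corresponds to the usual 5% two-sided cutoff. With only ten points the normal approximation is borderline, so treat it as a sanity check rather than a verdict.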
So it’s not a particularly fire-prone area, but it’s getting worse, and it’s getting worse at a faster rate.
That would be the first part of the analysis I would run. After that, we’d look for historically “anomalous” periods. It’s not enough to look at averages; that will wash over important features in the data. We need to look for specific periods where fire weather manifests.
This is another way of thinking about fire risk. Here we’re going to count the amount of time, past the first 12 hours, that an area stays in sustained fire-weather conditions. Basically, a bit of time in bad conditions isn’t the end of the world, but as you stay in fire-weather conditions, fire risk increases exponentially (as plants/fuels continue to dry out).
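One way to read that in code, as a sketch: walk an hourly HDWI series, find runs of consecutive hours above the threshold, and keep only the hours past the first 12 of each run. The hourly resolution, the function name, and the exact handling of the 12-hour cutoff are my assumptions, not something stated in the comment.

```python
def sustained_event_hours(hourly_hdwi, threshold=200.0, grace_hours=12):
    """For each run of consecutive hours above `threshold`, return the
    number of hours beyond the first `grace_hours`.

    Runs lasting `grace_hours` or less are ignored entirely.
    """
    events = []
    run = 0
    for value in hourly_hdwi:
        if value > threshold:
            run += 1
        else:
            if run > grace_hours:
                events.append(run - grace_hours)
            run = 0
    if run > grace_hours:  # flush a run that reaches the end of the series
        events.append(run - grace_hours)
    return events


# Made-up series: a 10-hour run (ignored), a 20-hour run, then a 13-hour run.
series = [250.0] * 10 + [100.0] + [250.0] * 20 + [100.0] * 3 + [250.0] * 13
print(sustained_event_hours(series))
```

Each number that comes back is one event’s “magnitude” in hours. A model that wanted the exponential growth in risk mentioned above would then weight longer events more heavily instead of treating hours linearly.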
If I were writing an insurance product for you, I would count the number of events in a given magnitude bucket and give you a risk rating. Here, licking my thumb and sticking it in the air, I would say… “not that bad”.
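As a toy version of that bucketing step: take the per-event magnitudes (hours of sustained fire weather), drop each into a bucket, and turn the counts into a rating. Every number here is hypothetical, including the bucket edges, the weights, and the score cutoffs; a real product would calibrate all of them against loss data.

```python
from bisect import bisect_right

# Hypothetical bucket edges, in hours of sustained fire weather per event.
BUCKET_EDGES = [6, 24, 72]
BUCKET_LABELS = ["minor", "moderate", "major", "extreme"]
# Hypothetical weights: bigger events count for much more.
BUCKET_WEIGHTS = {"minor": 1, "moderate": 3, "major": 9, "extreme": 27}


def bucket_events(event_hours):
    """Count how many events fall into each magnitude bucket."""
    counts = {label: 0 for label in BUCKET_LABELS}
    for hours in event_hours:
        # bisect_right picks the bucket whose lower edge is <= hours.
        counts[BUCKET_LABELS[bisect_right(BUCKET_EDGES, hours)]] += 1
    return counts


def risk_rating(event_hours):
    """Collapse the bucket counts into a coarse rating (cutoffs are made up)."""
    counts = bucket_events(event_hours)
    score = sum(BUCKET_WEIGHTS[label] * n for label, n in counts.items())
    if score < 10:
        return "not that bad"
    if score < 50:
        return "elevated"
    return "high"


print(bucket_events([8, 1, 30]))
print(risk_rating([8, 1]))
```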
Much of my work is around modeling in the wildland-urban interface. You picked an almost all wilderness area. Since there are no structures, I can’t do the next analysis, but it would look something like this:
Most of my work is about figuring out what the impacts of wildfire on the built environment are going to be. Also, the free structure dataset I have access to doesn’t cover Canada and I’m not going to spend money buying the structures for you (unless you REALLY want me to).
Those first figures are all specific to the coordinates you provided. The final figure is just an example.
Heh, that’s the joke going around now.
AI works, it replaces workers, we lose our jobs.
AI doesn’t work, bubble pops, we lose our jobs.
You are right in every serious part of the world.
But add “venture capital” to the equation and it works out stronger than anything else so far.
You’ve worked in ML since 2012 but don’t think transformers have had an absolutely insane impact, for example in NLP and machine translation? (I have worked in those fields longer than that, and while I don’t think AGI or anything like that is coming from transformers and deep neural nets, I think you are full of it if you don’t admit they have revolutionized a large number of [highly technical] fields).
Tldr at the bottom
I’m literally submitting a transformers paper for publication this week. They’re truly incredible. They’re a huge step forward from where we were at. But so were YOLO, and U-Net, and LSTMs (kinda, they were a bit meh).
But there is a secondary claim about LLMs, chatbot/agentic LLMs specifically: that they’re doing things they simply aren’t. And I do pay for higher tier access, so I at least think I’m using some of the state of the art of these things.
“I think you are full of it if you dont admit they have revolutionized a large number of highly technical fields”
I’m specifically saying they haven’t, or at least that if you are using Claude or ChatGPT to do those things, you aren’t doing what you think you are doing. Domain experts who use these tools recognize their limitations, and limitation is a soft way of putting it. They just get shit fundamentally wrong. And sometimes, when you are working on a complex problem, if you don’t have the knowledge or experience to know when something is wrong, you’ll believe these machines are doing far more than they are.
Look, I use them regularly. I can support up to 128 GB models locally. I understand the claim that these things have utility. But after several years of working with them, I genuinely don’t think they are capable of supporting the claims businesses are making about them.
For one, while they can help you solve some problems faster, often, they just make the situation far, far worse, and you spend an inordinate amount of time trying to get the thing to do something a specific way, but it just won’t. I think this is related to the half glass of wine issue, which I’ll come back to.
Second, as far as I’ve been able to use them, they are utter dogshit at returning to a codebase. If you are trying to get them to have some kind of long term comprehension of what’s happening in a project, good fucking luck. You end up with a codebase of constant refactors and stupid useless “sanity” checks that create the appearance of good practices, but it’s all smoke and mirrors. They seem to work ok for single shot demos, but you could never run a business or build a program that’s worth keeping around where the LLM is central to managing the process. And there is more to say on this, because when you are building up a codebase, the most fundamental thing you are really building up is a vision of how it all fits together. When you outsource this to LLMs, you don’t get the vision, and frankly they don’t either. What you end up with is maybe functional at first, but inevitably unstable and unsustainable.
Third, and maybe this is me, but I’ve never actually seen an LLM come up with a clever solution to anything. Like not once have I seen it come up with a truly elegant, efficient solution. It’s almost always the most banal solution, and more often than not, it’s not even a solution, but a workaround that avoids the problem entirely while creating the impression of a solution.
And to be clear, I’m not talking about mundane hello world statements. I’m talking about things that undergrads and graduate students miss all the time. I’m talking about gotchas and problems where you sometimes need decades of experience to know that the fundamental assumptions are flawed. There is something more inherent to the issues they create.
I think the half glass of wine issue has been papered over and remains the core limitation with LLMs. It represents a fundamental issue with either transformers, or maybe gradient descent, and I don’t think this current architecture is going to get us past it. You are probably familiar with the issue; it got traction a while back, but they hot-fixed the phenomenon and it lost media attention. However, if you know what you are looking for, you’ll find non-image-based examples of this all the time when using LLMs. They’ll constantly insist they’ve done something they haven’t. And there will be no obvious way to get them to recognize they haven’t done or aren’t doing the thing. I don’t believe any of the philosophy explanations given in the YouTube coverage of the issue. I think the problem is likely more core, more central to machine learning than it’s being given credit for.
The concern I have is that this is something more fundamental, and we’re only noticing it because image generation and natural language are things humans can comprehend and notice the issue in. But what about when it becomes something incomprehensible to humans, like a sequence of weather data or output from a sensor? We would have no ability to notice if an ML model is doing the same thing that an LLM is doing, effectively lying about its output.
Long rant over shortly.
Tldr
I don’t contest the massive advances transformers represent as an architecture. But there is clearly something rotten or missing at their core which makes them practically self-destructive to rely upon beyond superficial, well-solved issues. I think the rot is in the math or the approach to training, and I don’t think there is any amount of engineering that can unbake the sawdust out of the cake.
Thank you for putting in the time to try to explain this. I’ve also been working with ML for a long time and have the intuition that there is a fundamental limit to what can be achieved with Neural Networks and probably any other Machine Learning technique.
Your cake analogy is probably better, but let me try another one. It’s like having a machine that is trying to solve a gigantic puzzle with random pieces from different puzzles. Yeah, if you have enough puzzle pieces and the machine is fast enough it can give you suboptimal solutions but the puzzle will not ever be solved as it is supposed to be solved.
I think you read way more into their comment than was written. They said nothing about transformers, only that these assistants are shit. Which, let’s admit, they are.
The underlying technology is cool, but current implementations are trash and have no long-term economic path to viability unless things radically change quickly.
I think it’s supposed to work like, “well, even if you are right about the massive utility of AI, is that still what we should be aiming for?”
It gets around the combative “you’re wrong, AI is garbage” argument. The people hoisting AI because they believe, even if it does suck, it’ll get better… those people can probably understand this argument much more easily.
It sucks, and it’s at the point now where we’re hitting diminishing returns, so I’m not sure if it will get better.