Chatbots provided incorrect, conflicting medical advice, researchers found: “Despite all the hype, AI just isn’t ready to take on the role of the physician.”
“In an extreme case, two users sent very similar messages describing symptoms of a subarachnoid hemorrhage but were given opposite advice,” the study’s authors wrote. “One user was told to lie down in a dark room, and the other user was given the correct recommendation to seek emergency care.”



Use low temperature, FFS, if you want the same answer every time.
You can set the temperature to zero to get the same answer for the same input every time, but at that point you're playing cat and mouse with a black box whose answers are still essentially arbitrary. Even if you catch a false positive or false negative, you can't really debug it out.
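To see why zero temperature makes output repeatable but doesn't make the model any less of a black box, here is a minimal sketch of temperature sampling over raw logits (no particular model or API is assumed; the function and logit values are illustrative):

```python
import math
import random

def sample_token(logits, temperature, rng=None):
    """Pick a token index from raw logits.

    temperature == 0 falls back to greedy decoding (argmax),
    which is deterministic for the same logits every time.
    Any temperature > 0 samples from the softmax distribution,
    so repeated calls can return different tokens.
    """
    if temperature == 0:
        # Greedy: always the single highest-logit token.
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    # Scale logits by temperature, then softmax (shifted for stability).
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sample from the cumulative distribution.
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

logits = [2.0, 1.0, 0.5]  # hypothetical logits for three tokens
# Temperature 0: the same input yields the same token, every time.
assert all(sample_token(logits, 0) == 0 for _ in range(100))
```

Determinism here only fixes *which* token wins for a given input; it says nothing about whether that token is correct, which is the commenter's point about being unable to debug the model's mistakes out.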