Ok, say you have a moderately complex math problem you need to solve. You give the problem to 6 LLMs, all paid versions, and all 6 come back with the same numbers. Would you trust the answer?
Just yesterday I was fiddling around with a logic test in Python. I wanted to see how well DeepSeek could analyze the opening line of a for loop. It correctly identified what the line did in its description, but when it moved on to giving examples it contradicted itself, and it took 3 or 4 replies before it realized that it had.