ChatGPT gets code questions wrong 52% of the time

☆ Yσɠƚԋσʂ ☆@lemmy.ml · 1 year ago

SirGolan@lemmy.sdf.org · 1 year ago

Wait a second here… I skimmed the paper and the GitHub repo and didn't find an answer to a very important question: is this GPT-3.5 or GPT-4? There's a huge difference in code quality between the two, and either they made a giant accidental omission or they are being intentionally misleading. Please correct me if I missed where they specified that. I'm assuming they were using GPT-3.5, so yeah, those results would be as expected. On the HumanEval benchmark, GPT-4 gets 67%, and that goes up to 90% with Reflexion prompting. GPT-3.5 gets 48.1%, which is exactly what this paper is saying. (source).

Corkyskog@sh.itjust.works · 1 year ago

Is GPT-4 publicly available?

newIdentity@sh.itjust.works · 1 year ago

Yes… if you pay $20 a month.

SirGolan@lemmy.sdf.org · 1 year ago

Yes, it's available to anyone through the API or to anyone who pays for a ChatGPT subscription.

floofloof@lemmy.ca · 1 year ago

Whatever GitHub Copilot uses (the version with the chat feature), I don’t find its code answers to be particularly accurate. Do we know which version that product uses?

SirGolan@lemmy.sdf.org · 1 year ago

If we're talking about Copilot, then that's not ChatGPT. But I agree it's just OK. It can do simple things well, but I go to GPT-4 for the hard stuff. (Or my own brain, haha.)

☆ Yσɠƚԋσʂ ☆@lemmy.ml · 1 year ago

Oh, that's possible; I'm not sure which one they used either.