
It will certainly decrease. Also, there are multiple ways to deal with hallucinations. You can sample GPT-4 not once, but 10, 100, or 1000 times. The chance of it hallucinating the same thing every time asymptotically approaches 0. It all depends on how much money you are willing to invest in getting the right opinion, which in the field of medicine can be quite a lot.
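The sampling argument above can be sketched in a few lines of Python. This is a toy model, assuming hallucinations on a given question are independent across samples (a strong assumption; real LLM errors are often correlated), with a simple majority vote over the sampled answers:

```python
from collections import Counter

def all_hallucinate_prob(p: float, n: int) -> float:
    """Probability that all n independent samples hallucinate,
    given a per-sample hallucination probability p."""
    return p ** n

def majority_vote(answers):
    """Pick the most common answer among repeated samples
    (the basic 'self-consistency' trick)."""
    return Counter(answers).most_common(1)[0][0]

# Under the independence assumption, the failure probability
# shrinks geometrically with the number of samples:
for n in (1, 10, 100, 1000):
    print(n, all_hallucinate_prob(0.1, n))
```

In practice the independent-samples assumption is the weak point: a model that is systematically wrong about a fact will be wrong in most samples, and no amount of voting fixes that.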



> You can sample GPT-4 not once, but 10, 100, 1000 times.

Is there a study on improved outcomes from simple repeated sampling of GPT-4? I would be very interested in that study. I don't think GPT hallucinations are like human hallucinations, where if you ask someone again after a temporary hallucination they might get it right the other 9 times, but I could be wrong. That would be an interesting result.


I don’t have hard numbers, but anecdotally hallucination has gone down significantly with GPT-4. It certainly still happens, though.

Try paying for GPT-4 - it barely hallucinates at all, at least as far as I've noticed.

GPT-4 hallucinates a lot less than 3.5. Same with the Claude models. This is from personal experience. There are also benchmarks (like TruthfulQA) that try to measure hallucinations and show the same thing.

The technical report[1] makes that claim at least:

> GPT-4 significantly reduces hallucinations relative to previous GPT-3.5 models (which have themselves been improving with continued iteration). GPT-4 scores 19 percentage points higher than our latest GPT-3.5 on our internal, adversarially-designed factuality evaluations.

[1] https://arxiv.org/abs/2303.08774 (text from page 10)


I get maybe one hallucination per twenty chats with gpt4.

Does GPT-4 really never hallucinate?

This is the first time I've personally heard someone claiming anything like that, though I don't tend to do anything with LLMs (due to this bullshit factor).


Do you think hallucinations will be solved with GPT-5? If so, that would be an amazing breakthrough. If not, it still won't be suitable for medical advice.

Idk how you're prompting, but hallucinations are rare for me, even with graduate-level material.

There's a reason GPT-4 scores so highly on advanced exams like the USMLE.


What is the hallucination rate of, for example, a Llama3 or GPT4?

I'm curious if you're using GPT-4 ($)? I find a lot of the criticisms about hallucination come from users who aren't, and my experience with GPT-4 is it's far less likely to make stuff up. Does it know all the answers, certainly not, but it's self-aware enough to say sorry I don't know instead of making a wild guess.

Haha, asking ChatGPT surely won't work. Everything can "feel" like a halting problem if you demand perfect results with zero error while uncertain and ambiguous new data keeps arriving.

My take: hallucinations can never be reduced to a perfect zero, but they can be reduced to a point where these systems hallucinate less than humans 99.99% of the time, and more often than not their divergences will turn out to be creative thought experiments (which I term healthy imagination). If it hallucinates less than a top human does, I say we win :)


True, "amount of hallucination" (very confident, but factually wrong) is probably something they can decrease in the next versions tho.

I also would not trust it with anything important, but there can be good applications for something that works 9/10 times.


My experience has been that hallucinations in GPT-4 are actually pretty rare. But in any case, if I choose to use code it suggests, I ask it for explanations and then verify those myself by other means, e.g. tests. I think it's too strong to call it a really bad TA. I'd say it's an imperfect TA: you need to check its work, but its work still has great value.

The issue of hallucinations is overblown. I use GPT4 all the time and don't see any hallucinations at all. It's a big problem with Google BARD and GPT3 and earlier models. But GPT4 fixed the issue of hallucinations completely.

I honestly haven't found hallucination to be a problem on GPT-4 when asking it to analyze or parse a dataset but can acknowledge it being possible (I just haven't encountered it).

I think that if we consider the accuracy rate as measured in various ways being roughly that of a human, then you're trading human mistakes for AI mistakes in exchange for dramatically lower costs and a dramatically higher speed of processing. You might even say a higher level of reasoning. In my own interactions it's been fantastic at reasoning clearly and quickly outside of complex trick questions. Most scenarios in life aren't generally full of trick questions.


I've done testing here and I have seen hallucinations throughout. It’s not that reliable.

Hallucinations are a feature, not a bug. GPT pre-training teaches the model to always produce an answer, even when it has little or no relevant training experience, and on average it does well at that. Part of the point of RLHF in ChatGPT was to teach the model not to hallucinate when it doesn't have good supporting experience encoded in its weights. This helps but is not perfect. However, it seems like there might be a path to far fewer hallucinations with more RL training data. As others pointed out, humans hallucinate all the time; we just have better training for what level of hallucination is appropriate given the supporting evidence and context.

Are we going to have to have this discourse for every single instance of GPT doing some semi-novel harm via hallucination?

(Probably, I suppose.)

