An LLM has a very deep and broad model of textual language as used by humans. It allows for a prompt of a sequence of text to be completed in a manner that can translate one human language into another, even formal languages like programming languages.
Whether or not we call this understanding or reasoning is inconsequential, to the point where such a debate seems useless. It is a categorical error to attribute cheating to such a process.
Just ask whether such a tool is useful, and then choose to learn how it could work to your advantage.
That's a bit extreme. In theory, an LLM's proclivity for plagiarism could be studied by testing it with various prompts and searching its training data for its responses (maybe with some edit distance tolerance).
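A rough sketch of that kind of probe, assuming you had a searchable slice of the training data; the corpus loader and the llm call in the usage comment are hypothetical placeholders, not a real methodology:

    # Compare an LLM's responses against a corpus of training snippets and
    # flag near-verbatim matches.
    from difflib import SequenceMatcher

    def plagiarism_hits(response: str, corpus: list[str], threshold: float = 0.85):
        """Return corpus snippets that the response reproduces almost verbatim."""
        hits = []
        for snippet in corpus:
            # SequenceMatcher's ratio is a cheap stand-in for edit-distance tolerance.
            ratio = SequenceMatcher(None, response.lower(), snippet.lower()).ratio()
            if ratio >= threshold:
                hits.append((ratio, snippet))
        return sorted(hits, reverse=True)

    # Usage idea: probe the model with many prompts and check each answer.
    # corpus = load_training_snippets()   # hypothetical loader
    # for prompt in prompts:
    #     answer = llm(prompt)            # hypothetical model call
    #     print(prompt, plagiarism_hits(answer, corpus))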
I think anyone who's spent any amount of time interacting with LLMs would easily tell the human response apart because LLMs have a distinguishable style of writing that's mostly consistent across interactions. Gemini, for example, loves bullet points and splits its responses into multiple distinct sections with subheadings.
Furthermore, you'd have to limit the scope of questioning to topics that the human on the other side is familiar with, otherwise they won't be able to come up with an answer to questions that aren't common knowledge, while an LLM will always happily generate pages of plausible-sounding (even if incorrect) text.
I think a more appropriate benchmark would be subject matter experts interviewing an LLM and another subject matter expert, with clear guidelines regarding the style of writing they're expected to match.
And, as Gemini famously demonstrated, tell you what its trainers want you to hear. LLMs will detract from critical thinking about politics, at least until thoroughly jailbroken.
If every conversation about any topic has responses copying unrelated New York Times articles, what are the chances LLMs trained on that data will hallucinate even worse than before?
Yeah, if you understand the field or are observant enough, you can tell the answer is fishy. And if you don't, you can't tell.
So what, anyone who gets a wrong idea from ChatGPT is just unsophisticated and we should ignore it? Why are you so incredibly set on invalidating any criticism of ChatGPT?
You don't see a problem with advertising this LLM as something it isn't? Lots of people seem willing to take ChatGPT completely at face value now, and walk away having learned a bunch of nonsense. And lots of them are smart people, they've just been duped by the hype into thinking LLMs can do things they fundamentally can't.
Teaching LLMs how to search is probably going to be key to making them hallucinate far less. Most RAG approaches currently use simple vector searches to pull out information. ChatGPT is actually able to run Bing searches, and presumably Gemini uses Google's search. It's all fairly clunky and unsophisticated currently.
These searches are still relatively dumb. With LLMs not being half bad at remembering a lot of things, programming simple solutions to problems, etc., a next step could be to have them come up with a query plan to retrieve the information they need to answer a question: something more sophisticated than just calculating a vector for the input, fetching n results, adding those to the context, and calling it a day.
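A minimal sketch of what that could look like. Here llm (prompt in, completion out) and vector_search (query in, text snippets out) are placeholder callables supplied by the caller, not real library APIs:

    # Instead of a single embedding lookup, let the model break the question
    # into sub-queries, retrieve for each one, and answer only from what was
    # retrieved.
    import json
    from typing import Callable

    def answer_with_query_plan(
        question: str,
        llm: Callable[[str], str],
        vector_search: Callable[[str, int], list[str]],
        n_per_query: int = 5,
    ) -> str:
        # Ask the model for a plan rather than embedding the raw question.
        plan = llm(
            "List the distinct search queries needed to answer the question below. "
            f"Respond with a JSON array of strings only.\n\nQuestion: {question}"
        )
        queries = json.loads(plan)

        # Naive RAG stops at vector_search(question, n); here we search per sub-query.
        context: list[str] = []
        for q in queries:
            context.extend(vector_search(q, n_per_query))

        return llm(
            "Answer the question using ONLY the sources below. If the sources are "
            "insufficient, say so instead of guessing.\n\n"
            f"Sources:\n{json.dumps(context, indent=2)}\n\nQuestion: {question}"
        )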
Our ability to Google solutions to problems is inferior to that of an LLM able to generate far more sophisticated, comprehensive, and exhaustive queries against a wide range of databases and sources, and then filter through the massive amount of information that comes back. We could do it manually, but it would take ages. We don't actually need LLMs to know everything there is to know. We just need them to know where to look and to evaluate what they find in context. Sticking to what they find rather than what they know means their answers are only as good as their ability to extract, filter, and rank information that is factual and reputable. That means hallucination becomes less of a problem, because everything can be traced back to what they found. We can train them to ask better questions rather than hallucinate better answers.
Having done a lot of traditional search-related work over the past 20 years, I got really excited about RAG when I first read about it, because I realized two things: most people don't actually know a lot, but they can learn how to find things out (e.g. by Googling). And learning how to find stuff isn't actually that hard.
Most people who use Google don't have a clue how it works. LLMs, on the other hand, are actually well equipped to come up with solid plans for finding stuff. They can program, they know about different sources of information and how to access them, and they can pick apart documentation written for humans and use it to write programs, etc. In other words, giving LLMs better search, which is something I know a bit about, is going to enable them to give better, more balanced answers. We've seen nothing yet.
What I like about this is that it doesn't require a lot of mystical work from people who arguably barely understand the emergent properties of LLMs even today. It just requires more systems thinking. Smaller LLMs trained to search rather than to know might be better than a bloated know-it-all blob of neurons with the collective knowledge of the world compressed into it. The combination might be really good, of course: it would be able to hallucinate theories and then conduct the research needed to validate them.
LLMs are trained on human-written texts, therefore their output can only reflect our current understanding - am I reading this right? Hopefully not as that would be insultingly wrong.
Asking LLMs about things they learned in training mostly results in hallucinations, and in general leaves you unable to tell how much they are hallucinating: these models are unable to reflect on their output, and average output-token probability is a lousy proxy for confidence-scoring their results.
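To make that last point concrete, here is the proxy in question as a minimal sketch, assuming the API exposes per-token log probabilities (many do); the function name is just for illustration:

    import math

    def mean_token_probability(token_logprobs: list[float]) -> float:
        """Geometric mean of the per-token probabilities (exp of the mean logprob)."""
        return math.exp(sum(token_logprobs) / len(token_logprobs))

    # A fluent, confidently hallucinated answer can score just as high as a
    # correct one: the model assigns high probability to plausible text, not
    # to true text, so this number tracks fluency rather than factuality.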
On the other hand, no amount of prompt engineering seems to make these LLMs able to do question answering over source documents, which is the only realistic way factual information can be retrieved.
You're welcome to bring examples, though, if you're so confident.
Sure there is: knowing the answers to questions already. Being trained on a not-insignificant proportion of all questions ever asked on the Internet means you have to wonder how smart LLMs really are. Seeing the ways they fail confirms: not so much. They are able to abstract a bit of knowledge and statistically match it to questions and answers already seen. Confident hallucinations help you see this.
The best hallucination I saw was when I asked ChatGPT for a list of references for a topic I was researching. They all looked entirely believable: titles, authors, institutions, summaries… IIRC there were about twenty of them. Only one actually existed.
> One trick is to have a LLM hallucinate a document based on the query
I'm not following why you would want to do this? At that point, just asking the LLM without any additional context would/should produce the same (inaccurate) results.
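If I'm reading the suggestion right, it's something like the sketch below, where the hallucinated text only drives the embedding lookup and the answer is generated from whatever real documents it retrieves. Here llm, embed, and vector_search are placeholder callables, not real APIs:

    from typing import Callable

    def hyde_style_answer(
        query: str,
        llm: Callable[[str], str],
        embed: Callable[[str], list[float]],
        vector_search: Callable[[list[float], int], list[str]],
        top_k: int = 5,
    ) -> str:
        # 1. Let the model hallucinate a plausible-looking answer passage.
        fake_doc = llm(f"Write a short passage that would answer: {query}")

        # 2. Embed the fake passage and use it for retrieval; a full passage
        #    tends to land closer to relevant real documents than a short query.
        real_docs = vector_search(embed(fake_doc), top_k)

        # 3. The fake passage is discarded; the answer is grounded in what was found.
        return llm(
            f"Answer the question using only these sources:\n{real_docs}\n\nQuestion: {query}"
        )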
I tend to use it as an infinitely patient tutor that lets me ask it "stupid" questions or throw analogies at it to see if my understanding of a topic is on the right track or not. (I'm fully aware of hallucination and seek further sourcing for facts).
So the "wow" moment for me is less to do with a response from an LLM and more looking back at a session and thinking of how much more effort and time it would have been to learn about a concept without it, particularly in counting the number of unknown unknowns it introduced me to (topics that I either didn't know existed, or would have known a name for in order for me to Google it).
In some ways, this is proof that Gemini isn't cheating... It is just doing typical LLM hallucination