
I think this is a good thing. In the past you would remind people: hey, after you find your wiki article, "please" go verify. But the wiki was "good" enough most of the time that people often found verification redundant.

But now with LLMs everywhere, people will realize it is necessary to verify.




There's the old adage of "trust, but verify"; with LLMs I'm feeling it's more like "Acknowledge, but verify, and verify again". It has certainly pointed me in the right direction faster than Google's "here's some SEO stuff to sort through" :)

Just like how it is hard to fact-check Wikipedia now that it's used as a reference. A thought came to me: perhaps it's Wikipedia that should be worried that it'll be supplanted by LLMs.

Slightly off-topic/meta but this debate reminds me of one people had 20 years ago: Should you trust stuff you read on Wikipedia or not?

In the beginning people were skeptical but over time, as Wikipedia matured, the answer has become (I think): Don't blindly trust what you read on Wikipedia but in most cases it's sufficiently accurate as a starting point for further investigation. In fact, I would argue people do trust Wikipedia to a rather high degree these days, sometimes without questioning. Or at least I know I do, whether I want to or not, because I'm so used to Wikipedia being correct.

I'm wondering what this means for the future of LLMs: Will we also start trusting them more and more?


"For example LLMs can be used to provide surveys of a topic area, and even book recommendations, tailored to a specific learner’s need. They have, famously, a tendency to “hallucinate,” a generous term of art for “fabricating bullshit.” But in just a few months, this tendency has found a curious guardrail: the LLM can browse the web, and in doing so, provide citations and links that you can check yourself. So where before you might have been led toward publications that didn’t exist, you can now prompt the LLM to ensure it’s giving you proof."

Who wants to browse the Web with a chatbot? Who wants to dig through garbage, verify it, and provide free labor for AI companies? What are these "citations"? Is the "proof" even on the Web?
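
For what it's worth, the dull part of that checking can be scripted. A small sketch that just pulls URLs out of a model's reply and confirms they resolve; it says nothing about whether the pages actually support the claim:

    # Extract cited links from a model reply and check that they at least resolve.
    import re
    import requests

    def check_cited_links(reply_text):
        urls = re.findall(r"https?://[^\s)\"'>]+", reply_text)
        results = {}
        for url in urls:
            try:
                resp = requests.head(url, allow_redirects=True, timeout=10)
                results[url] = resp.status_code < 400
            except requests.RequestException:
                results[url] = False
        return results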


The conversations about people being misled by LLMs remind me of when the Internet was new (not safe!), when Wikipedia was new (not safe!) and when social media was new (still not safe!).

And they're right, it's not safe! Yes, people will certainly be misled. The Internet is not safe for gullible people, and LLMs are very gullible too.

With some work, they might eventually get LLMs to be about as accurate as Wikipedia. People will likely trust them too much, but the same is true of Wikipedia.

I think it's best to treat LLMs as a fairly accurate hint provider. A source of good hints can be a very useful component of a larger system, if there's something else doing the vetting.

But if you want to know whether something is true, you need some other way of checking it. An LLM cannot check anything for you - that's up to you. If you have no way of checking its hints, you're in trouble.
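
To make that concrete, here's a minimal sketch of the hint-provider-plus-vetter split. ask_llm() is a hypothetical wrapper for whatever model you use; the point is that the model only proposes a candidate (here, a regex) and a plain deterministic test harness does the vetting:

    # Sketch of "LLM proposes, something else decides".
    import re

    def ask_llm(prompt):
        # Hypothetical stand-in for a real model call; returns a candidate regex.
        raise NotImplementedError

    def vetted_regex(description, positives, negatives):
        candidate = ask_llm("Write a Python regex that matches: " + description)
        try:
            pattern = re.compile(candidate)
        except re.error:
            return None  # the hint wasn't even syntactically valid
        ok = (all(pattern.fullmatch(s) for s in positives)
              and not any(pattern.fullmatch(s) for s in negatives))
        return candidate if ok else None  # accept the hint only if it survives vetting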


We need search to be able to fact check the LLMs, but this might become extremely difficult once all the content is written by them…

Find me good Internet posts that use verification and general reasoning. They are rare. The Internet posts I read suck at verification and general reasoning.

Therefore LLMs will suck at verification and general reasoning until we refine or augment our datasets.


If you'd read the article you might have noticed that generating answers with the LLM is very much part of the fact-checking process.

Wasn't this expected? Now there is broader proof of the problems with LLMs as trusted sources of information.

Fact-checking is important, but a lot of use cases don't need it: imagine an LLM as an assistant for writing fiction or poetry, or even some kind of rubber-duck debugging partner or brainstorming partner where the user supplies all the pertinent facts.

I don't see LLMs replacing traditional Search, but they still have plenty of other use cases.


In fact, I'd argue that citations make an LLM better; they're kind of a "think carefully" indicator. When LLMs are able to verify those citations independently, it's going to level up again, with objective truthiness skyrocketing.

There is another phenomenon at play here imo.

If it gives you what you want, or something that looks like what you want or need, it gets harder and harder to say no and not use it.

A few times I was tempted to just go with it blindly, and I'm glad I didn't.

TL;DR: verifying LLM output gets tiring.


One could argue that whereas once it was necessary to verify the source, now it is necessary to verify not just the source but also the LLM's derivation of it (which may be subtly mangled), and the source may no longer be readily apparent.

This reminds me of movies shot in the early days of the internet. We were warned that information on the internet could be inaccurate or falsified.

We found solutions to minimize wrong information; for example, we built and maintain Wikipedia.

LLMs will also come to a point where we can work with them comfortably. Maybe we will ask a council of various LLMs before taking an answer for granted, just like we would surf a couple of websites.
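
That council could be as crude as a majority vote across a few independent models. A rough sketch of just the voting part, where ask(model, question) is a hypothetical wrapper for the real API calls and exact-match voting stands in for something smarter:

    # Hypothetical council-of-LLMs vote: only take an answer for granted if
    # enough independent models agree on it.
    from collections import Counter

    def ask(model, question):
        # Hypothetical stand-in for calling one provider's API.
        raise NotImplementedError

    def council_answer(question, models, quorum=0.6):
        votes = Counter(ask(m, question).strip().lower() for m in models)
        answer, count = votes.most_common(1)[0]
        return answer if count / len(models) >= quorum else None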


What happens to the quality of the primary sources of information used by LLMs in this new age? E.g., less traffic to Wikipedia and Stack Overflow can't be a good thing.

The reason verifiability is important is that humans can be incentivized to be truthful and factual. We know we lie, but we also know we can produce verifiable information, and we prefer this to lies, so when it matters we make the cost of lying high enough that we can reasonably expect people will not try to deceive (for example by committing perjury or fabricating research data). We know it still happens, but it's not widespread, and we can adjust the rules, definitions and costs to adapt.

An LLM does not have such real-world limitations. It will hallucinate nonstop and then create layers of gaslighting explanations for its hallucinations. The problem is that you absolutely must be a domain expert in the LLM's topic, or always go find the facts elsewhere to verify (then why use an LLM?).

So a company like Google using an LLM is not providing information; it's doing the opposite. It is making it more difficult and time-consuming to find information, and it then hides its responsibility behind the model: "We didn't present bad info, our model did, we're sorry it told you to turn your recipe into poison…models amirite?"

A human doing that could likely face some consequences.


The search LLMs are good at synthesizing answers that don’t appear anywhere on the net. But they also hallucinate answers often. So to get reliable results, one needs to fact-check them. Otherwise, the risk of being misled is high. The fact-checking isn’t much faster than just looking up different bits of information and synthesizing the answer oneself.

There are cases where LLMs make life a lot easier for people, but I am not convinced that search can be made easier the way Sydney and Bard do it.

If they suggested alternative search queries and summarized websites for their search result excerpts, the LLMs would speed up search a lot. They could also synthesize some content quality metrics for each search result and highlight ones with biased reasoning, political influences, SEO games, and so on.


LLMs are impressively good at confidently stating false information as fact, though. They use niche terminology from a field, cite made-up sources and events, and come across to the layman as convincingly knowledgeable on a subject as anyone who's actually an expert.

People are trusting LLM output more than they should. And search engines that people have historically used to find information are trying to replace results with LLM output. Most people don't know how LLMs work, or how their search engine is getting the information it's telling them. Many people won't be able to tell the difference between the scraped web snippets Google has shown for years and a response from an LLM.

It's not even an occasional bug with LLMs; it's practically the rule. They don't know anything, so they'll never say "I don't know" or give any indication of whether something they say is trustworthy or not.


Even the latest commercial LLMs are happy to confidently bullshit about what they think is in published research, even when they provide citations. Often the citations themselves are slightly corrupted; occasionally they are complete fabrications. I actually verify each LLM claim, so I know this is happening a lot. It really varies by research topic, and it's really bad in esoteric research areas. They even acknowledge the paper was actually about something else if you call them out on it. What a disaster. LLMs are still useful for information retrieval and exploration as long as you understand you are having a conversation with a habitual liar / expert beginner and adjust your prompts and expectations accordingly.
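
For published research specifically, part of that verification can be scripted: if the model gives a DOI, a lookup against Crossref's public works API at least tells you whether the paper exists and what it's actually titled, so corrupted or fabricated citations surface quickly. A minimal sketch, with error handling omitted:

    # Look a DOI up on Crossref and return the real title, or None if the DOI
    # doesn't resolve (a hint the citation may be corrupted or fabricated).
    import requests

    def real_title_for_doi(doi):
        resp = requests.get("https://api.crossref.org/works/" + doi, timeout=10)
        if resp.status_code != 200:
            return None
        titles = resp.json()["message"].get("title", [])
        return titles[0] if titles else None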
