> let me put it to you this way: what is the LLM doing differently when it generates tokens that are "wrong" compared to when the tokens are "right"?
It is conditioning on latents about truth, falsity, reliability, and calibration. All of these inferred latents have been shown to exist inside LLMs, as they need to exist for LLMs to do their job of accurately predicting the next token. (Imagine trying to predict tokens in, say, discussions of people arguing, or critiques of fictional stories, or accounts of mistakes people made, without latents like that!)
LLMs also model other things: for example, you can use them to predict demographic information about the authors of texts (https://arxiv.org/abs/2310.07298), even though text explicitly labeled with its author's demographics ("written by a 28yo") pretty much never exists IRL; it is simply a latent the LLM has learned because it is useful, and it can be tapped into. This is why an LLM can generate text it thinks was written by a Valley girl in the 1980s, or text which is 'wrong', or text which is 'right'. It is also why, with Codex, they found that if the prompted code had subtle bugs, the completions tended to have subtle bugs: the model knows there is 'good' and 'bad' code, and 'bad' code is followed by more bad code, and so on.
This should all come as no surprise - what else did you think would happen? - but pointing out that, for this to be possible, the LLM has to be inferring hidden properties of the text, like the nature of its author, seems to surprise people.
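For anyone wondering what "shown to exist" looks like in practice, the usual evidence is a probing experiment: fit a simple classifier on the model's hidden activations and check whether a property like truthfulness is linearly decodable. Here is a minimal sketch of that idea; the model (gpt2), the layer choice, and the four-sentence toy dataset are stand-ins, not the setup of any particular paper.

```python
# Minimal probing sketch: can a linear classifier read a "truthfulness"
# signal out of the model's hidden states? Model, layer, and data are
# placeholders chosen for brevity.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tok = AutoTokenizer.from_pretrained("gpt2")          # stand-in model
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True)

statements = ["The sky is blue.", "Paris is in France.",
              "The sky is green.", "Paris is in Brazil."]
labels = [1, 1, 0, 0]                                # 1 = true, 0 = false

feats = []
with torch.no_grad():
    for s in statements:
        out = model(**tok(s, return_tensors="pt"))
        # use the last token's hidden state from a middle layer
        feats.append(out.hidden_states[6][0, -1].numpy())

probe = LogisticRegression(max_iter=1000).fit(feats, labels)
# If a probe like this generalizes to held-out statements, the model is
# encoding something correlated with truth in its activations.
```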
> It is conditioning on latents about truth, falsity, reliability, and calibration. All of these inferred latents have been shown to exist inside LLMs, as they need to exist for LLMs to do their jobs in accurately predicting the next token.
No, it isn't, and no, they haven't [1], and no, they don't.
The only thing that "needs to exist" for an LLM to generate the next token is a whole bunch of training data containing that token, so that it can condition based on context. You can stare at your navel and claim that these higher-level concepts end up encoded in the bajillions of free parameters of the model -- and hey, maybe they do -- but that's not the same thing as "conditioning on latents". There's no explicit representation of "truth" in an LLM, just like there's no explicit representation of a dog in Stable Diffusion.
Do the thought exercise: if you trained an LLM on nothing but nonsense text, would it produce "truth"?
LLMs "hallucinate" precisely because they have no idea what truth means. It's just a weird emergent outcome that when you train them on the entire internet, they generate something close to enough to truthy, most of the time. But it's all tokens to the model.
[1] I have no idea how you could make the claim that something like a latent conceptualization of truth is "proven" to exist, given that proving any non-trivial statement true or false is basically impossible. How would you even evaluate this capability?
> are LLMs just a large way of spitting out which token it thinks is best based on scoring? aka it's guessing at best?
They are 100% exactly this. More specifically, they spit out a vector of "logits" - one score per token in the vocabulary, which a softmax turns into probabilities (Llama has a vocab size of 32000, so you get 32000 of them).
Taking the highest-probability token is called "greedy sampling". Often you actually want to sample from either the top 5 tokens (top-k sampling) or the smallest set of tokens whose cumulative probability reaches, say, 90% (top-p, a.k.a. nucleus sampling).
If you're doing things like programming or multiple-choice questions, you can choose to only sample from specific logits - for example, if the only valid outputs are "true" or "false", ignore any token that isn't "t", "tr", "true", "f", "fa", "false", etc. This makes the output adhere to a schema.
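Roughly, those strategies look like this in code. This is a toy sketch with made-up logits; a real model would produce the logit vector for you.

```python
# Sketch of greedy, top-k, top-p, and constrained sampling over a raw
# logit vector (one score per vocabulary entry). Logits are random here.
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def greedy(logits):
    return int(np.argmax(logits))

def top_k(logits, k=5):
    idx = np.argsort(logits)[-k:]                  # k highest-scoring tokens
    return int(np.random.choice(idx, p=softmax(logits[idx])))

def top_p(logits, p=0.9):
    probs = softmax(logits)
    order = np.argsort(probs)[::-1]                # tokens by descending prob
    cutoff = int(np.searchsorted(np.cumsum(probs[order]), p)) + 1
    keep = order[:cutoff]                          # smallest set covering >= p
    return int(np.random.choice(keep, p=probs[keep] / probs[keep].sum()))

def constrained(logits, allowed_ids):
    mask = np.full_like(logits, -np.inf)
    mask[allowed_ids] = logits[allowed_ids]        # everything else is masked out
    return greedy(mask)

logits = np.random.randn(32000)                    # pretend llama-sized vocab
print(greedy(logits), top_k(logits), top_p(logits),
      constrained(logits, allowed_ids=[42, 7]))    # e.g. ids for "true"/"false"
```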
> How will they ever grow to the point where they are nothing more than just really good at guessing what to regurgitate based on what it's already seen?
This is what humans do; the big difference is that our "weights" aren't frozen - we update them in real time based on real-world feedback. Once you loop in real-world feedback you get much better results: for example, if an LLM recommends a command with a syntax error and you give it back the error message, it can correct itself. This is also why training on synthetic data is possible.
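The feedback loop is simple to sketch. Something like the following, where `ask_llm` is a placeholder for whatever model API you're using (an illustration of the idea, not a production agent):

```python
# Hypothetical sketch of the feedback loop described above: the model
# proposes a command, we run it, and any error output is fed back in.
import subprocess

def ask_llm(prompt):
    # placeholder: substitute a real model call; here we just return a guess
    return "ls -l /tmp"

def run_shell(cmd):
    r = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return r.returncode, r.stdout + r.stderr

def solve(task, max_tries=3):
    prompt = f"Suggest a shell command to: {task}"
    output = ""
    for _ in range(max_tries):
        cmd = ask_llm(prompt)
        code, output = run_shell(cmd)
        if code == 0:
            return cmd, output                     # the command worked
        # loop the real-world feedback back into the model's context
        prompt += f"\nThat command failed with:\n{output}\nSuggest a fix."
    return None, output

print(solve("list the files in /tmp"))
```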
> will they ever be trustable/perfect?
This is why I hate the word "hallucinate": it's not a hallucination, it's a misprediction. Humans do the exact same thing all the time - misremembering, misspeaking, botching quick mental math, etc.
In many ways LLMs are already more "trustable" than inexperienced humans. The main thing you need is some mechanism of double-checking the output, the same as we do with humans. We can "trust" them once their probability of acceptable answers is high enough.
> It’s an analogy for how LLMs work. An LLM does not know anything, it just adds tokens probabilistically based on the previous tokens
This seems like a deep statement that keeps getting repeated, but it doesn't mean anything. The probabilistic model used to decide the next token could be arbitrarily complex - up to and including encoding knowledge (or just asking a panel of experts).
It seems pretty self evident that the model in fact encodes knowledge, just in a very lossy way and recall is also flawed.
> But most of the training corpus was not linguistically-sound gibberish; it was actual text. There was knowledge encoded in the words. LLMs are large language models - large enough to have encoded some of the knowledge with the words.
> Some of. And they only encoded it. They didn't learn it, and they don't know it. It's just encoded in the words. It comes out sometimes in response to a prompt. Not always, not often enough to be relied on, but often enough to give users hope.
And some of the pieces of "knowledge" in the training corpora were wrong, lies, or bullshit themselves.
> This will hardly seem like a controversial opinion, but LLM are overhyped.
As the [excellent] paper points out, LLMs are complex functions that can be embedded in systems to provide plausible answers to a prompt. Here's the money sentence.
> LLMs are generative mathematical models of the statistical distribution of tokens in the vast public corpus of human-generated text, where the tokens in question include words, parts of words, or individual characters including punctuation marks.
Rather than focus on the limitations of this approach to answer general queries, which are manifest, it seems more interesting to ask a different question. Under what circumstances do LLMs give answers that are reliably equivalent to or better than humans? The answer would:
1. Illuminate where we can use LLMs safely.
2. Direct work to make them better.
It's already impressive that within certain scopes ChatGPT gives very good answers, indeed better than most humans.
> I assume you mean that all the LLM can do is produce text so it's not inherently dangerous, but it's rather trivial to hook an LLM up to controls to the outside world by describing an API to it and then executing whatever "commands"
Yes, you can do that, but the result is guaranteed to be silly.
The LLM isn't conceptualizing what it reads. That was already done when the human writing it used language patterns to encode their own conceptualization as data.
Instead, the LLM takes an implicit approach to modeling that data. It finds patterns that are present in the data itself, and manipulates that text along those patterns.
Some of the LLM's inferred patterns align to the language structure that was intentionally used by the human writing to encode a concept into that data.
Humans look objectively at the concepts they have in mind. From that perspective, we use logic or emotion to create new concepts. If a human could attach their mind to API endpoints, there would be no need to use language in the first place. Instead of encoding concepts into intermediary data (language in text) to send to a machine, they could simply feel and do the API calls.
LLMs don't look objectively at their model. They don't have a place to store concepts. They don't feel or do any arbitrary thing.
Instead, an LLM is its model. Its only behavior is to add new text and inferred patterns to that model. By modeling a new prompt, any familiar text patterns that exist in that prompt's text will be used to organize it into the existing model. A "continuation" essentially prints that change.
When you attach that to API endpoints, the decision-making process isn't real. There is no logically derived new concept determining which API call to make. Instead, there is a collection of old concepts that were each derived logically in separate, unrelated contexts, then encoded into language, and language into text. Those are just being recycled, as if their original meaning and purpose were guaranteed to apply, simply because they fit together like puzzle pieces. Even if you get the shape of them right (by following the patterns they are encoded with), there is no place in this process to introduce why, or to decide the result is nonsense and avoid it.
In short, the LLM can be made to affect the world around it, and the world can affect it back; but there is nothing in between it being affected, and it affecting the world. No logic. No intent. Only data.
> They are extremely complex statistical models doing next-token prediction. That's it - that's all. If you want a proper mental model of LLMs, you need to understand this - the thing you're doing is text prediction.
Technically correct, but that's 100% what and 0% how - and the how is what matters here; the what barely matters at all.
Those models are able to coherently produce the next word, then the next, then another, in such a way that a very useful word sequence is likely to appear - one that, e.g., tells me how to do stuff with pandas dataframes, with working code examples, and can then tweak them (no small feat, as anyone who can do that can attest). The only way to do that is to have some kind of smarts performing very non-trivial computations to arrive at the next-next-next-...-next word that makes sense in the context of the previous words and the words that haven't been generated/sampled/statistically selected yet.
It does not need to think in the human sense to do that; proof is by demonstration.
> If the bulk of that data happens to assert that the sky is blue, the model is likely, but not guaranteed, to finish the sentence "What color is the sky" with "Blue". That's it. That's the trick.
Yeah. A very useful trick. And fiendishly hard to learn. Perhaps there's a lot going on behind the scenes to make the trick work?
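You can watch the trick in action by inspecting the next-token distribution directly. A small sketch using the HuggingFace transformers library, with gpt2 as a stand-in model (any causal LM would do):

```python
# Print the model's top next-token candidates after a simple prompt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("The color of a clear daytime sky is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]         # scores for the next token
probs = torch.softmax(logits, dim=-1)
top = torch.topk(probs, 5)
for p, i in zip(top.values, top.indices):
    # "blue" should rank near the top for a reasonably trained model
    print(f"{tok.decode(int(i)).strip():>10}  {p.item():.3f}")
```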
> [...] but they certainly don't prove what OP claimed.
OP's claim was not: "LLMs know whether text is true, false, reliable, or is epistemically calibrated".
But rather: "[LLMs condition] on latents *ABOUT* truth, falsity, reliability, and calibration".
> It's also very different to ask a model to evaluate the veracity of a nonsense statement, vs. avoiding the generation of a nonsense statement [...] probably could have been done with earlier generations of classifiers
Yes. OP's point was not about generation, it was about representation (specifically conditioning on the representation of the [con]text).
Your aside about classifiers is not only very apt, it is also exactly OP's point! LLMs are implicit classifiers, and the features they classify have been shown to include those that seem necessary to effectively predict text!
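A concrete way to see the "implicit classifier" point: compare the probabilities the model assigns to competing label tokens after a prompt. A toy sketch (gpt2 and the prompt wording are placeholders; real probing work uses stronger models):

```python
# Use next-token scores over the label words "true"/"false" as a classifier.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def classify(statement):
    prompt = f"Statement: {statement}\nIs the statement true or false?\nAnswer:"
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**ids).logits[0, -1]        # next-token scores
    true_id = tok(" true", add_special_tokens=False).input_ids[0]
    false_id = tok(" false", add_special_tokens=False).input_ids[0]
    return "true" if logits[true_id] > logits[false_id] else "false"

print(classify("Water boils at 100 degrees Celsius at sea level."))
print(classify("The moon is made of cheese."))
```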
> It's obvious from direct experience that they're incapable of knowing true and false in a general sense.
Yes, otherwise they would be perfect oracles, instead they're imperfect classifiers.
Of course, you could also object that LLMs don't "really" classify anything (please don't), at which point the question becomes how effective they are when used as classifiers, which is what the cited experiments investigate.
> An LLM can't tell the difference between fact and fiction, because it can't apply logic.
Not true. They can tell the difference just fine. Of course, being able to tell the difference and being incentivized to communicate it are two different things.
Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback - https://arxiv.org/abs/2305.14975
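The core idea of that paper is easy to sketch: ask the model to state a confidence alongside its answer, then check over many questions how well those stated numbers track actual accuracy. A rough illustration (the `chat` helper and prompt wording are placeholders, not the paper's setup):

```python
# Elicit a verbalized confidence score alongside an answer, then parse it.
import re

def chat(prompt):
    # placeholder: substitute a real model/API call
    return "Answer: Paris. Confidence: 0.95"

def answer_with_confidence(question):
    prompt = (f"{question}\n"
              "Give your answer, then your confidence that it is correct as a "
              "number between 0 and 1, in the form 'Answer: ... Confidence: ...'")
    reply = chat(prompt)
    m = re.search(r"Confidence:\s*([01](?:\.\d+)?)", reply)
    conf = float(m.group(1)) if m else None
    return reply, conf

reply, conf = answer_with_confidence("What is the capital of France?")
print(reply, "->", conf)
# Calibration is then measured by binning many (confidence, was-it-correct)
# pairs and comparing stated confidence with empirical accuracy in each bin.
```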
> LLMs have no understanding of the underlying reality
Not true. They are slowly gaining an understanding of reality by reverse-engineering the relationships built into human languages. The only reason LLMs are getting better is that they are getting better at modeling the world. At some point the only way to improve token prediction is to gain an understanding of the world.
> In that way, normalizing LLM-generated content is a perfect road to automated misinformation.
Not if it's fact-checked. I know in certain contexts it might be misleading, but the way I look at it, it's like word-play: the AI gives you several nonsense ideas until one of them is right. In this example, I was delighted to see ChatGPT talk about some Vim functionalities that are relevant to the link.
> through various kinds of data abstraction, reasoning by analogy, and other techniques similar to what humans do.
No, that's exactly not how LLMs work. They are extremely good at predicting what sentences resemble the sentences in their training data and creating those. That's all.
People are getting tripped up because they are seeing legitimate intelligence in the output from these systems -- but that intelligence was in the people who wrote the texts that it was trained with, not in the LLM.
> but there's nothing a machine puts into this, there's nothing being transmitted
That depends on your prompt. You can make it longer if you have more to say. It's a conditional language model: you can select where it starts from and steer where it goes. But you can't blame it for bad prompts.
Instead of seeing an LLM as "nothing there, just token probabilities", I'd rather see it as a distillation of human culture. It's like a mirror house with infinite reflections and complexities. A place to contemplate, a microscope where you can study something in detail, a simulator where you can deploy experiments.
> The assumption that most people make is that these models can answer questions or chat with you, but in reality all they can do is take some text you provide as input and guess what the next word (or more accurately, the next token) is going to be.
These two things cannot be compared or contrasted. It's very common to see people write something like "LLMs don't actually do <thing they obviously actually do>, they just do <dismissive description of the same thing>."
Typically, like here, the dismissive description just ignores the problem of why it manages to write complete novel sentences when it's only "guessing" subword tokens.
> An LLM is just a reflection of the text that humans write, and humans seem very far off from having world models and reasoning that accurately reflect reality
The original sin of LLMs is that they are trained to imitate human language output.
Passing the Turing test isn't necessarily a good thing; it means that we have trained machines to imitate humans (including biases, errors, and other undesirable qualities) to the extent that they can deceptively pose as humans.
This isn’t true. You can train LLMs entirely on synthetic data and get strong results. [0]
> If new languages, engines etc pop up it cannot synthesize new forms of coding without that code having existed in the first place.
You can describe the semantics to an LLM, have it generate code, tell it what went wrong (e.g. with compiler feedback), and then train on that. For an example of this workflow in a different context, see [1].
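As a rough sketch of that loop (with `ask_llm` as a placeholder model call and Python's own parser standing in for a real compiler), something like this yields (spec, working code) pairs you could then train on:

```python
# Generate code for a spec, check it "compiles", feed errors back, and
# keep the successful pairs as synthetic training data.
import ast

def ask_llm(prompt):
    # placeholder: substitute a real model call
    return "def add(a, b):\n    return a + b\n"

def generate_training_pair(spec, max_rounds=3):
    prompt = f"Write Python code for this spec:\n{spec}"
    for _ in range(max_rounds):
        code = ask_llm(prompt)
        try:
            ast.parse(code)                        # stand-in for a real compiler
            return {"spec": spec, "code": code}    # keep as a training example
        except SyntaxError as err:
            # feed the compiler error back and try again
            prompt += f"\nThe code failed to compile: {err}\nFix it and resend."
    return None

print(generate_training_pair("a function that adds two numbers"))
```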
> And most importantly, it cannot fundamentally rationalize about what code does or how it functions.
Most competent LLMs can trivially describe what some code does and speculate on the reasoning behind it.
I don’t disagree that they’re flawed and imperfect, but I also do not think this is an unassailable state of affairs. They’re only going to get better from here.
> If you have a model that can read text and extract its meaning
That model doesn't exist.
We have models that can read text and then probabilistically create a summary that's the most likely to match the original.
But none of these models "extract meaning" in a way that can be used for fact checking purposes in the way you describe.
What you're describing is basically a world model, and while there's a few folks claiming these LLMs are showing the glimmers of building world models, it's mostly speculation.