The goal of an LLM, before RLHF, is to accurately predict what token comes next. It cannot do better than that. The perfect outcome is text identical to the training set.

Let's say your LLM can generate text with the same quality as the input 98% of the time, and the other 1% of the time, it's wrong. Each round of recursive training amplifies that error. 96% accuracy after the next round. 67% after 20 rounds.
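
Rough arithmetic behind those numbers, assuming the per-round error is independent and compounds multiplicatively (the quoted 96% and 67% only work out if the "1%" above is read as 2%):

    # Toy model: each round of recursive training keeps only a fixed
    # fraction of the previous round's quality, so accuracy decays as 0.98^n.
    keep = 0.98  # hypothetical per-round accuracy from the example above

    for rounds in (1, 2, 20):
        print(rounds, round(keep ** rounds, 3))
    # 1 -> 0.98, 2 -> 0.96, 20 -> 0.668  (the ~96% and ~67% figures)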

There's no way for it to get better without more real human input.




"The perfect outcome is text identical to the training set."

Huh? If the LLM was only ever spitting back identical content straight from the training set, that would be a symptom of extreme overfitting, which everyone universally agrees is a bad thing — not a perfect thing.


It's not the most useful outcome for the end user, but it's the perfect outcome from the perspective of the learning algorithm. All these things care about is minimizing their loss function, where loss is deviation from the training set.
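
Concretely, the pre-training loss is (roughly) cross-entropy on the next token: the model is scored on how much probability it puts on the token that actually appeared in the training text. A stripped-down sketch with a made-up vocabulary and made-up probabilities, not any real model:

    import math

    # The model predicts a distribution over the next token; the loss is the
    # negative log probability it assigned to the token the training text used.
    predicted = {"cat": 0.1, "sat": 0.7, "mat": 0.2}  # model's guess
    actual_next_token = "sat"                         # what the training set says

    loss = -math.log(predicted[actual_next_token])
    print(f"cross-entropy loss: {loss:.3f}")  # ~0.357; it reaches 0.0 only if
                                              # the model puts all probability on "sat"

The loss bottoms out at zero exactly when the model reproduces the training set's choice with certainty, which is the sense in which "identical to the training set" is the perfect outcome for the optimizer.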

The amount of training data vastly exceeds the size of the model, so it does not just regurgitate what it found on the internet. This ignorant trope needs to die already.

Yeah but those errors will decrease the chance that the output will end up spreading across the internet, and therefore decrease the chance that it ends up in a future training set.

You can see the whole process as a very high latency reinforcement learning system.
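
A toy way to picture that loop (purely illustrative numbers, not a claim about real dynamics): outputs that read as wrong get shared less, so the next scrape skews toward the better outputs, and that filtering acts as the delayed "reward":

    import random

    random.seed(0)

    # Toy feedback loop: the model's outputs vary in quality, humans repost
    # the better ones more often, and the reposts become the next training pool.
    pool_quality = 0.9  # made-up starting quality of the training pool

    for generation in range(5):
        outputs = [random.gauss(pool_quality, 0.1) for _ in range(10_000)]
        # An output gets spread with probability roughly equal to its quality.
        reposted = [q for q in outputs if random.random() < min(max(q, 0.0), 1.0)]
        pool_quality = sum(reposted) / len(reposted)
        print(f"gen {generation}: next training pool quality ~ {pool_quality:.3f}")

Each generation, the reposted pool averages a bit better than the raw outputs, which is the selection pressure in question; how strong that pressure is in practice is the open question.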


That's true but I think you're overestimating how careful people will be in their curation. There are tons of LLM-powered bots on twitter and reddit already, and no one is bothering to delete the output.

Also, the curation process counts as "real human input" itself, so I don't think it contradicts my point.


"98% of the time, and the other 1% of the time" Is this a strange take on "off by 1" error?
