The goal of an LLM, before RLHF, is to accurately predict what token comes next. It cannot do better than that. The perfect outcome is text identical to the training set.

Let's say your LLM can generate text with the same quality as the input 98% of the time, and the other 1% of the time, it's wrong. Each round of recursive training amplifies that error. 96% accuracy after the next round. 67% after 20 rounds.
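
Rough arithmetic behind those numbers, assuming the per-round error is independent and compounds multiplicatively (the quoted 96% and 67% only work out if the "1%" above is read as 2%):

    # Toy model: each round of recursive training keeps only a fixed
    # fraction of the previous round's quality, so accuracy decays as 0.98^n.
    keep = 0.98  # hypothetical per-round accuracy from the example above

    for rounds in (1, 2, 20):
        print(rounds, round(keep ** rounds, 3))
    # 1 -> 0.98, 2 -> 0.96, 20 -> 0.668  (the ~96% and ~67% figures)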

There's no way for it to get better without more real human input.




"The perfect outcome is text identical to the training set."

Huh? If the LLM was only ever spitting back identical content straight from the training set, that would be a symptom of extreme overfitting, which everyone universally agrees is a bad thing — not a perfect thing.


It's not the most useful outcome for the end user, but it's the perfect outcome from the perspective of the learning algorithm. All these things care about is minimizing their loss function, where loss is deviation from the training set.
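
Concretely, the pre-training loss is (roughly) cross-entropy on the next token: the model is scored on how much probability it puts on the token that actually appeared in the training text. A stripped-down sketch with a made-up vocabulary and made-up probabilities, not any real model:

    import math

    # The model predicts a distribution over the next token; the loss is the
    # negative log probability it assigned to the token the training text used.
    predicted = {"cat": 0.1, "sat": 0.7, "mat": 0.2}  # model's guess
    actual_next_token = "sat"                         # what the training set says

    loss = -math.log(predicted[actual_next_token])
    print(f"cross-entropy loss: {loss:.3f}")  # ~0.357; it reaches 0.0 only if
                                              # the model puts all probability on "sat"

The loss bottoms out at zero exactly when the model reproduces the training set's choice with certainty, which is the sense in which "identical to the training set" is the perfect outcome for the optimizer.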

The amount of training data vastly exceeds the size of the model, so it does not just regurgitate what it found on the internet. This ignorant trope needs to die already.

Yeah but those errors will decrease the chance that the output will end up spreading across the internet, and therefore decrease the chance that it ends up in a future training set.

You can see the whole process as a very high latency reinforcement learning system.
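
A toy way to picture that loop (purely illustrative numbers, not a claim about real dynamics): outputs that read as wrong get shared less, so the next scrape skews toward the better outputs, and that filtering acts as the delayed "reward":

    import random

    random.seed(0)

    # Toy feedback loop: the model's outputs vary in quality, humans repost
    # the better ones more often, and the reposts become the next training pool.
    pool_quality = 0.9  # made-up starting quality of the training pool

    for generation in range(5):
        outputs = [random.gauss(pool_quality, 0.1) for _ in range(10_000)]
        # An output gets spread with probability roughly equal to its quality.
        reposted = [q for q in outputs if random.random() < min(max(q, 0.0), 1.0)]
        pool_quality = sum(reposted) / len(reposted)
        print(f"gen {generation}: next training pool quality ~ {pool_quality:.3f}")

Each generation, the reposted pool averages a bit better than the raw outputs, which is the selection pressure in question; how strong that pressure is in practice is the open question.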


That's true but I think you're overestimating how careful people will be in their curation. There are tons of LLM-powered bots on twitter and reddit already, and no one is bothering to delete the output.

Also, the curation process counts as "real human input" itself, so I don't think it contradicts my point.


"98% of the time, and the other 1% of the time" Is this a strange take on "off by 1" error?
