I feel like the big language models have proved this style of learning a language is the wrong approach.
I learnt Japanese; I studied it for 4 years and spent a year in Japan.
You know what worked?
Lots of examples of people using particles.
What did not work?
Text books explaining what the particles do.
A grammatical study of particles is only useful after you’ve gained an understanding of when you should use them from shed loads of examples.
It helps you refine the fine details of when to use them technically, and in formal writing.
For early learning, I posit it’s next to useless.
Language is not a well designed programming language full of orthogonal concepts.
This has long been an argument, but language models really nail down the fact that a probabilistic "similar to existing examples" approach to language is categorically superior to attempting to construct semantically correct statements from "rules".
It is difficult to see an argument that the output of a language model is not derived from the language model, other than that people would prefer it wasn't.
But there isn't such a thing as a raw model, is there? In order to receive anything from a language model it has to 'learn' some objective. And this objective has to be imposed from above.
Yeah the issue is you can generate data, but it won’t be good data. Training over random strings won’t make you learn language, but it’s technically data.
What's really interesting is that these models are using some non-trivial portion of all easily accessible human writing -- yet humans learn language really well with significantly less input data. What's missing in the field to replicate human performance in learning?
Humans use language to accomplish tasks in their environment - establishing relationships, making deals, coaxing others, etc. By contrast, all neural language models do is predict the next word as a function of the previous words. So far, these language models have nothing at all to do with language learning. They're only valuable insofar as they advance downstream engineering tasks like machine translation.
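For what it's worth, the "predict the next word" objective is easy to see in miniature. Here's a toy sketch (the corpus and function names are invented for illustration, and real models condition on much longer context than one word): a bigram counter that predicts the most frequent next word seen in training.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    # Count, for each word, which words follow it and how often.
    counts = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    # "Prediction" is just picking the most frequent continuation.
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]

corpus = "the cat sat on the mat and the cat ran"
model = train_bigram(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" most often here
```

The point being: nothing in that objective involves using language *for* anything. Scaling it up changes the quality of the statistics, not the nature of the task.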
Yeah, that's why just updating the weights on the models such as they are doesn't work. But they're right that it's desirable to have some sort of online learning, whether on top of a frozen language model, or through some not yet invented way to do it end to end.
I don't know where this idea comes from that we can get more out of language models than what we put into them. Thinking we can process any amount of data and get a competent surrogate mind out of it borders on magical thinking.