It does have some ability to extrapolate to new problems, provided its training corpus has reasonably close coverage. It is not going to be making new scientific discoveries or insights, but then neither are most people. With a sufficiently large training set I think these models can achieve human parity for a subset of language generation tasks, and be effectively human-level in intelligence. They nearly already have.

It doesn’t matter to me if they have “reasoning” capabilities or not if the outcome is the same.

I think we are still a long way off from AGI.



Well, one way out is if large language models don't just somehow magically turn into human-level (or better) AGI at some point once enough data has been thrown at them. Then the whole debate will turn out to be pretty moot.

While I agree with you that advances will come from being able to train with less data using as yet undevised techniques, I think you are jumping to a conclusion with this particular work.

First, yes, bigger appears to be better so far. We haven't yet found the plateau. No, bigger won't solve the well-known problems, but it's absolutely clear that each time we build a bigger model it is qualitatively better.

Second, it's not clear that this work is trying to build AGI, which I assume you are referring to when you say "the solution." Of all the use cases for language models, building one off all the world's scientific data like they are doing in this project is probably the most exciting to me. If all it can do is dig up relevant work for a given topic in the entire body of scientific literature, it will be revolutionary for science.


Until a language model can develop a generalized solution to a real-world phenomenon, it's not even close to AGI. The current iteration of ML algorithms is useful, yes, but not intelligent.

But if it's rational and has a sense of truth, then it's AGI. Which I don't think is impossible or even unattainable within a reasonable amount of time, but we're .001% of the way there, not 50% or 75%.

These models are fascinating, but the problem 'a lot of the things this model generates lack any semantic meaning' is inherent and likely insurmountable without connecting the model to other, far more complex models that haven't been built yet.

We are at the level where our models can consistently generate blocks of text with full sentences in them that make grammatical sense. Which is pretty cool.

But the next step is being able to consistently generate full sentences that make grammatical sense and usefully convey information. And while the current models do that a lot of the time, they don't do that all of the time because they don't and can't know the difference without essentially being a different thing. Because to do that consistently, we need an "understanding what things mean" model. Which is many orders of magnitude larger and more difficult than a text generator.


They don't seem to have a theoretical upper limit: more data and more parameters just keep making them more advanced, even in ways that weren't predicted or understood. The difference between a language model that can explain a novel joke and one that can't is purely scale. So the thought is that with enough scale, you eventually hit AGI.
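For concreteness, here's a rough sketch of that scaling intuition in Python, using the approximate fitted constants from the Chinchilla paper (Hoffmann et al. 2022). Treat the exact numbers as illustrative, not definitive:

    # Chinchilla-style scaling law: loss falls as a power law in
    # parameters N and training tokens D. Constants are approximate
    # fits from Hoffmann et al. 2022; note E is an irreducible floor,
    # so strictly speaking there is an asymptote.
    def loss(N, D, E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
        return E + A / N**alpha + B / D**beta

    print(loss(N=1e9,  D=2e10))    # ~2.58 for a 1B-parameter model
    print(loss(N=7e10, D=1.4e12))  # ~1.94 at 70x the parameters and data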

I don't think a pure language model of the sort under consideration here is heading towards AGI. I use language models extensively and the more I use them the more I tend to see them as information retrieval systems whose surprising utility derives from a combination of a lot of data and the ability to produce language. Sometimes patterns in language are sufficient to do some rudimentary reasoning but even GPT4, if pushed beyond simple patternish reasoning and its training data, reveals very quickly that it doesn't really understand anything.

I admit, it's hard to use these tools every day and continue to be skeptical about AGI being around the corner. But I feel fairly confident that pure language models like this will not get there.


That’s how far current technology goes but one day someone will find the missing ingredient and get good reasoning out of it. Right now it is an impressive language model.

> any approach that is so obviously alien to it is unlikely to be a good approach

Computers very frequently do not solve problems the same way humans do, so I'm not sure that's a significant point against any particular modeling technique.

Otherwise, if you're waiting for language models to understand language the same way that human minds do, you're probably waiting for AGI, not any particular breakthrough in NLP alone.


Well, yes and no ...

Any language model alone isn't going to solve natural language understanding, but some future ML full-brain model surely will achieve AGI and therefore language understanding as part of that.

What's missing from a pure language model is any grounding in reality and the ability to interact with the world to test and extend its knowledge beyond what is derivable from the corpus it was trained on. Its level of understanding is ultimately limited by the content of the training corpus, regardless of how massive that may be.

Something like GPT-3 is really just a statistical twist on Doug Lenat's Cyc; its understanding is always going to be limited by its own fundamental nature. Yes, one deals with language, and one with facts, but ultimately both are just large, fixed, self-referential bodies of data.

Cyc really is a great analogy, and for some reason it took decades for Lenat et al to eventually realize that regardless of how much fixed data you added to it, it was never going to be enough. A closed black box can never know what's outside of the box, although it may gamely try to tell you if you ask it.

These modern language models, GPT-3, etc, have certainly been a bit of an eye opener, and can perform some impressive feats (question answering, etc), but one shouldn't be tempted to believe that if scaled up sufficiently they'll eventually somehow transcend their own nature and become more than a language model... a one-trick pony capable of generating plausible continuations of whatever you seed it with.


There's one big weakness in all current language models that I feel holds them back: there's no way to have them be proactively persuasive.

Weak AGI will be the first language model that is able to somehow influence the thoughts of the person communicating with it; I think that is the milestone of AGI. From my experience with GPT-Neo and OPT, using them to help write stories or make chatbots, the responses are still very reactive. In that sense, adding more parameters helps the model give a more coherent response, but it's still just a response.


That kind of ML model is pretty general. Pretraining a big model has been extended to multi-modal environments. People are training them with RL to take actions. People are applying other generative techniques to them, and all sorts of other stuff. If you just look at it as 'predict the next word token,' then it's pretty limiting, but people have already gone way beyond that. TFA talks about some interesting directions people are taking them.
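To make the "predict the next word token" framing concrete, here's a minimal sketch of the autoregressive loop everything else builds on. The `model` and `tokenizer` objects are hypothetical stand-ins, not any particular library's API:

    # Minimal autoregressive generation loop (hypothetical interface).
    import random

    def generate(model, tokenizer, prompt, max_tokens=50):
        tokens = tokenizer.encode(prompt)
        for _ in range(max_tokens):
            probs = model.next_token_probs(tokens)  # distribution over vocab
            next_id = random.choices(range(len(probs)), weights=probs)[0]
            tokens.append(next_id)                  # feed the sample back in
        return tokenizer.decode(tokens)

RL fine-tuning, multi-modal inputs, and tool use all sit on top of this same loop; only the training signal and the inputs change.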

A more general form of your question is whether we can get to AGI with just incremental steps from where we are today, rather than step-change way-out-of-left-field kinds of ideas. People are split on that. Personally, I think that incremental changes from today's methods are sufficient with better hardware and data, but Big New Ideas could certainly speed up progress.


I think the consensus at this point is that these models are much closer to AGI than anyone thought they could or should be, and that the delta between what we have now and AGI is smaller than it's ever been.

Anyone who tells you that these models are "just glorified text generators" is flat out wrong and hasn't bothered to do their homework. And anyone who claims they "know how it works under the hood" is making claims that all of the true experts have notably carefully avoided making.


How do you figure that we can still confidently say it’s just a language model?

It was trained on language for the primary purpose of producing text, but that's not necessarily all it can do. The billions of nodes and parameters it contains allow it to compute ultra-complicated functions. Who's to say some subset of those nodes isn't forming some basic primitive used for reasoning?
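As a back-of-the-envelope check on "billions of nodes and parameters": a decoder-only transformer has roughly 12 * n_layers * d_model^2 weights in its attention and MLP blocks, ignoring embeddings. Plugging in GPT-3's published shape:

    # Rough transformer parameter count: ~12 * n_layers * d_model^2
    n_layers, d_model = 96, 12288  # GPT-3's published configuration
    params = 12 * n_layers * d_model**2
    print(f"~{params / 1e9:.0f}B parameters")  # ~174B, close to GPT-3's 175B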


> can scale current models towards "true understanding" (or similar), is a total unknown atm.

Right, but it's no more known than before GPT models IMO. It's the same unknown.

I don't mean to imply these language models are not impressive. They are pretty impressive.


Even if this ends up in a plateau, that doesn't discount the fact that it's a huge step forward from what we were capable of building 5 years ago. If the next generation of language models make us believe they're superintelligent but are actually at the level of an average college student, that would still be an amazing achievement.

It is very promising. In fact, in industry there are jokes about how getting rid of linguists has helped language modeling.

Trying to understand it at some level of abstraction that humans can fit in their head has been a dead end.


I think it's quite amusing that despite being "only" a language model that generates text by prediction, it's so good at it that it successfully tricks people into believing it's some kind of AGI that can figure out the "real" answer to anything and do any task. Then, when it inevitably "fails" a task, people say it's dumb.

But I do agree that, for people who understand how it works, it's a bit weird not to be impressed that a language model can demonstrate such good understanding of things and such intriguing capabilities when it's fundamentally just predicting the next token.


Once it can read, has a broad vocabulary, and can reason well enough to synthesize information from what it is given, you don't need to train it any further. We're at that point now. Everything going forward is just engineering: even just finding ways to increase the context length will allow these models to work with any data available, and OpenAI is very publicly working on that, as is Anthropic. You can also apply some finesse and combine the model with external tools like search, databases, or custom-built APIs; practically everyone and their dog is experimenting with this approach. So even if no better models are made, which seems unlikely, we'll be utilizing the current generation in all kinds of ways from here on out.
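Roughly, that "model plus external tools" pattern looks like the sketch below. The `llm` callable and the tool functions are hypothetical placeholders, not any vendor's real API:

    # Sketch of an LLM tool-use loop (all names are hypothetical).
    def answer(llm, tools, question):
        history = [{"role": "user", "content": question}]
        while True:
            reply = llm(history)  # model either answers or requests a tool
            if reply.tool is None:
                return reply.text  # final answer for the user
            result = tools[reply.tool](**reply.args)  # e.g. search, query_db
            history.append({"role": "tool", "content": str(result)})

The model never needs retraining here; the engineering work is in deciding which tools to expose and how to feed their results back into the context.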

So what problems do language models solve to a human-like level or higher?

I think answering that question should be required as part of any claim that a system is an AGI or nearly there.
