Scaling will never get us to AGI (garymarcus.substack.com)
6 points by isaacfrond | 2024-04-11 10:24:12 | 122 comments




This is why driverless cars are still just demos, and why LLMs will never be reliable.

It's hard to build something that hasn't even been logically defined yet.

Hoping that AGI will somehow just "emerge" from an inert box of silicon switches (aka a "computer" as we currently know it) is the stuff of movie fantasy ... or perhaps religion.


> ...is the stuff of movie fantasy ... or perhaps religion.

IMHO, something like "movie fantasy" constitutes the core belief of a great many software engineers and other so-called rational people in the tech space.


Driverless cars are here, without liability issues they would already be widespread. But scaling isn't what got them there. It was tons of labeled data, both from rules based implementations and shadowing human drivers.

The ability to generalize completely outside training data isn't that common among humans IME. That is a high bar. How many of us have done so without at least drawing an analogy to some other experience we have had? Truly unique thinkers aren't that commonplace. I myself could have probably just asked an LLM what I would say in this post and gotten pretty close...


> Driverless cars are here, without liability issues they would already be widespread

In other words, they're in a lab somewhere undergoing perpetual test runs so they don't kill people, so not generally available.


You can order driverless taxis right now in some cities:

https://www.axios.com/2023/08/29/cities-testing-self-driving...


That article is discussing Cruise. Since it was published, a Cruise car dragged a pedestrian 20 ft, it came out that they need remote intervention every few miles, the CEO resigned, and they stopped doing rides anywhere.

Waymo remains available to my knowledge.


However, they require constant human monitoring and interventions from those humans every few miles to work properly - and it's not clear that the need for those interventions will be solved with more training data.

https://www.theverge.com/23948708/cruise-robotaxi-suspension...


> However, they require constant human monitoring and interventions from those humans every few miles to work properly

It's a little-known fact that AI means "Actually, Indians."

https://boingboing.net/2024/04/03/amazons-ai-powered-just-wa...


No, I think he meant they are literally here. I see one everyday.

> But scaling isn't what got them there. It was tons of labeled data, both from rules based implementations and shadowing human drivers.

Isn't the second half of that saying that scaling up the dataset was what got them where they are?

When I hear about "scaling up" models, I think there are two parts of it: 1) use the same architecture, but bigger 2) make the training data bigger to make use of all those new parameters.

So when I hear about something that's not scaling, it would require some sort of fundamental change to architecture or algorithms.
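(As a rough, purely illustrative sketch of those two knobs, not anyone's actual training recipe: the Chinchilla-style rule of thumb of roughly 20 training tokens per parameter ties model size and data size together; all numbers below are assumptions for illustration.)

  # Two knobs of "scaling up": parameter count and training tokens.
  # The ~20 tokens per parameter ratio is the Chinchilla-style rule of thumb,
  # used here purely as an illustrative assumption.

  def scaling_plan(params_billions, tokens_per_param=20.0):
      params = params_billions * 1e9
      return {"parameters": params, "training_tokens": params * tokens_per_param}

  for size in (1, 10, 70):  # hypothetical model sizes, in billions of parameters
      plan = scaling_plan(size)
      print(f"{size}B params -> ~{plan['training_tokens'] / 1e9:.0f}B training tokens")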


> Driverless cars are here, without liability issues they would already be widespread.

In the sense that if you were allowed to run over anyone without legal consequences lots of tech enthusiasts would sign up for this experience?


General intelligence somehow just emerged from dirt and water + @.

And the Sun just somehow emerged from a cloud of hydrogen gas.

The obvious flaw in this line of thinking (as related to AGI) is the assumption that we humans can create/engineer the same without any real understanding of the underlying mechanisms involved.


Evolution is an incredibly dumb, very parallel search over long timescales and happened to turn wet blobs of carbon soup into brains because that's a useful (but not necessary) tool for its primary optimization goal. But that doesn't make carbon soup magical. There's no fundamental physics that privileges information-processing on carbon atoms. We don't have the time-scales of evolution, but we can optimize so much harder on a single goal than it can.

So I don't see how it's a movie fantasy any more than bottling up stars is (well, ICBMs do deliver bottled sunshine... uh, the analogy is going too far here). Anyway, the point is that while brains and intelligence are complicated systems, there isn't anything at all known that says it's fundamentally impossible to replicate their functionality or something in the general category. And scaling will be a necessary but perhaps not sufficient component of that, just because they're going to be complex systems.


The question is whether other animals even have the kind of intelligence that most people call AGI. It seems that this kind of adaptive intelligence may be unique to mammals (and perhaps a few other outliers, like the octopus), with the vast majority of life being limited to inborn reflexes, imitating their conspecifics, and trial and error.

This is actually a great perspective to have. The idea that there is no fundamental law of physics that should prevent us from replicating or exceeding the functionality of the human brain through artificial means makes this a question of when, not if.

> The idea that there is no fundamental law of physics that should prevent us ...

Maybe ... someday. But right now, we don't even fully understand the physics that makes a human brain work. Every time someone starts investigating, they uncover new complexity.


The understanding is not a prerequisite to build one though. Evolution didn't need it. And all the functionality that already emerges from ANNs doesn't need it either. You only need to understand the optimizer, not the result. Of course understanding the result is desirable to steer the optimization process better. But if all you care for is getting any result, even potentially undesirable ones, then building bigger optimizers appears to work.

> Evolution didn't need it.

Yes, it only took a few billion years of trial and error. And it didn't manage to do it with inanimate objects either.


That's an argument about complexity. And "organic" simply means based on molecules with a carbon backbone. They're versatile, but that does not give them any known information processing advantage over silicon.

If analog processing were relevant (it most likely isn't) then Artificial Neural Networks could also be implemented on analog silicon circuits.

As for the complexity, well, the systems are getting more complex. Straight lines on log charts. That's what scaling is about. Getting there, to that complexity.

So I'm just not seeing any knockout argument from you. We can't predict the future with certainty. But the factors you present do not appear to be fundamental blockers on the possibility. You're pointing at the lack of an existence proof... which is always the case before a prototype.

It sounds like you're basically abandoning forward-thinking and will only acknowledge AGI when it hits you over the head. A sign of intelligence is also the ability to plan for an unseen future.


Can you clarify what you mean by "no information processing advantage?" Does increased speed or memory capacity provide such an advantage? Or would you also claim that 2024 silicon has no advantage over 1994 silicon despite several orders of magnitude more speed & memory?

Since we're looking towards the future and asking whether there is an advantage to the basic materials (organic vs. inorganic), I'm talking about the physical limits of those substrates, whether there's anything special about them that allows one to process information in a way that the other can't. I.e. how many bits can be stored in a cubic centimeter of nano-structured silicon compared to a cube of carbon, oxygen, hydrogen and other organic chemistry arranged into (for example) neurons. How many TFlop/s can be fit into such a cube in principle, etc. Or if there are some other physical processes relevant to information processing that make carbon special. Those fundamental limits have not changed over time; the physics are the same. All that changes is how much use engineering makes of those possibilities.

It seems evolution developed intelligence as a way for organisms to react to and move through 3D space. An advantage was gained by organisms who can "understand" and predict, but biology just reused the same hardware for locating, moving through and modeling the 3D world for more abstract processes like thinking and pattern recognition.

So evolution came up with solutions for surviving on planet Earth, which isn't necessarily the same as general problem solving, even though there are significant overlaps. Just the $.02 of a layperson.


> But that doesn't make carbon soup magical.

The equivalent of a human brain just "emerging" from inert silicon switches without any real understanding of how or why --- that's PFM (Pure Friggin' Magic). There is no reason to believe it is even possible.

It's the modern day version of alchemy --- trying to create a fantastical result from a chemical reaction before it was understood that nuclear physics is the mechanism required. And even with this understanding, we have yet to succeed in turning lead into gold.


"It's the modern day version of alchemy --- trying to create a fantastical result from a chemical reaction before it was understood that nuclear physics is the mechanism required. And even with this understanding, we have yet to succeed at turning lead into gold in any practical way."

Well, sure. But then, we never observed nature turning lead to gold in any cheap way, whereas we do observe nature running complex intelligence on relatively cheap hardware (our brains).

There's a pretty big difference between trying to do something Nature never did (alchemy), versus trying to replicate, in a kinda-parallel way, something we've observed Nature doing (intelligence).


Do you agree that AGI can emerge from a bunch of haphazardly connected Neurons, using about 700MiB of procedurally generated initialization data (human DNA) and employing 1-2 decades of low-bandwidth, low-power (~25W) training?

You might even agree that not a single directed thought went into the design of that whole system.

Having established that, it seems laughable to me that it would be an "impossible fantasy" to replicate this with purpose-built silicon.

Sure, our current whole-system understanding could still be drastically improved, and the sheer scale of the reference implementation (human connectome) is still intimidating even compared to our most advanced chips, but I have absolutely zero doubt that AGI is just a matter of time (and continuous gradual progress).

If a lack of understanding did not stop nature, why should it stop us? :P

To me, all the arguments against AGI appear unconvincing, motivated by religion/faith, or based on definitional sophistry.

But I'm very open to have that view changed...


Well put. I've wondered whether some of those on the other side of the argument are actually, perhaps on some subconscious level, proponents of mind-body dualism: https://en.wikipedia.org/wiki/Mind%E2%80%93body_dualism

> To me, all the arguments against AGI appear unconvincing, motivated by religion/faith, or based on definitional sophistry.

Religion is belief in the unknown without reason or logic.

Do you know of any inanimate object that is truly "intelligent"?

Without ever seeing a single working example, you "believe" and have "faith" that it is possible. By so doing, you are practicing religion --- not science.


That's like complaining about artificial, non-organic flight being a fantasy before the Wright brothers. Which would not have been entirely wrong, but hardly a knockout argument, as we now know with certainty.

There is a vast difference between "engineering challenges have not yet been proven moot by a working prototype" and "precluded by physics or mathematics".

The (in)animate distinction is more akin to net-positive fusion in a man-made vessel vs. in a gravity well rather than obeys the 2nd law of thermodynamics vs. perpetuum mobile.


> That's like complaining about artificial, non-organic flight being a fantasy before the Wright brothers.

Nope.

Before the Wright brothers, we knew it was scientifically possible to suspend objects heavier than air in an air current. For example, kites and balloons.

The only examples we have of "intelligence" are organic in nature.

Since we have no examples to the contrary; for all we know, "life" and "intelligence" could be somehow inter-related. And we don't currently know how to engineer either one.


> The only examples we have of "intelligence" are organic in nature.

High-abstraction category-words such as "intelligence" are not fundamental properties of nature. They're made by man. Which means we currently happen to define intelligence in a way that we primarily observe in organic objects. So there's some circular reasoning in your argument. If we look at all the more fundamental building blocks such as information-processing, memory, adaptive algorithms, manipulating environments then we know all of them already are implementable in non-organic systems.

It is not a blind belief decoupled from reality. It is an argument based on the observation that the building-blocks we know about are there and they have not been put together yet due to complexity. There also is the observation that the "putting together" process has been following a trajectory that results in more and more capabilities (chess, go, partial information games, simulated environments, vision, language, art, programming, ...), i.e. there's extrapolation based on things that are already observable. Unless you can point at some lower-level piece that is only available in organic systems. Or why only organic systems should be capable of composing the pieces into a larger whole. I am not aware of any such limiting factor.

Just like "suspending" objects in air was possible and self-propelled machines were possible, even if not self-propelled flight had not yet been done at that time.


This is very unconvincing to me, because you dance around the core of your own argument, refusing to state it plainly: why would it be necessary for everything intelligent to consist of animated matter? You don't even offer a plausible hypothesis.

To me the null hypothesis is that non-organic intelligence is possible --at worst (!!)-- simply by emulating exactly what an intelligent organic entity does (but it seems obvious to me that this approach is likely to be wasteful and inefficient).


> Why would it be necessary for everything intelligent to consist of animated matter?

It may not be necessary --- but at this point in time, we really don't know enough to state this as a fact.

What we do know is that our only working examples of "intelligence" are all organic and mostly analog --- not digital. Suggesting that "intelligence" is possible without "life" --- is unsupported by any evidence and a pure leap of faith at this time.


First, "It doesn't exist already" is an extremely poor argument against the technical feasibility of really anything :P

> Suggesting that real "intelligence" is possible without "life" --- is unsupported by any evidence and is a pure leap of faith at this time.

Strongly disagree on this; if you accept that brains effect intelligence by self-interaction according to physical laws, then "computability" directly and inevitably follows. And that implies artificial brains are feasible...


> And that implies artificial brains are feasible...

Given enough time, money, effort and energy, anything is *feasible* --- even converting lead into gold.

Expecting it to just *emerge* somehow on its own without any real design or plan or understanding of the scope involved --- that goes way beyond *feasible* or *practical* and heads straight for *magical thinking*.


Is anyone expecting it to "just emerge somehow"?

This seems strawmanny. There has been an absolute ton of research into figuring out what kind of design/scope is necessary. More planning and research is happening all the time, and architecture is specifically being changed with a mind towards specific new functionalities. It's not haphazard; it's planned. Yes, there is some room for 'emergence' or learning, but that doesn't mean there isn't also a lot of structure and planning.

The picture you present of NN research is so far off from what I see actually being done in this field that I'm a little boggled. Are you sure you're in touch with what's actually happening in the field?


> Do you know of any inanimate object that is truly "intelligent"?

So, your argument is that by making something intelligent we would necessarily also make it animate?

That's reasonable, but doesn't impact the plausibility of AGI. “Artificial” is not “inanimate”.


> To me, all the arguments against AGI appear unconvincing, motivated by religion/faith, or based on definitional sophistry.

Correction. I'm atheist. I don't see AGI for the future. Same reasons as the poster who responded to you below: evolution may not have had a design, but we know it works well on animated creatures.

We have yet to see any examples of inanimate objects exhibiting any sign of intelligence, or even instinct.

What we do know is that, even after billions of years of evolution, not a single rock has evolved to exhibit a sign of intelligence.

You're saying that with an intelligent hand directing the process, we can do better. I understand the argument, and I concede that it is a reasonable argument to make, I'm just unconvinced by it.


> We have yet to see any examples of inanimate objects exhibiting any sign of intelligence, or even instinct.

Can you give me a clear, minimal definition for both "intelligence" and "instinct"?

Because to me, instinct already seems to have been achieved by existing inanimate control systems for decades now, and "decent understanding of natural human language" (as achieved by today's LLMs) is more than enough to count as intelligent for me.

If we just started simulating neurons in a brain exactly, what would prevent us from achieving inorganic intelligence in your view?


I don't really want to get involved in a discussion on where the fine dividing line is between intelligence and non-intelligence. We are fine determining intelligence at the extremes, but not in the large shades of grey in-between.

Most people (say, 999999 out of every 1000000) will consider a rock unintelligent and Stephen Hawking to be intelligent. You can't call them wrong because "they cannot define intelligence".

> If we just started simulating neurons in a brain exactly, what would prevent us from achieving inorganic intelligence in your view?

Nothing. But the word "If" is doing all the heavy lifting in that argument. I mean, we aren't doing that at the moment, are we? It's not clear that we will ever discover exactly how the collection of neurons we call a brain works.

IOW, if the human brain was simpler, we'd be too simple to understand it: this may already be the case!

And the evidence for "we may not figure out how the brain works exactly enough to clone the mechanism it uses" is a lot larger than "we might figure out how the brain works well enough to clone the mechanism it uses."

This is why I remain unconvinced. Even though I think your position is a reasonable one to hold, the opposite position is, IMHO, just as reasonable. It's got nothing to do with religion or superstition.


"What we do know is that, even after billions of years of evolution, not a single rock has evolved to exhibit a sign of intelligence."

Is it even appropriate to apply "after billions of years of evolution" to rocks? Rocks don't evolve. Evolution can't act on them.

And, we aren't generally looking to "evolve" AI, so it's kinda a moot point there, too.

What is it about animation that, to you, is important for intelligence? Presumably it's the training data + agency, but.. are these not possible without physical-world "animation"?

But also: how come AI can't be animated?


It's not a hope to expect machines to match or exceed the human level of intelligence, because we ourselves demonstrate human-level intelligent machines by virtue of our existence, and the physical limits of computation far exceed those of our specific biology.

AKA, we know it’s possible for atomic systems to be as intelligent as we are, because we are. We suspect the limit is far beyond us because computers have dramatically faster processing, essentially perfect memory systems, etc


I like how he references someone who references back to himself.

Citogenesis. It's a hell of a drug.

I encourage Gary Marcus and scientists with similar views to create a functioning model as proof of their theories. Given his advocacy for "symbolic AI" it seems more valuable for them to demonstrate their concepts in practice rather than producing countless articles on the subject.

To cite results of an arxiv preprint as “proof” that none of these LLM technologies will ever work is so disingenuous. Of course these beasts are data inefficient, but to imply that this means all of the progress is smoke and mirrors is such a stretch. I have a hard time taking any of Marcus’ arguments seriously

Gary Marcus is the Glenn Greenwald of AI. Doesn't mean he is wrong, just that he's always spitting venom like a cut snake in his proclamations.

You don't like OpenAI Gary, we get it.


It is funny when you know neither A nor B in "A is the B of C" claims.

I've also never seen a cut snake spit venom!

These posts from him are so tiring. If ChatGPT-5 was a high fidelity 100% accurate simulation of a team of top tier grad students, he'd be writing that we will soon run out of grad student brains to scan and results will taper off.

It would be an amazing tool, changing the world, but the post would be the same 'nothing to see here, folks' post as this.


> It would be an amazing tool, changing the world, but the post would be the same 'nothing to see here, folks' post as this.

It will change the world for the worse. We don't need more progress and more technology. What's the point of living if we're going to give all the funnest work to AI? Idiocy.


Maybe amazing was the wrong word. I just meant regardless of whether AI tools are getting better or worse - regardless of whether things are looking to go on improving forever or tapering off - he will write the same post.

That makes the posts devoid of meaning. Their gist is unaffected by the things happening in the real world. It will always be "I told you this wouldn't work".


Why would you think research is fun if we don’t need more progress? If one were to take your comment seriously they would conclude we must fire all those people working on fun problems.

They can work on any problems they wish. Just not at the expense of the biosphere :)

Ok, but please don't post low-quality flame-style comments to HN. It only makes things worse.

https://news.ycombinator.com/newsguidelines.html


I was at an AI meetup, and I talked to someone who was in charge of the emerging technology division at a VC firm. I asked him why focus so hard on AGI, when AI tools are already looking quite impressive and are a much clearer area to focus investment on. His answer was that AGI was "easy" which I laughed at (in a good natured way) and tried to get him to elaborate on, at which point he started to get uncomfortable and made an excuse to go talk to someone else.

This same fellow was big on autonomous swarms for problem solving, but when I asked him what the problem autonomous swarms were supposed to solve that you couldn't solve more easily and quickly by a LLM talking to itself, he didn't have an answer.


Everyone is building nonsense, betting that OpenAI's next real innovation will validate their idea and they'll be ahead of the curve.

That’s all it really is


You make it sound silly, but it does seem to make good business sense to me. I'd argue that it seems like they learned from The Bitter Lesson[0] and instead of trying to manually solve things with today's technology, are relying on the exponential improvement in general purpose AI and building something that would utilize that.

On a somewhat related note, I'm reminded of the game Crysis, which was developed with a custom engine (CryEngine 2) which was famously too demanding to run at high settings on then-existing hardware (in 2007). They bet on the likes of Nvidia to continue rapidly improving the tech, and they were absolutely right, as it was a massive success.

[0] http://www.incompleteideas.net/IncIdeas/BitterLesson.html


Yeah it’s a real valid business strategy.

It’s also silly.


I'd also add that what Crysis did was pretty typical at the time. It was an era when new computers were a bit dated in a few months, and then obsolete in a couple of years. Carmack/ID Software/Doom was a more typical example of this, as they did it repeatedly and regularly, frequently in collaboration with the hardware companies of the time. But there was near zero uncertainty. There was a clear path to the goal down to the point of exact expected specs.

With LLMs there's not only no clear path to the goal, but there's every reason to think that such a path may not exist. In literally every domain neural networks have been utilized in, you reach asymptotic-level diminishing returns. Truly full self-driving vehicles are just the latest example. They're just as far away now as they were years ago. If anything they now seem much further away because years ago many of us expected the exponential progress to continue, meaning that full self-driving was just around the corner. We now have the wisdom to understand that, at the minimum, that's one rather elusive corner.


"In literally every domain neural networks have been utilized in you reach asymptotic level diminishing returns."

Is that true, though? I think of "grokking", where long training runs result in huge shifts in generalization, only with orders of magnitude more training after training error seemed to be asymptotically low.

This'd suggest both that there's not that asymptotic limit you refer to - something very important is happening much later - and that there are potentially some important paths to generalization on lower amounts of training data that we haven't yet figured out.
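(For anyone who wants to poke at this, below is a minimal sketch of the kind of setup the grokking papers describe: modular addition, a small network, heavy weight decay, and training far past the point where training accuracy saturates. It assumes PyTorch, the hyperparameters are guesses, and whether a delayed jump in validation accuracy actually appears depends heavily on them, so treat it as a starting point rather than a demonstration.)

  # Minimal grokking-style experiment: learn (a + b) mod p from half of all
  # pairs and keep training long after training accuracy hits 100%.
  import torch
  import torch.nn as nn

  p = 97
  pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))  # all (a, b) pairs
  labels = (pairs[:, 0] + pairs[:, 1]) % p
  perm = torch.randperm(len(pairs))
  split = len(pairs) // 2
  train_idx, val_idx = perm[:split], perm[split:]

  model = nn.Sequential(
      nn.Embedding(p, 64),          # embed each operand
      nn.Flatten(start_dim=1),      # (N, 2, 64) -> (N, 128)
      nn.Linear(128, 256), nn.ReLU(),
      nn.Linear(256, p),            # predict (a + b) mod p
  )
  opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
  loss_fn = nn.CrossEntropyLoss()

  def accuracy(idx):
      with torch.no_grad():
          return (model(pairs[idx]).argmax(dim=1) == labels[idx]).float().mean().item()

  for step in range(50_000):        # deliberately far past train convergence
      opt.zero_grad()
      loss = loss_fn(model(pairs[train_idx]), labels[train_idx])
      loss.backward()
      opt.step()
      if step % 5_000 == 0:
          print(step, "train", round(accuracy(train_idx), 3), "val", round(accuracy(val_idx), 3))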


I think training error itself has diminishing returns as a metric for LLM usefulness.

A lower error, after a certain point, does not suggest better responses


Did he mean "AGI is easy to build" or "AGI is easy to sell"?

My impression was easy to build, as he had an engineering background. It seemed like he expected AGI was ready to emerge fully formed like Athena from the head of Zeus.

Could also have meant "easy" (no-brainer) to invest in, whether or not that's actually true.

Remember that these same VC firms had large crypto and blockchain divisions and were hosting lavish conferences dedicated to it. Regardless of what their job title is, if someone can't back up their talk with real world experience with the tech, there's no reason to take them seriously.

I am patiently waiting for the industry and VCs to pivot their "visions" to "vertical AI", then to "targeted ML applications", and "knowledge-assisted automation systems for the manufacturing and logistics industries". They really burned a lot of money and energy on stuff no one seems to want or need. That money is going to run out soon.

As the old saying goes, “The market can stay irrational longer than you can stay solvent.”

For some reason, VCs got super hype about foundation model companies, but from my perspective being very deep in this area is that there's zero moat there with the prevalence of open source, so it's a very stupid area to burn massive amounts of cash in.

Targeted AI applications and virtual "executive assistant" agents are going to be huge though.


My take would be that AGI is inevitable, and it's your choice if you want to be a part of making it (or deadend branches along the way) or not. I take Ray Kurzweil's predicted timeline as being the most sensible.

I just don’t think Ray Kurzweil understands these systems well enough. His views aren’t of much substance anymore.

If you don't know Gary Marcus: he has been repeating this same argument for many years and has downplayed the capabilities of LLMs since the beginning. He has never really added much useful (constructive) content to the discussion or to the research, though (except that he advocates symbolic AI methods). So he is not really taken seriously by the community, but is rather seen as annoying on this.

Although, of course, he is not alone with this argument. E.g. even Yann LeCun is repeating a similar argument on LLMs, as are many other serious researchers: that we probably need a bit more than just LLMs + our current training method + scaling up. E.g. some model extension which handles long-term memory better (lots of work on this already), or other model architectures (e.g. Yann LeCun proposed JEPA), or online learning (unclear how this should work with LLMs), or also different training criteria, etc. In any case, multi-modality (not just text) is important. Maybe embodiment (robots, interaction with the real world) as well.


I lost all respect for Gary Marcus when he claimed in 2018 that autonomous driving is decades away.

Oh wait. I lost all respect for those hype-men who claimed it's just around the corner.


There are robotaxis being tested in the west coast US and Wuhan, China.

I'm not sure if I'm parsing your sentence right, but surely you can't be implying that autonomous driving isn't possible or that it is some undetermined time away?

You can come down to SF and take a waymo today. It works great.


No one doubted you could take precise area scans of a city and create motorised robots on fixed routes.


Not anyone important.

The context was SAE level 5 (Full Driving Automation)

Level 5 vehicles do not require human attention; the "dynamic driving task" is eliminated. Level 5 cars won't even have steering wheels or acceleration/braking pedals. They will be free from geofencing, able to go anywhere and do anything that an experienced human driver can do.


Right, but is any serious researcher insisting that just scaling will be enough to achieve AGI? Because while I'm a layperson on this stuff, I don't get that impression, meaning this criticism from Marcus is mostly attacking a strawman. And he's not the only one peddling it.

The understanding I have is that scalability of LLMs with data took even their developers by surprise. They kept at it because empirically, so far, it's worked. But nobody assumes it'll keep working indefinitely or that it'll lead to AGI alone. If they did, OpenAI, Google, etc. would have fired most of their researchers and would simply focus everything they have on scaling.


There are plenty of leaders in SV, one example being Sam Altman, who is preaching that hardware is key. Just take a look at Nvidia's $2T market cap.

We all know it isn't, and this can have the side effect of creating a bubble. I would be concerned about this, as it can have terrible effects on AI research long-term.


When you look at the history of scaling up Transformers, e.g. GPT2, most people in the community were in fact quite surprised that just scaling up those models worked so well. And every time you read a lot of opinions that this is the end and scaling them up further will not give (much) improvements. This criticism is lower now but still there.

Scaling self-attention has also worked much further than it was initially believed it could work. Initially it was always believed to be the main bottleneck, but this turned out wrong. The feed-forward layers are much more the bottleneck when looking at where most of the compute is spent.

It's still a bit unclear how much further we can take the scaling. We are mostly at the limit of what is financially feasible today, but as long as some form of generalized Moore's law continues (e.g. number of transistors per dollar), it becomes financially feasible to scale further. At some point we also hit a limit of available data. But maybe self-play (or variants) might solve this.
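(A back-of-envelope sketch of that financial-feasibility point, with purely illustrative numbers rather than anything from the thread: if compute per dollar doubles every couple of years, a run k times larger reaches today's cost after roughly log2(k) doubling periods.)

  import math

  def years_until_same_cost(scale_factor, doubling_years):
      """Years until a run `scale_factor` times larger costs the same as today,
      assuming compute per dollar doubles every `doubling_years` years."""
      return math.log2(scale_factor) * doubling_years

  for scale in (10, 100, 1000):  # illustrative scale-ups and an assumed 2.5-year doubling time
      print(f"{scale}x larger run at constant cost: ~{years_until_same_cost(scale, 2.5):.1f} years")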

I guess most researchers agree that just scaling up further is maybe not optimal (w.r.t. reaching AGI), or also will not be enough, but it's a somewhat open question.


The word scaling has multiple definitions. It's not just the same hardware getting cheaper over time, there have been many architectural improvements:

  - Better training algorithms for training with lower precision
  - FlashAttention, FlashAttention 2 for decreasing bandwidth inside the GPU
  - ring attention for decreasing bandwidth across GPUs
  - algorithmic improvements in handling long context window
  - MoE
  - Lots of algorithmic improvements in fine tuning / alignment
  - Groq hardware architecture (deterministic hardware, storing all data in SRAM for inferencing instead of using cache hierarchies)
  - Improvements in tokenization
So far, softmax(Q·Kᵀ/√d)·V itself is the only thing that hasn't (yet) been touched.
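(For reference, that expression is the scaled dot-product attention from the original transformer paper; below is a minimal single-head sketch in plain PyTorch, ignoring masking, multi-head splitting, and all of the optimizations listed above. It assumes PyTorch is available.)

  import math
  import torch

  def attention(q, k, v):
      """Scaled dot-product attention; q, k, v have shape (batch, seq_len, d_model)."""
      d = q.size(-1)
      scores = q @ k.transpose(-2, -1) / math.sqrt(d)  # (batch, seq, seq)
      weights = torch.softmax(scores, dim=-1)          # rows sum to 1
      return weights @ v                               # (batch, seq, d_model)

  x = torch.randn(2, 5, 64)        # toy batch: 2 sequences, 5 tokens, d_model = 64
  print(attention(x, x, x).shape)  # self-attention -> torch.Size([2, 5, 64])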

If the question is whether "just" improving LLM perplexity by further algorithmic and hardware improvements will lead to AGI, at this point many researchers believe that the answer is yes (and many others that the answer is no :) ).


LeCun has startlingly bad takes for a major player in the space, to the point that they can only be bad faith and driven by adverse motivations.

This may be part of the reason.

https://www.linkedin.com/posts/yann-lecun_what-meta-learned-...

One might also imagine that as one of the "godfathers of AI" he feels a bit sidelined by the success of LLMs (especially given above), and wants to project an image of visionary ahead of the pack.

I actually agree with him that if the goal is AGI and full animal intelligence then LLMs are not really the right path (although a very useful validation of the power of prediction). We really need much greater agency (even if only in a virtual world), online learning, innate drives, prediction applied to sensory inputs and motor outputs, etc.

Still, V-JEPA is nothing more than a pre-trained transformer applied to vision (predicting latent visual representations rather than text tokens), so it is just a validation of the power of transformers, rather than being any kind of architectural advance.


Your first paragraph is an ad-hominem that you should be ashamed of. "Not really taken seriously by the community"? Which community is that? Last time I saw Marcus he was a co-organizer of a panel on Doug Lenat and Knowledge Representation at AAAI 2024 [1]. Does that sound like the AI community is not taking him seriously? Is anyone asking you to organise panels at AAAI?

The question is rhetorical: I can see you're a PhD student. My advice is to learn to have some respect for the person of others, as you will want them to have respect for yours when, one day, you find yourself saying something that "the community" disagrees with. And if you're not planning to, one day, find yourself in that situation, consider the possibility that you're in the wrong job.

And what is the above article saying that you think should not be taken seriously? Is it not a widely recognised fact that neural nets performance only improves with more data, more compute and more parameters? A five year old could tell you that. Is it controversial that this is a limitation?

_____________________

[1] https://aaai.org/aaai-conference/aaai-24-panels/


I did not intend to attack Gary in any way. But I realize that my statement is probably too strong. Of course it's not the whole AI community. My intention with my post was just to give some perspective, some background, for people who have not heard about Gary Marcus before.

Maybe I'm also in a bubble, but I was speaking mostly about the people I frequently read from, i.e. lots of people from Google Brain, DeepMind, other people who frequently publish on NeurIPS, ICLR, ICML, etc. Among those people, Gary is usually not taken seriously. At least that was my impression.

But let's not make this so much about Gary: Most of these people disagree with the opinion that Gary shares, i.e. they don't really see such a big need for symbolic AI, or they see much more potential in pure neural approaches (after all, the human brain is fully neural).


>> Of course it's not the whole AI community. My intention with my post was just to give some perspective, some background for people who have not heard about Gary Marcus before.

Yes, I get it. And the perspective you wanted to give was to not take Marcus seriously because the people you follow on social media say he's not to be taken seriously. That's nothing but a form of collective online bullying that attacks the person and not the opinion, and like I say in my other comment above, shameful.

Consider for a moment the impression that you make when you say that some people you know, when they're not publishing on NeurIPS, are on social media dogpiling on someone who criticises their work. That's not researchers any more, but common social media trolls.

>> But let's not make this so much about Gary: Most of these people disagree with the opinion that Gary shares, i.e. they don't really see such a big need for symbolic AI, or they see much more potential in pure neural approaches (after all, the human brain is fully neural).

To my experience, the majority of neural net researchers don't know anything concrete about symbolic AI, just what they have heard second-hand, usually on social media again, usually by people who disagree with Marcus, who's the most famous proponent of neuro-symbolic AI (NeSy). So whatever opinion they have on NeSy is not an informed opinion.

There's plenty of literature on NeSy, which is a bona fide field of research with a conference etc. This year Leslie Valiant was the keynote speaker and Yann LeCun the honoured guest:

https://sites.google.com/view/nesy2023

You really don't have to listen to what Marcus says to form an opinion on NeSy. Btw, I am not with them and I think they're going the wrong way, but at least I know what they're doing. That is much less than can be said about most neural net researchers, who rarely know anything outside their own work besides whatever preprint is trending on X. That's to the detriment of nobody but themselves.


> Consider for a moment the impression that you make when you say that some people you know, when they're not publishing on NeurIPS, are on social media dogpiling on someone who criticises their work. That's not researchers any more, but common social media trolls.

Twitter for the past decade plus has really publicized and amplified these petty, close-minded academic cliques. It's pretty disgusting to watch for someone who was also a PhD student eyeing an academic career once.


Yes, this kind of behaviour is keeping people away from research, except for the ones who are fine with it which of course ends up encouraging the bad behaviour. I'm sorry to see the disappointment evident in your comment.

> Is it not a widely recognised fact that neural nets performance only improves with more data, more compute and more parameters?

I'm not sure how you meant that to be parsed.

1) performance only improves by scaling up those factors, and can't be improved in any other way

OR

2) performance can only (can't help but) get better as you scale up

I'm guessing you meant 1), which is wrong, but just in case you meant 2), that is wrong too. Increased scaling - in the absence of other changes - will continue to improve performance until it doesn't, and no-one knows what the limit is.

As far as 1), nobody thinks that scaling up is the only way to improve the performance of these systems. Architectural advances, such as the one that created them in the first place, are the other way to improve. There have already been many architectural changes since the original transformer of 2017, and I'm sure we'll see more in the models released later this year.

You ask if it's controversial that there is a limit to how much training data is available, or how much compute can be used to train them. For training data, the informed consensus appears to be that this will not be a limiting factor; in the words of Anthropic's Dario Amodei, "It'd be nice [from a safety perspective] if it [data availability] was [a limit], but it won't be". Synthetic data is all the rage, and these companies can generate as much as they need. There's also masses of human-generated audio/video data that has hardly been touched.

Sure, compute would eventually become a limiting factor if scaling were the only way these models were being improved (which it isn't), but there is still plenty of headroom at the moment. As long as each generation of models make meaningful advances towards AGI, then I expect the money will be found. It'd be very surprising if the technology was advancing rapidly but development curtailed by lack of money - this is ultimately a national security issue and the government could choose to fund it if they had to.


> Last time I saw Marcus he was a co-organizer of a panel on Doug Lenat and Knowledge Representation in AAAI 2024

Next year he'll be teaming up with Rudy Giuliani to tout the success of SHRDLU at Four Seasons Total Landscaping.

The AI community asked GPT-4 to send him an invite, and he accepted.


Driverless cars are not just demos, they already work in a few cities (granted, they are still limited). And LLMs are massively useful already. What is this guy talking about?

While it's true that it is worth being skeptical about the current deep learning/LLM boom, younger people who don't know who Gary Marcus is need to know that he is not really an unbiased observer here -- he has never liked neural net approaches and thinks they were a wrong turn (even though they obviously are far more successful than the older symbolic "GOFAI" methods).

The paper that Gary cited is just so stupid. If you're drawing training data from a fixed distribution randomly, then of course, as the size of your training data set increases, the odds that a random draw will be very similar to an existing piece of data increase, until you have to do a lot of random draws to see a new part of the distribution. That is the cause of the exponential increase in data requirements for a linear increase in performance.

We just need to methodically sample the under sampled portions of the source distribution. Not more data, better data.
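(A toy way to see the parent's point, with made-up numbers and nothing to do with any real training corpus: model the source distribution as a set of regions drawn with unequal probabilities and watch how few new regions each additional block of random draws covers, versus how few draws targeted sampling of the unseen regions would need.)

  # Toy illustration of diminishing novelty under random sampling: regions of a
  # skewed distribution are drawn at random, and most draws soon land in
  # already-covered regions. Purely illustrative; numbers are made up.
  import random

  random.seed(0)
  n_regions = 1000
  # Zipf-like skew: a few regions are very likely, the long tail is rare.
  weights = [1.0 / (rank + 1) for rank in range(n_regions)]

  seen = set()
  new_per_block = []
  for block in range(10):
      new = 0
      for _ in range(2000):  # 2000 random draws per block
          region = random.choices(range(n_regions), weights=weights, k=1)[0]
          if region not in seen:
              seen.add(region)
              new += 1
      new_per_block.append(new)

  print("new regions per block of 2000 random draws:", new_per_block)
  print("targeted sampling would need exactly", n_regions - len(seen), "more draws to cover the rest")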


Ah yes, "just sample under sampled portions"

The problem is that they don't exist as discrete, missed data sets; they only exist inside larger data sets that also contain large amounts of well-sampled data.

If you could separate out the existing data you could cherry-pick the under-sampled data, but doing that requires the very classifier you're trying to build in the first place.

This reaction is like saying "if we knew everything we'd be able to answer every question"


His suggestion sounds more like human curated data sets rather than automated curation like you're assuming, although this is a good distinction to make.

That's not really true. You don't need to actually solve the problem, you just need to create a first order solution for optimizing your data set, which can enable better models that can be used to create a solution that is second order accurate, and so on iteratively.

Isn’t it possible for out of band systems to improve data distribution?

You don’t need a fancy model trained on all languages to tell you its training set underrepresented Finnish. You just need the training set curators to say “hey there’s no Finnish in the inputs”.

If the goal is high quality models, why be a purist about only using the resulting model to measure deficiencies in the training set?


Humans were able to get better data themselves. Which I think is what we mean by actual AI.

At some point autoencoding during training will take care of that, it seems. We kind of already do it with 'tutor' training and GPT-generated training data, as well as synthetic logic and math problem sets.

Oh, having language models sit on top of autoencoders is 100% the right way to go, even if we weren't moving towards multi-modal models, just because right now LLMs can be brilliant in one language and retarded in another based on the training data set. Putting an autoencoder in front and using "language agnostic" encoding would solve that problem.

> autoencoding during training

Could you elaborate on this?


And thank God for that.

If there's one thing humanity doesn't need, it's the kind of person who leads a Silicon Valley corporate entity creating a self-aware consciousness.


"Neural networks (at least in the configurations that have been dominant over the last three decades) have trouble generalizing beyond the multidimensional space that surrounds their training examples. That limits their ability to reason and plan reliably."

But humans also have this limitation. Every scientific discovery is just new data we can train on and build models from, both mentally and mathematically.


Agreed. It sounds like the author’s bar for intelligence is to materialize new knowledge out of nothing. Which sounds more like magic than intelligence.

So models don't do well on data they weren't trained on? Hasn't that always been true?

I'd stand by the bitter lesson ( http://www.incompleteideas.net/IncIdeas/BitterLesson.html ) here. Of course it feels better to do something clever, fine-tune models, build clever agents or novel algorithms instead ... only to be beaten by the next larger-scale general model.

Why be clever at all anymore?

Yup. With AI being the cleverest, the value comes from having vision and being able to do long term planning.

But can't AI do that too?

I think AI can easily have more strategy and patience than you, and coordinate 10,000 swarm members. Why would we need you to do anything at all?


You have to keep in mind that humans have a lot of hardwired advantages, and we take in a massive amount of multimodal data. Even when the best high level planning AIs are smarter than humans across the board, they'll still be vastly more expensive to run, so humans will continue to have value barring a complete revolution in computing efficiency.

I'm skeptical both of the people who claim we'll achieve AGI and self-driving cars in just a few years, and of the people who claim it can't be achieved by LLMs at all, ever.

I feel that we don't understand well enough (scientifically) what "general intelligence" even is, and how it comes to be in humans - to make claims either way.

To me, the only honest answer right now seems to be, that we do not know. We have absolutely no clue how close - or far - we are to AGI.


He's right, it won't get us to AGI.

But scaling is necessary if we ever want to get to AGI.

One point his argument gets right is that we're really close to hitting a ceiling in improvements for LLMs, unless some new groundbreaking research arrives.

Just shoving in more data, increasing parameters, etc. won't get us anywhere. Not to mention LLMs behave more like a data compression algorithm than any form of "artificial" intelligence.

What I personally find troubling is what will happen to Nvidia and the other pick-and-shovel sellers once the LLM hype fades away and people become aware of its limitations. Will we still have big investments in scaling? Likely not.


Stop posting Gary Marcus. Yeesh.

Scaling alone will certainly not get us to AGI, but that's trivially obvious.

The "scale is all you need" argument, from anyone intelligent, assumes that other obvious deficiencies such as lack of short-term memory (to enable more capable planning/reasoning) will also be taken care of along the way, and I'd not be surprised to see that particular one addressed in upcoming next-gen models.

There's a recent paper from Google suggesting one way to do it, although there are many other ways too.

https://arxiv.org/pdf/2404.07143.pdf

The more interesting question is what other components/capabilities need to be added to pre-trained transformers to get to AGI, and whether some of the missing pieces are so fundamental that they require a new approach and can't just be retrofitted along the way as we continue to scale up.


He's somehow a doomer while at the same time saying it won't happen.

There *is* intelligent life on Earth, but I leave for Texas on Monday.

I thought the Numenta approach of trying to emulate a column of neo-cortex was more promising (and ambitious) than just scaling up LLM power. But here we are.
