
> Then what is 'real' learning?

It’s understanding. GPT succeeds at simple math questions because it has seen them over and over in texts, but give it something it has never seen and it will fumble. It doesn't understand, and it doesn't replicate the method used to arrive at the correct solution; worse, it will fake an answer instead of telling you it doesn't know.




>because GPT-3 can do simple math

It can't actually, and again this is an example of the same issue. This was discussed earlier here[1]. Sometimes it produces correct arithmetic results on addition or subtraction of very small numbers, but again, that is likely just an artifact of the training data. On virtually everything else its accuracy drops to guesswork, and it isn't even consistent on operations that are more or less equivalent to ones it just got right.

If it actually understood mathematics, it would not be good at adding two- or three-digit numbers yet fail at adding four-digit numbers or at some marginally more complicated-looking operation. That sort of mathematics isn't probabilistic: if it had learned actual mathematical principles, it would apply them without these errors.
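A quick way to see this for yourself is to probe accuracy as a function of digit count. A minimal sketch in Python, where ask_model() is a hypothetical placeholder for whatever completion API you want to test (not a real library call):

  import random

  def ask_model(prompt: str) -> str:
      # Hypothetical: plug in your own call to the model under test.
      raise NotImplementedError

  def addition_accuracy(digits: int, trials: int = 50) -> float:
      correct = 0
      for _ in range(trials):
          a = random.randint(10 ** (digits - 1), 10 ** digits - 1)
          b = random.randint(10 ** (digits - 1), 10 ** digits - 1)
          reply = ask_model(f"What is {a} + {b}? Reply with just the number.")
          correct += str(a + b) in reply
      return correct / trials

  # for d in range(2, 8): print(d, addition_accuracy(d))

If it had learned the carrying algorithm, accuracy would not fall off a cliff somewhere between three and four digits.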

Mathematics doesn't consist of guessing the next language token in an equation from data; it consists of understanding the axioms and then performing operations according to logical rules.

This problem is akin to the performance of ML systems in games like Breakout. It looks great, but then you shift the paddle by five pixels and it turns out the system never understood what the paddle, or the point of the game, is at all.

[1]https://news.ycombinator.com/item?id=23896326


> so as you clarify, its responses get much closer to the target.

This isn't a given; plenty of times it isn't true. You cannot convince GPT to answer every problem correctly, no matter how often you try.

For example, try to teach it a grammar. No matter how many times you try, and no matter how much you work with it, you won't be able to.

You can teach almost anyone a grammar if they are inclined to try to fuss through it. Not GPT.
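To be concrete, by "a grammar" I mean a small made-up rule set, nothing exotic. For example a toy grammar like the one below (invented purely for illustration): you explain the rules, give a few examples, then ask it to judge new strings. A reference checker is a handful of lines:

  # Toy grammar, invented for illustration:
  #   S -> "ba" S "zo" | "ki"
  # so valid strings are "ki", "ba ki zo", "ba ba ki zo zo", ...
  def in_grammar(tokens):
      if tokens == ["ki"]:
          return True
      if len(tokens) >= 3 and tokens[0] == "ba" and tokens[-1] == "zo":
          return in_grammar(tokens[1:-1])
      return False

  print(in_grammar("ba ba ki zo zo".split()))  # True
  print(in_grammar("ba ki zo zo".split()))     # False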

And yes I have used GPT-4 a lot, please don't assume I haven't.


>> A child that is learning addition will have a harder time with larger numbers and make more mistakes too.

GPT-3 is not a child. GPT-3 is a language model, and language models are systems that predict the next token following a sequence of tokens. A system like that can give correct answers to arithmetic problems it has already seen, and seen often, without having to learn what a child learns when learning arithmetic.

A system like that can also give incorrect answers to arithmetic problems it has not seen, or hasn't seen often enough, and that will be for reasons very different from the reasons a child gives incorrect answers to the same problems.

In general, we don't have to know anything about how children learn arithmetic to know how GPT-3 answers arithmetic problems; it suffices to know how language models work.
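To make this concrete, here is a deliberately crude sketch of what "predict the next token" means. No arithmetic ever happens; the model just returns whatever most often followed the same context in the text it was trained on:

  from collections import Counter, defaultdict

  # Toy "language model": count which token most often follows a 4-token context.
  corpus = "2 + 2 = 4 . 2 + 3 = 5 . 2 + 2 = 4 .".split()

  follows = defaultdict(Counter)
  for i in range(len(corpus) - 4):
      context = tuple(corpus[i:i + 4])
      follows[context][corpus[i + 4]] += 1

  def predict(context_str):
      # Return the most frequent continuation of this exact context.
      return follows[tuple(context_str.split())].most_common(1)[0][0]

  print(predict("2 + 2 ="))  # prints "4", only because that string followed it in the corpus
  # predict("7 + 8 =") fails outright: the context was never seen, nothing to look up

A real model generalises over learned representations rather than exact strings, but the point stands: the answer comes from the statistics of the training text, not from an addition procedure.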


> It's not clear to me if the lesson here is that GPT's reasoning capabilities are being masked by an incorrect prior (having memorized the standard version of this puzzle) or if the lesson is that GPT'S reasoning capabilities are always a bit of smoke and mirrors that passes off memorization for logic.

It's a lot closer to the latter. GPT doesn't have "reasoning capabilities", any more than any other computer program. It doesn't have a clue what any of its input means, nor the meaning of the text it outputs. It just blindly spits out the words most probable to follow the prompt, based on its corpus of training data and the weights/biases added to fine-tune it. It can often do a good job of mimicking reasoning, but it isn't reasoning.


> it's pretty clear that GPT is producing an amazing level of comprehension of what a series of words means. The problem is, that's all it is really doing.

very key point


> I'm not sure about its math, but GPT-4 fails miserably at simple arithmetic questions like 897*394=?

That's, um, about 300,000?

...

353,418 actually. But I'm not going to blame the AI too much for failing at something I can't do either.
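(For anyone checking by hand: 897 × 394 = 897 × 400 - 897 × 6 = 358,800 - 5,382 = 353,418.)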


> I can literally see GPT take inputs, make conclusions based on them, and ask me questions to test its hypotheses, right before my eyes in real time.

It takes inputs and produces new outputs (in the textual form of questions, in this case). That's all. It's not 'making conclusions', it's not making up hypotheses in order to 'test' them. It's not reasoning. It doesn't have a 'model of the world'. This is all a projection on your part onto a machine that inputs and outputs text, and whose surprising 'ability' in this context is that the text it generates plays so well on the ability of humans to fool themselves into thinking its outputs are the product of 'reasoning'.


>If it can do 10 code questions it has seen before but fails to do 10 it hasn't (of similar difficulty) then it strongly suggests that it isn't reasoning its way through the questions, but regurgitating/rephrasing.

First of all, coding is one domain where expecting a perfect answer on the first pass makes no sense. That GPT-4 didn't one-shot those problems doesn't mean it can't solve them.

Moreover, even if true, all this says is that GPT-4 isn't as good at coding as initially thought. Nothing else. It doesn't mean it doesn't reason. There are many other tasks where GPT-4 performs about as well on out-of-distribution/unseen data.


>Problem is the current systems can’t reason about things, math included.

Have you tried asking GPT-4 any questions that require reasoning to solve? If so, what did you ask, and what did it get wrong?


> Even after I pointed this mistake out, it repeated exactly the same proposed plan. It's not clear to me if the lesson here is that GPT's reasoning capabilities are being masked by an incorrect prior (having memorized the standard version of this puzzle) or if the lesson is that GPT'S reasoning capabilities are always a bit of smoke and mirrors that passes off memorization for logic.

It has no reasoning capabilities. It has token prediction capabilities that often mimic reasoning capabilities.


> Would it be possible to eliminate ~90% of the current errors?

They aren't errors. When the machine generates the text "Next you'll tell me 2 + 2 = 5", there isn't an error in math, because no math was attempted. It's a success at producing a predictable wording structure.

In fact, that example alone is a great success, since we English-speaking humans can get a broader context from that statement: you're going to tell me something nonsensical.

I don't get how you can say the first part of your comment and then follow it up with a complete misunderstanding in the second half. Is GPT that enthralling to our brains?


>I see many people claim GPT-n is "dumb"

Depends.

It can't do math or logic. I have a question I ask ChatGPT to see if it can do logic yet; it still cannot. (I can't mention the question here or it will get fixed.)

It's great for brainstorming or low-risk problems. I don't think the accuracy problem will ever be fixed.

I probably 5x my productivity with it as well, but that doesn't mean it's able to do logic.


> GPT-3 is already composing halfway-credible poetry and songs.

Poetry and songs abound in its training data.

> I just asked it to write some code to walk a k-d tree, and it did that, too, complete with a unit test.

Such things abound in its training data.
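For reference, the kind of k-d tree walk that is all over public repositories looks roughly like this (a sketch from memory, not GPT's actual output):

  # Minimal k-d tree node plus an in-order walk over every stored point.
  class KDNode:
      def __init__(self, point, left=None, right=None):
          self.point = point   # an (x, y) tuple, for a 2-d tree
          self.left = left
          self.right = right

  def walk(node, depth=0):
      if node is None:
          return
      axis = depth % len(node.point)   # splitting axis alternates with depth
      walk(node.left, depth + 1)
      print(node.point, "split axis:", axis)
      walk(node.right, depth + 1)

  walk(KDNode((5, 4), KDNode((2, 6)), KDNode((9, 1))))  # visits all three points

Getting this out of a model trained on millions of similar snippets is impressive, but it isn't evidence of novelty.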

GPT-3 can't produce anything novel. It's a search engine over natural generalisations of its training data, but it can't think for itself. It couldn't solve any of the unsolved problems of our day. Give it a maths problem with a known solution, and it might give you a half-way decent proof. Give it a problem without a known solution – even an easier one – and it'll flounder.


>>but it can't 'teach' them to us

I always thought you could ask GPT to illustrate the steps it took to arrive at the answer. I mean, it can take you through the process it went through to get there. It's as close as you get to an explanation.


>Except that the premise of ChatGPT is that it "knows" things, and that you can ask it about these things in specific ways and it will answer.

The premise of ChatGPT has never been that it knows everything.

>So, it's not about whether it's been tuned to be a grandmaster. It's about whether it can even execute on the knowledge it clearly possesses.

Factual knowledge on paper has never been a guarantee you can actually perform the task. In fact, it usually isn't.

>And certainly, there are some humans who fit this "appearance" description as well, perhaps through some processing deficit or otherwise. But as a broad category, this is something humans do quite well.

I'm telling you this is a normal occurrence and not something that only happens with a "processing deficit".

You can recite all the details and rules of tennis as much as you like and still fail to play it; you can know the physics theory and still struggle to solve problems. Hell, we don't even need to leave chess: give our stranger a proper book with the rules and the strategies employed by grandmasters. For a while afterwards, he's still going to make illegal moves, and he will be nowhere near the level of a grandmaster.

>Again you are still not bringing up anything that serves as the distinguisher you want it to serve. The concept of understanding is really not complicated. When we explain something to someone and we ask the question "do you understand?", we're not asking whether they can memorize or repeat back what we just told them. We're asking something else.

OK, and? GPT passes many tests that demonstrate understanding.


> GPT is bad at math because BPE input compression obfuscates individual digits. https://bbot.org/etc/gpt-math.png You'd be bad at math too if every number was scrambled.

This is the myth I was referring to. BPE compression may slow down training, but it doesn't follow that slower training is the reason for being bad at math.

If you trained GPT specifically on arithmetic tasks, you'd get superior performance to GPT-3, regardless of which tokenization scheme you'd use. But you'd destroy most of its knowledge about everything not-arithmetic.
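You can see what BPE actually does to digits with the openly released GPT-2 tokenizer. A quick sketch, assuming the tiktoken package is installed (exact splits depend on the vocabulary):

  import tiktoken

  enc = tiktoken.get_encoding("gpt2")  # the BPE used by GPT-2/GPT-3-era models

  for s in ["17", "1234", "54321", "998877"]:
      pieces = [enc.decode([t]) for t in enc.encode(s)]
      print(s, "->", pieces)

  # Digits come out grouped into uneven chunks (something like ["543", "21"]),
  # so the model never sees a stable digit-by-digit representation.

That chunking is real; whether it is the main reason for the poor arithmetic is the part I'm disputing.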


"The reason for that is simple - GPT lacks logical reasoning capabilities. It can 'fake' them on the surface level, it might even fake them really well for extremely common problems, but as soon as you prod deeper or tack on an extra requirement or two, it starts spinning in circles indefinitely."

Those unqualified statements are false. GPT-4 may not have been able to pass your particularly complex reasoning task, but that does not mean it can't reason.

My tone comes out of extreme frustration, because misjudgements like the one you're displaying literally put the fate of the human race at risk.


>but they can only reason using their memories and the prompt.

Eh no.

https://arxiv.org/abs/2212.10559

>But if you try very hard you can find "held out" data and when you test on it, GPT4 stops looking so smart:

This can be done to anybody. This can be done to you. It's not a gotcha. Nobody is saying GPTs don't/can't memorize.


>> But GPT-3 is much more successful, including at giving correct answers to arithmetic problems that weren't in its training set.

That's not exactly what the GPT-3 paper [1] claims. The paper claims that a search of the training dataset for instances of, very specifically, three-digit addition returned no matches. That doesn't mean there weren't any instances, only that the search didn't find any. It also says nothing about the existence of instances of other arithmetic operations in GPT-3's training set (and the absence of "spot checks" for such instances of other operations suggests they were, in fact, found, but not reported, in the time-honoured fashion of not reporting negative results). So at best we can conclude that GPT-3 gave correct answers to three-digit addition problems that weren't in its training set, and even then only for the 2,000 or so problems that were specifically searched for.
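For a sense of how narrow that check is, the described search amounts to scanning the corpus for literal strings of the form "123 + 456 ="; a rough sketch (not the paper's actual code):

  import re

  # Literal three-digit addition expressions, e.g. "123 + 456 ="
  pattern = re.compile(r"\b\d{3}\s*\+\s*\d{3}\s*=")

  def count_matches(lines):
      return sum(1 for line in lines if pattern.search(line))

  # e.g. count_matches(open("corpus.txt", encoding="utf-8"))

Zero matches for one phrasing says nothing about paraphrased or differently formatted occurrences of the same sums.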

In general, the paper tested GPT-3's arithmetic abilities on addition and subtraction of one- to five-digit numbers and multiplication of two-digit numbers. They also tested a composite task of one-digit expressions, e.g. "6+(4*8)". No division was attempted at all (or no results were reported).

Of the attempted tasks, all but addition and subtraction of one- to three-digit numbers had accuracy below 20%.

In other words, the only tasks that were at all successful were exactly those tasks that were the most likely to be found in a corpus of text, rather than a corpus of arithmetic expressions. The results indicate that GPT-3 cannot "perform arithmetic" despite the paper's claims to the contrary. They are precisely the results one should expect to see if GPT-3 was simply memorising examples of arithmetic in its training corpus.

>> So what changed? We aren't sure, but the speculation is that in the process of training, GPT-3 found that the best strategy to correctly predicting the continuation of arithmetic expressions was to figure out the rules of basic arithmetic and encode them in some portion of its neural network, then apply them whenever the prompt suggested to do so.

There is no reason why a language model should be able to "figure out the rules of basic arithmetic", so this "speculation" is tantamount to invoking magick.

Additionally, language models and neural networks in general are not capable of representing the rules of arithmetic because they are incapable of representing recursion and universally quantified variables, both of which are necessary to express the rules of arithmetic.
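For concreteness, "the rules of arithmetic" here means definitions like Peano addition, which is stated by recursion and quantified over all naturals; in code it is something like:

  # Peano-style addition, for all n, m >= 0:
  #   add(n, 0) = n
  #   add(n, m) = 1 + add(n, m - 1)
  def add(n, m):
      if m == 0:
          return n
      return 1 + add(n, m - 1)

  print(add(897, 394))  # 1291, correct for numbers of any size (up to recursion limits)

A rule like that applies uniformly to every pair of numbers, which is exactly the behaviour GPT-3 does not exhibit.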

In any case, if GPT-3 had "figure(d) out the rules of basic arithmetic", why stop at addition, subtraction and multiplication of one- to five-digit numbers? Why was it not able to use those learned rules to perform the same operations with more digits? Why was it not capable of performing division (i.e. the inverse of multiplication)? A very simple answer is: GPT-3 did not learn the rules of arithmetic.

_________

[1] https://arxiv.org/abs/2005.14165

