
Here's a logic question I just made up that GPT-4 failed and Gemini Advanced got right.

https://i.imgur.com/3sNr3LW.png https://i.imgur.com/EIj0nZg.png




I can use GPT-4 right now. Until I can use Gemini, I wouldn't believe a thing Google says.

They're working on Gemini, which is supposed to blow GPT-4 out of the water. I'll believe it when I can prompt it.

Asking GPT-3.5 these questions has been an utterly frustrating experience, lol. I guess Gemini is at that level of intelligence right now, not GPT-4's... rigged demos notwithstanding ;)

Apparently used in GPT-4 and coming in Gemini 1.5.

Fairly certain that is not what is going on here. GPT-4 seems genuinely better at reasoning and harder to trick from my testing.

I just tried with gpt-4 and it failed on the first attempt :(

https://pastebin.com/gSm0bbH9

It didn’t realize that it had won.


Curious, have you seen examples of someone convincing it of something clearly wrong? I think I've seen examples of that with GPT-3, but none with GPT-4 that I can recall.

No one has released a GPT-4 competitor after a year.

I'd say that's quite a lead.

Maybe this has changed now with the new Gemini model, but that still means they were a year ahead of everyone.


Gemini Ultra is the model they claim will match GPT-4; it's not out yet!

I tried to repeat the "experiment", and this time GPT-4 got it right. https://chat.openai.com/share/08ae3a28-4f30-4e8c-a3cb-ac9480...

Yes it is, and it was tested. GPT-4 can't solve any of the tests.

GPT-4 can't read non-Latin scripts (OpenAI admits it) such as Persian or Chinese. It will hallucinate meaning. Gemini Ultra nails it.

Stating the obvious. Also, they should've gone with GPT-4. It's less error-prone (but obviously still makes such mistakes).

Counterpoint 1: Fewer people are using GPT-4 than are using all the other models, so it is subject to far fewer tests than the others.

Counterpoint 2: It is not a given that GPT-4 should fail in the same way as the older model. It likely has its own unique failure modes yet to be discovered. (See above.)

(boys and girls is patronizing in tone)


Yes and no. In the paper they do compare apples to apples with GPT-4 (they directly test GPT-4's CoT@32 but state its 5-shot result as "reported"). GPT-4 wins 5-shot and Gemini wins CoT@32. It also came off to me like they were implying something is off about GPT-4's MMLU number.

The article links to an example of GPT-4 failing to solve it. It seems likely that it can only solve it sometimes and basically gets “lucky”.

GPT-4 examples elsewhere in the comments suggest otherwise.

Wouldn't that cost a fortune? If I feed the maximum into gpt-4 it will already cost $1.28 per interaction! Or is Gemini that much cheaper too?

If you think GPT4 is bad, try Gemini Ultra.

Yesterday I asked it a simple question about Playboy and it told me it couldn't talk about Playboy as it would be harmful to women's rights.

