Asking these to GPT-3.5 has been an utterly frustrating experience, lol. I guess Gemini is at this level of intelligence right now, not GPT-4... rigged demos notwithstanding ;)
Curious, have you seen examples of someone convincing it of something clearly wrong? I think I've seen examples of that with GPT-3, but not GPT-4 that I can recall.
Counterpoint 1: Fewer people are using GPT-4 than are using all the other models, so it is subject to far fewer tests than the others.
Counterpoint 2: It is not a given that GPT-4 should fail in the same way as the older models. It likely has its own unique failure modes yet to be discovered. (See above.)
Yes and no. In the paper, they do compare apples to apples with GPT-4 (they directly test GPT-4's CoT@32 but cite its 5-shot score as "reported"). GPT-4 wins 5-shot and Gemini wins CoT@32. It also came off to me like they were implying something is off about GPT-4's MMLU numbers.
https://i.imgur.com/3sNr3LW.png https://i.imgur.com/EIj0nZg.png