
The direct prompt comparison isn't quite fair, since GPT-3.5 and GPT-4 are instruction-tuned. It'd be interesting to see examples with prompts better suited to the raw language models.



Yeah, it's hard to compare across models. I'm interested in suggestions here.

We give all models a bunch of few-shot examples, which improves GPT-3 (davinci)'s question answering substantially. GPT-2 sometimes generates something that answers the question, and sometimes it's just confused. Click "See full prompt" to see the few-shot examples that the models get.

Our goal was to exercise the full capabilities of each model.
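
For anyone unfamiliar with few-shot prompting, here is a rough sketch of how such a prompt can be assembled for a raw (non-instruction-tuned) model. The Q/A pairs and the `build_few_shot_prompt` helper are made up for illustration; they are not the actual prompt, which is behind "See full prompt".

```python
def build_few_shot_prompt(examples, question):
    """Join worked Q/A pairs, then leave the final answer open for the model."""
    blocks = [f"Q: {q}\nA: {a}" for q, a in examples]
    # The trailing "A:" invites the model to continue with an answer.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


# Hypothetical examples, not the ones used in the comparison.
examples = [
    ("What is the capital of France?", "Paris"),
    ("How many days are in a week?", "Seven"),
]

prompt = build_few_shot_prompt(examples, "If 11 + 2 = 1, what is 9 + 5?")
print(prompt)
```

A raw model just continues the text, so the pattern of earlier pairs is what signals that an answer is expected.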


I also found the riddle rather odd. I cannot say that 2 is actually the correct answer.

A problem with riddles is that they often have a hidden or secret context. Especially in our digital age, I think this one is closer to Bilbo's "What have I got in my pocket?" "riddle". Here are some other possible solutions. Take 11 + 2 = 1 as a digit sum: 1 + 1 + 2 = 4, mod 3 and we get 1, so 9 + 5 = 14, mod 3 and we get 2, which happens to match the intended answer under a completely different rule. We could also replace the addition sign with equality and similarly use a digit summation, so 1+1 == 2? True (1). 9 == 5? False (0). There are a hundred solutions to this riddle when it has no context. In fact, I stumbled into the right answer thinking about mod 12 without ever considering a clock until I saw the answer. Maybe I'm just dumb though, I am known to overthink.
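
To make the ambiguity concrete, here is a quick sketch of three candidate rules that all satisfy 11 + 2 = 1. The function names are mine, and the rules are just my reading of the interpretations mentioned above, not anything from the riddle itself.

```python
def mod12(a, b):
    # Clock-style reading: add, then wrap around at 12.
    return (a + b) % 12


def digit_sum_mod3(a, b):
    # Sum every decimal digit of both operands, then reduce mod 3.
    return sum(int(d) for d in f"{a}{b}") % 3


def digit_equality(a, b):
    # Treat "+" as "==": compare the digit sum of each side, True -> 1, False -> 0.
    def digit_sum(n):
        return sum(int(d) for d in str(n))
    return int(digit_sum(a) == digit_sum(b))


for rule in (mod12, digit_sum_mod3, digit_equality):
    print(rule.__name__, rule(11, 2), rule(9, 5))
```

All three rules give 1 for 11 + 2, but for 9 + 5 the first two give 2 and the equality rule gives 0, so the given example alone doesn't pin down the answer.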

