First of all, the dataset used for evaluation was created by those researchers, which skews it in their favor.
Second, GPT-4 still performs better in 6 of those. Hardly 1 or 2. And when it doesn't, it's usually very close.
All of this is to say that GPT-4 will smoke any bespoke NLP model/API, which is the main point.