Google's marketing materials say it's slightly better than GPT-4 across benchmarks. I'll be checking the Hugging Face leaderboards over the next few days for independent confirmation.
So it's basically just GPT-4, according to the benchmarks, with a slight edge for multimodal tasks (i.e. audio, video).
Google does seem to be quite far behind; GPT-4 launched almost a year ago.
Google's own benchmarking shows that Gemini Pro is just slightly better than GPT-3.5 and Gemini Ultra is comparable to GPT-4 (see their technical paper).
Ultra was benchmarked against the original release of GPT-4, not the current model. My understanding is that the comparison was fairly accurate: Ultra is close to current GPT-4 but not quite equal. However, close-to-GPT-4 but 4x cheaper and with 10x the context length would be very impressive and, IMO, useful.
They're extrapolating from the performance of GPT-3.5. It's speculative, but not anecdotal: GPT models have improved rapidly over time, so it's not a huge leap to predict that GPT-4 will be even better.
The whitepaper has a few benchmarks vs. GPT-4, though most of the GPT-4 numbers are reported from its paper rather than re-run. Most of the blogs/news articles I've seen focus on the GPT-3.5 comparison Google is pushing. I found the whitepaper table far better at summarizing this: https://storage.googleapis.com/deepmind-media/gemini/gemini_...
The performance results here are interesting. Gemini Ultra seems to meet or exceed GPT-4V on all text benchmark tasks with the exception of HellaSwag, where it lags significantly: 87.8% vs. 95.3%, respectively.
No race has begun; GPT-4 is so far ahead in everything, even in Google's own official metrics[1], and those report the official numbers for the first version of GPT-4 from its paper. People have re-run the benchmarks and found much better results, like 85% on HumanEval. It's as if no one even thinks about comparing to GPT-4; it's just reported as the gold standard.
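For context on what "re-running HumanEval" involves, here's a minimal sketch using OpenAI's open-source human-eval harness (github.com/openai/human-eval). The generate_one_completion stub is a hypothetical placeholder for whatever model API you're benchmarking; scoring happens afterwards via the harness's evaluate_functional_correctness command.

    # Minimal sketch of re-running HumanEval with OpenAI's harness
    # (pip install human-eval). The model call is a hypothetical
    # placeholder -- swap in the API you're actually benchmarking.
    from human_eval.data import read_problems, write_jsonl

    def generate_one_completion(prompt: str) -> str:
        # Hypothetical: call your model here and return only the
        # code that completes the given function signature.
        raise NotImplementedError("plug in your model's API call")

    problems = read_problems()  # 164 hand-written programming tasks

    # One sample per task is enough for a pass@1 estimate.
    samples = [
        {"task_id": task_id,
         "completion": generate_one_completion(problems[task_id]["prompt"])}
        for task_id in problems
    ]
    write_jsonl("samples.jsonl", samples)

    # Then score by executing each completion against the task's
    # unit tests (the harness runs them in subprocesses):
    #   $ evaluate_functional_correctness samples.jsonl
    # The reported pass@1 is the figure people quote, e.g. the ~85%
    # re-runs mentioned above vs. the paper's original number.

The point is that the harness is deterministic given the samples, so differences between reported and re-run scores mostly come down to the model version and the prompting/sampling setup used to generate completions.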