I compared how Gemini Ultra and GPT-4 answered the same questions across a range of categories: https://theaidigest.org/gemini-vs-chatgpt
Readers voted for which answer they preferred, and the results are pretty mixed; I think performance is fairly similar for the two models.
Cool!
Why is Google still lagging behind OpenAI? Here's one theory:
1. The tasks you present here are technical challenges. OpenAI is clearly trying to solve some of them by grafting on specific capabilities that go beyond "pure" LLM-based next-token prediction -- this is why ChatGPT will now write and run Python code to solve math problems, for example (a rough sketch of that pattern follows below this list). Although OpenAI is technically a nonprofit that claims to be pursuing artificial general intelligence, in practice it acts like a commercial tech company iterating on its product to make it more useful to consumers.
2. In contrast, Google is a for-profit Big Tech company, yet some of the sharpest thinkers on the Google team (many coming from DeepMind) don't seem all that interested in solving these sorts of technical challenges. Francois Chollet, for example, is quite open in his research and public writings about viewing current LLMs as narrow systems, limited to whatever data they've been trained on, with very little capacity to generalize to novel situations. He -- and perhaps others at Google? -- appears to be searching for bigger conceptual breakthroughs that could lead to true general intelligence.
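To make that first point concrete, here's a minimal sketch of the "offload math to Python" pattern. This is not OpenAI's actual implementation -- the solve_with_python helper and the hard-coded generated snippet are hypothetical stand-ins -- it just illustrates handing arithmetic to an interpreter instead of predicting digits token by token.

```python
# Hypothetical sketch of tool-augmented math: instead of having the model
# "guess" the answer as text, the host runs code the model would generate.

def solve_with_python(question: str) -> str:
    # In a real system, an LLM would produce this snippet from the question.
    # Here we hard-code what such generated code might look like for:
    # "What is $1,000 compounded at 17% per year for 5 years?"
    generated_code = "result = 1000 * (1.17 ** 5)"

    namespace: dict = {}
    exec(generated_code, namespace)          # the host executes the model's code
    return f"{namespace['result']:.2f}"      # exact arithmetic, no token guessing


if __name__ == "__main__":
    print(solve_with_python("compound interest example"))  # 2192.45
```

The point is simply that arithmetic gets delegated to a deterministic interpreter, which sidesteps the known weakness of pure next-token prediction at multi-digit math.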
Tons of comments from conservatives in my TL about Gemini's guardrails on generating images of people of different races. Any thoughts on that?
I don't have any inside reporting on this, but I strongly suspect that Google has a team of people whose job is to make sure Google products don't do anything racist, and that their efforts to diversify Gemini's drawings went way overboard. In some situations it's almost impossible to get Gemini to draw any white people. I'll bet it didn't help that Google felt it was behind OpenAI, so there was probably pressure to ship quickly rather than do more testing. To get the model out the door they took some crude measures that wound up over-correcting and making them look ridiculous.
Feels like Google is running into conflicts between "organizing the world's information" (as it is) and portraying the world as it could be.
Is there a price difference between the systems under test? The "latest version of GPT-4" is subscription-only, while Bard is free to use.
I was testing Gemini Advanced, which is a premium product that costs around $20/month, like ChatGPT (GPT-4).
Fascinating!
Excellent analysis and testing setup!
I cannot even begin to express just how superior I find Gemini for almost all activities involving writing. I'll continue to subscribe to ChatGPT (GPT-4), but I pretty much haven't touched it in the past 9 days. GPT-4 continues to write everything in its default style: an overly wordy essay that sounds like a pretentious college student trying to show off how intelligent it is. Gemini, on the other hand, sounds almost scary-human on most of the writing tasks I give it. I have no idea about logic tests and coding since I just never do any of that stuff. But for anything involving writing, especially sales and marketing work like copy and emails, it's absolutely no contest: Gemini basically crushes GPT-4 in those areas.
Interesting. So the difference is mainly Gemini's ability to vary the tone and style better than ChatGPT?
Ouch!
The fruit slice obsession is hilarious. Gives Gemini a sort of human quality. ("The guy just really loves his sliced fruit segments, what are you gonna do?")
Your observations above appear to align with the tentative consensus out there. My most recent (completely unscientific) poll had 85% saying that ChatGPT (GPT-4) was better than Gemini Ultra in their tests (8% said they're about the same, while 8% preferred Gemini).
But there are also anecdotal indications that Gemini, while trailing behind on reasoning tasks, is actually a better *creative* writer than ChatGPT (less bland, more varied, more imaginative).
I've yet to test Gemini Ultra myself, as I'm holding off on pulling the trigger on the free trial until some of the early quirks are taken care of.
I think Gemini's ultimate claim to fame may come not from potentially nudging out GPT-4 on benchmarks but from the upcoming 1-million-token context window combined with Gemini's native multimodality.
The fact that Gemini can reliably understand lengthy video input, down to specific frames and individual objects, definitely looks like a leap beyond what we're currently used to. Take a look at this impressive example Ethan Mollick posted just a few hours ago (he has insider access that we don't):
https://www.linkedin.com/feed/update/urn:li:activity:7166242775103971328/