Google Gemini and the future of large language models
The mediocre performance of Google's new model makes me wonder if classic LLMs are running out of steam.
It’s hard to write about Gemini, Google’s new family of large language models, because the company has kept the most interesting details under wraps. Google published a 62-page white paper about Gemini, but the “Model Architecture” section is a scant four paragraphs long. We learn that “Gemini models build on top of Transformer decoders”—the same architecture used by other large language models—but not much more than that.
When Gemini was launched on December 6, Google invited users to start experimenting with it via the Bard chatbot. But Bard is using the midrange Gemini Pro model that’s roughly comparable to GPT-3.5. Google’s most powerful model—Gemini Ultra—isn’t due out until the new year.
Multimodality—the ability to deal with images and audio as well as text—is a big selling point for Gemini. “Models are multimodal from the beginning and can natively output images using discrete image tokens,” the Gemini white paper states.
I was eager to take these new capabilities for a spin, but then I saw this in the Bard release notes: “you can try out Bard with Gemini Pro for text-based prompts, with support for other modalities coming soon.”
All of which means that the jury is still very much out on Gemini, which was supposed to be Google’s chance to leap ahead of OpenAI in LLM technology. Maybe Gemini Ultra will ultimately live up to the hype.
But the version of Gemini that’s out now clearly isn’t a GPT-4 killer. And Google’s own benchmarks show the forthcoming Gemini Ultra achieving only an incremental improvement over GPT-4.
That’s somewhat surprising because if any company can give OpenAI a run for its money, it should be Google. The big question for me is whether Google fumbled the ball in some way, or whether the classic LLM architecture is starting to run out of steam.
My guess—and at this point it’s only a guess—is that we’re starting to see diminishing returns to scaling up conventional LLMs. Further progress may require significant changes to enable them to better handle long, complex inputs and to reason more effectively about abstract concepts.