Google Gemini: all the details on the AI model Google hopes can ...

6 Dec 2023

It’s the beginning of a new era of AI at Google, says CEO Sundar Pichai: the Gemini era. Gemini is Google’s latest large language model, which Pichai first teased at the I/O developer conference in June and is now launching to the public. To hear Pichai and Google DeepMind CEO Demis Hassabis describe it, it’s a huge leap forward in an AI model that will ultimately affect practically all of Google’s products. “One of the powerful things about this moment,” Pichai says, “is you can work on one underlying technology and make it better and it immediately flows across our products.” 

Photo: The Verge

Gemini is more than a single AI model. There’s a lighter version called Gemini Nano that is meant to be run natively and offline on Android devices. There’s a beefier version called Gemini Pro that will soon power lots of Google AI services and is the backbone of Bard starting today. And there’s an even more capable model called Gemini Ultra that is the most powerful LLM Google has yet created and seems to be mostly designed for data centers and enterprise applications. 

Google is launching the model in a few ways right now: Bard is now powered by Gemini Pro, and Pixel 8 Pro users will get a few new features thanks to Gemini Nano. (Gemini Ultra is coming next year.) Developers and enterprise customers will be able to access Gemini Pro through Google Generative AI Studio or Vertex AI in Google Cloud starting on December 13th. Gemini is only available in English for now, with other languages evidently coming soon. But Pichai says the model will eventually be integrated into Google’s search engine, its ad products, the Chrome browser, and more, all over the world. It is the future of Google, and it’s here not a moment too soon.
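
For a sense of what that developer access might look like in practice, here is a minimal Python sketch. It assumes Google’s google-generativeai SDK and the “gemini-pro” model identifier; treat the specifics as illustrative, not as details confirmed anywhere in this story.

import google.generativeai as genai

# Configure the client with an API key (placeholder value).
genai.configure(api_key="YOUR_API_KEY")

# "gemini-pro" is the text-in, text-out tier described above.
model = genai.GenerativeModel("gemini-pro")

# Send a single text prompt and print the generated reply.
response = model.generate_content("Explain in two sentences what makes an AI model multimodal.")
print(response.text)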

At launch, Gemini comes in three sizes, each meant for a different purpose.

Image: Google

OpenAI launched ChatGPT a year and a week ago, and the company and product immediately became the biggest things in AI. Now, Google — the company that created much of the foundational technology behind the current AI boom, that has called itself an “AI-first” organization for nearly a decade, and that was clearly and embarrassingly caught off guard by how good ChatGPT was and how fast OpenAI’s tech has taken over the industry — is finally ready to fight back.

So, let’s just get to the important question, shall we? OpenAI’s GPT-4 versus Google’s Gemini: ready, go. This has very clearly been on Google’s mind for a while. “We’ve done a very thorough analysis of the systems side by side, and the benchmarking,” Hassabis says. Google ran 32 well-established benchmarks comparing the two models, from broad overall tests like the Massive Multitask Language Understanding (MMLU) benchmark to one that compares the two models’ ability to generate Python code. “I think we’re substantially ahead on 30 out of 32” of those benchmarks, Hassabis says, with a bit of a smile on his face. “Some of them are very narrow. Some of them are larger.”

Google says Gemini beats GPT-4 in 30 out of 32 benchmarks

In those benchmarks (which really are mostly very close), Gemini’s clearest advantage comes from its ability to understand and interact with video and audio. This is very much by design: multimodality has been part of the Gemini plan from the beginning. Google hasn’t trained separate models for images and voice, the way OpenAI created DALL-E and Whisper; it built a single multisensory model from the start. “We’ve always been interested in very, very general systems,” Hassabis says. He’s especially interested in how to mix all of those modes — to collect as much data as possible from any number of inputs and senses and then give responses with just as much variety.

Photo: The Verge

Right now, Gemini’s most basic models are text in and text out, but more powerful models like Gemini Ultra can work with images, video, and audio. And “it’s going to get even more general than that,” Hassabis says. “There’s still things like action, and touch — more like robotics-type things.” Over time, he says, Gemini will get more senses, become more aware, and become more accurate and grounded in the process. “These models just sort of understand better about the world around them.” These models still hallucinate, of course, and they still have biases and other problems. But the more they know, Hassabis says, the better they’ll get.
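
To make that concrete, here is a hypothetical sketch of mixing an image with a text prompt, again assuming the google-generativeai SDK; the vision-capable model name “gemini-pro-vision” is our assumption for illustration, and the file name is a placeholder.

import PIL.Image
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

# Load a local image; the SDK accepts PIL images alongside text parts.
image = PIL.Image.open("whiteboard_photo.png")

# A vision-capable model can take a list of mixed text and image parts.
model = genai.GenerativeModel("gemini-pro-vision")
response = model.generate_content(["What is drawn on this whiteboard?", image])
print(response.text)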


Benchmarks are just benchmarks, though, and ultimately, the true test of Gemini’s capability will come from everyday users who want to use it to brainstorm ideas, look up information, write code, and much more. Google seems to see coding in particular as a killer app for Gemini; the model powers a new code-generating system called AlphaCode 2 that Google says performs better than 85 percent of coding competition participants, up from 50 percent for the original AlphaCode. But Pichai says that users will notice an improvement in just about everything the model touches.

Equally important to Google is that Gemini is apparently a far more efficient model. It was trained on Google’s own Tensor Processing Units and is both faster and cheaper to run than Google’s previous models, like PaLM. Alongside the new model, Google is also launching a new version of its TPU system, the TPU v5p, a computing system designed for use in data centers for training and running large-scale models.

Big-deal AI model; kind of boring logo.

Image: Google

Talking to Pichai and Hassabis, it’s clear that they see the Gemini launch both as the beginning of a larger project and as a step change in itself. Gemini is the model Google has been waiting for, the one it has been building toward for years, maybe even the one it should have had ready before OpenAI and ChatGPT took over the world. 

Google, which declared a “code red” after ChatGPT’s launch and has been perceived to be playing catch-up ever since, still seems to be holding fast to its “bold and responsible” mantra. Hassabis and Pichai both say they’re not willing to move too fast just to keep up, especially as we get closer to the ultimate AI dream: artificial general intelligence, the term for an AI that is self-improving, smarter than humans, and poised to change the world. “As we approach AGI, things are going to be different,” Hassabis says. “It’s kind of an active technology, so I think we have to approach that cautiously. Cautiously, but optimistically.”

Google says it has worked hard to ensure Gemini’s safety and responsibility, through both internal and external testing and red-teaming. Pichai points out that ensuring data security and reliability is particularly important for enterprise-first products, which is where most generative AI makes its money. But Hassabis acknowledges that one of the risks of launching a state-of-the-art AI system is that it will have issues and attack vectors no one could have predicted. “That’s why you have to release things,” he says, “to see and learn.” Google is taking the Ultra release particularly slowly; Hassabis compares it to a controlled beta, with a “safer experimentation zone” for Google’s most capable and unrestrained model. Basically, if there’s a marriage-ruining alternate personality inside Gemini, Google is trying to find it before you do.

For years, Pichai and other Google executives have waxed poetic about the potential for AI. Pichai himself has said more than once that AI will be more transformative to humanity than fire or electricity. In this first generation, the Gemini model may not change the world. Best-case scenario, it might just help Google catch up to OpenAI in the race to build great generative AI. (Worst-case scenario, Bard stays boring and mediocre, and ChatGPT keeps winning.) But Pichai, Hassabis, and everyone else at Google seem to think this is the beginning of something truly huge. The web made Google a tech giant; Gemini could be even bigger.
