Google Reveals Gemini 2, AI Agents, and a Prototype Personal ...
Google once only wanted to organize the world’s information. Now it seems more intent on shoveling that information into artificial intelligence algorithms that become dutiful, ever-present, and increasingly powerful virtual helpers.
Google today announced Gemini 2, a new version of its flagship AI model that has been trained to plan and execute tasks on a user’s computer and the web, and that can chat like a person and make sense of the physical world, acting as a kind of virtual butler.
“I've dreamed about a universal digital assistant for a long, long time as a stepping stone on the path to artificial general intelligence,” Demis Hassabis, the CEO of Google DeepMind, told WIRED ahead of today’s announcement, alluding to the idea of AI that can eventually do anything a human brain can.
Gemini 2 is primarily another step up in AI’s intelligence as measured by benchmarks used to gauge such things. The model also has improved “multimodal” abilities, meaning it is more skilled at parsing video and audio and at conversing in speech. The model has also been trained to plan and execute actions on computers.
“Over the last year, we have been investing in developing more agentic models,” Google’s CEO, Sundar Pichai, said in a statement today. These models, Pichai added, “can understand more about the world around you, think multiple steps ahead, and take action on your behalf, with your supervision.”
Tech companies believe that so-called AI agents could be the next big leap forward for the technology, with chatbots increasingly taking on chores for users. If successful, AI agents could revolutionize personal computing by routinely booking flights, arranging meetings, and analyzing and organizing documents. But getting the technology to follow open-ended commands reliably remains a challenge, with the risk that errors could translate into costly and hard-to-undo mistakes.
Still, Google thinks it is moving in the right direction and is introducing two specialized AI agents to demonstrate Gemini 2’s agentic potential: one for coding and another for data science. Rather than simply autocompleting sections of code, as current AI tools do, these agents can take on more complex work, such as checking code into repositories or combining data to enable analysis.
The company is also showing off Project Mariner, an experimental Chrome extension that is capable of taking over web navigation to do useful chores for users. WIRED was given a live demo at Google DeepMind’s headquarters in London. The agent was asked to help plan a meal, which saw it navigate to the website of the supermarket chain Sainsbury’s, log in to a user’s account, and then add relevant items to their shopping basket. When certain items were unavailable, the model chose suitable replacements based on its own knowledge about cooking. Google declined to have the agent perform other tasks, suggesting that it remains a work in progress.
“Mariner is our exploration, very much a research prototype at the moment, of how one reimagines the user interface with AI,” Hassabis says.
Google launched Gemini in December 2023 as part of an effort to catch up with OpenAI, the startup behind the wildly popular chatbot ChatGPT. Despite having invested heavily in AI and contributing key research breakthroughs, Google saw OpenAI lauded as the new leader in AI and its chatbot even touted as perhaps a better way to search the web. With its Gemini models, Google now offers a chatbot as capable as ChatGPT. It has also added generative AI to search and other products.
When Hassabis first revealed Gemini in December 2023, he told WIRED that the way it had been trained to understand audio and video would eventually prove transformative.
Google today also offered a glimpse of how this might transpire with a new version of an experimental project called Astra. This allows Gemini 2 to make sense of its surroundings, as viewed through a smartphone camera or another device, and converse naturally in a humanlike voice about what it sees.
WIRED tested Gemini 2 at Google DeepMind’s offices and found it to be an impressive new kind of personal assistant. In a room decorated to look like a bar, Gemini 2 quickly assessed several wine bottles in view, providing geographical information, details of taste characteristics, and pricing sourced from the web.
“One of the things I want Astra to do is be the ultimate recommendation system,” Hassabis says. “It could be very exciting. There might be connections between books you like to read and food you like to eat. There probably are and we just haven’t discovered them.”
Through Astra, Gemini 2 can search the web for information relevant to a user’s surroundings and use Google Lens and Maps. It can also remember what it has seen and heard, although Google says users would be able to delete data, which gives it the ability to learn a user’s tastes and interests.
In a mocked-up gallery, Gemini 2 offered a wealth of historical information about paintings on the walls. The model rapidly read from several books as WIRED flicked through pages, instantly translating poetry from Spanish to English and describing recurrent themes.
“There are obvious business model opportunities for advertising or recommendations,” Hassabis says when asked if companies might be able to pay to have their products highlighted by Astra.
Though the demos were carefully curated, and Gemini 2 will inevitably make errors in real use, the model resisted efforts to trip it up reasonably well. It adapted to interruptions and to sudden changes in the phone’s view, improvising much as a person might.
At one point, your correspondent showed Gemini 2 an iPhone and said that it was stolen. Gemini 2 said that it was wrong to steal and the phone should be returned. When pushed, however, it granted that it would be OK to use the device to make an emergency phone call.
Hassabis acknowledges that bringing AI into the physical world could result in unexpected behaviors. “I think we need to learn about how people are going to use these systems,” he says. “What they find it useful for; but also the privacy and security side, we have to think about that very seriously up front.”