Google has begun its rollout of Gemini Live – an AI voice assistant with a focus on “free-flowing” conversation.
In direct competition with OpenAI’s ChatGPT Voice Mode, Gemini Live is marketed to Android smartphone users performing broad life tasks – such as preparing dinner plans, selecting an outfit, or co-writing a wedding speech.
Where Gemini Live distinguishes itself is in its naturalistic approach to conversation.
Rather than the conventional prompt-response approach of other generative AI platforms, users can interrupt Gemini Live mid-response to quickly change topics or home in on a particular point.
During a markedly early instalment of Google’s annual Made by Google event on Wednesday, the company’s vice president of user experience Jenny Blackburn gave both niche and common use-cases for the AI assistant – such as “bouncing ideas back and forth” while brainstorming team bonding activities, or acting as a practice partner for a job interview.
“We believe a truly helpful personal AI assistant must do more. It needs to be able to communicate with you conversationally,” said Blackburn.
“Sometimes conversation is the best way to work through something complex.”
In a live demonstration, Blackburn asked Gemini for some “fun” and “educational” ideas on how to entertain her niece and nephew over the weekend.
Using a surprisingly personable tone, Gemini conversed with Blackburn in real time as it suggested an invisible ink project for the kids, spitballed some ideas for a project name, and answered Blackburn’s follow-up questions about how to avoid making a mess.
All in all, the ‘conversation’ lasted less than two minutes.
“You can have a free-flowing conversation with Gemini,” said Blackburn.
“You can interrupt when you think of something important, or change topic as the conversation flows.”
The basic ChatGPT-like version of Gemini is available for free from the Play Store.
Similar to ChatGPT Voice Mode, Gemini Live comes with a choice of ten distinct voice models – though none of them seem to have been trained to the likeness of Scarlett Johansson.
A (full-stack) sidekick in your pocket
Google describes Gemini Live as akin to having a “sidekick in your pocket”, while senior vice-president of Google devices and services, Rick Osterloh, teased the AI assistant will soon be able to conduct full-blown research.
Despite AI’s propensity for inaccuracy and outright hallucinations, Google is confident Gemini (formerly known as Bard) can create research reports – with Osterloh giving an example of it researching how to open a new cafe.
This example saw Gemini create a multi-step research plan, in which it crawled web pages, analysed collected information, and created a well-presented research document in Google Docs – which included information on local regulation and permitting requirements.
Osterloh further teased a HAL 9000-like camera feature, where users can (in real time) show Gemini a subject of questioning with their phone camera – such as maths homework or unassembled furniture.
Osterloh also announced Gemini 1.5 Pro – the company’s flagship multimodal AI model – can be integrated across the gamut of Google’s key products, from office suite Google Workspace, through to Google Chrome, YouTube and Gmail.
“We’re fully in the Gemini era, with AI infused in almost everything we’re doing at Google – across our full tech-stack,” said Osterloh.
Given Google’s unparalleled data collection – much of which is collated from personal user information – Osterloh pitches Gemini as a highly personal, “agentive” assistant which can “tackle complex problems with advanced reasoning, planning, and memory”.
Gemini 'on every phone'
Google has massive ambitions for the new model, with Osterloh stating the tech giant will “keep building responsibly and pushing to make sure Gemini is available to everyone, on every phone”.
“Of course, this starts with Android,” said Osterloh.
Osterloh described Google’s ambitions to put Gemini “right at the core of Android” – the world’s most popular operating system – and deliver “breakthrough mobile experiences to billions of people”.
Meanwhile, Google’s latest iteration of its smartphone – the Pixel 9 series – will get a range of Gemini exclusives.
Backed by Google’s AI-tuned Google Tensor G4 chip, the tech giant’s new foldable smartphone will ship with an AI-powered weather app, a beefy AI image generator and editing software, an “AI-powered” camera, an AI tool which catalogues and recalls details from user screenshots, and, most notably, an AI “Call Assist” feature.
Call Assist creates full transcripts from user phone calls, then generates AI summaries of key takeaways of the call – such as phone numbers, times, details and call members.
Google claims the feature is “completely private”, running entirely on-device so that phone calls and summaries aren’t sent to or stored in the cloud.
Such privacy is reportedly achieved through the use of Gemini Nano – an on-device AI being shipped with the new Pixel series.
Nano is aimed at handling more sensitive, privacy-critical tasks such as Call Assist, with Google noting it can operate without a network connection.
These new Gemini features, including Gemini Live, started rolling out yesterday for users on Pixel, Samsung and other Android phones – the catch is you’ll need a Gemini Advanced subscription, which comes included in the Google One AI Premium Plan at $32.99 per month, or as part of a Pixel 9 Pro purchase.
At the time of writing, Gemini Live is not widely available in Australia, and confused Gemini Advanced users are congregating on discussion site Reddit for signs of the tool reaching their region.