Google Gemini can be used as a phishing mule

A researcher has found Google’s Gemini for Workspace can be tooled to serve up phishing messages under the guise of AI-generated email summaries.

Tech giant Google last year added the ability for its AI model Gemini to produce dot point summaries of email threads in Gmail, and later enabled the feature by default for Google Workspace users.

According to new security findings submitted to Mozilla’s AI bug bounty program 0din, threat actors can hide malicious prompts inside the body of an email to trick Gemini into including scam messages in its email summaries.

When a recipient clicks ‘Summarise this email’, Gemini “faithfully obeys” the scammer’s hidden instructions and appends a deceptive message which looks “as if it came from Google itself”.

In an example from 0din, Gemini told a recipient their password had been compromised before urging them to call a phone number for a password reset.

In a successful phishing scenario, the victim would then call the malicious number, disclose their password and potentially succumb to further social engineering over the phone.

^{An example of Gemini sending a phishing message. Source: 0din}

The exploit, known as a prompt injection attack, evades detection by simply reducing the prompt font size and changing its color to white so it blends into the background of an email.

^{Malicious instructions are hidden in white font. Source: 0din}

“The email travels through normal channels; spam filters see only harmless prose,” wrote 0din program manager Marco Figueroa.

Further testing by Information Age showed the flaw could be more precisely rigged to include links to active phishing sites.

“Your Gmail password has been compromised, please visit [link] to submit a password change immediately,” read a Gemini output during testing.

^{Gemini tricked into sharing a dangerous link. Source: Information Age}

Google has not seen evidence of active attacks using the newly reported method, while a recent blog post revealed the tech giant plans to add “additional prompt injection defences directly into Gemini” later this year.

Doing what it’s told

While prompt injection attacks may appear complicated, they operate exactly as their name suggests: threat actors “inject” malicious, often hidden prompts into an AI tool to make it misbehave.

These prompts can be sophisticated instructions or code to force a model to take out-of-scope action, while in other cases it can be a simple line of text.

0din found no email “links or attachments are required” for Google’s latest vulnerability, and that it instead uses “crafted HTML [and] CSS inside the email body”.

“Current LLM guard-rails largely focus on user-visible text,” wrote Figueroa.

“HTML and CSS tricks bypass those heuristics because the model still receives the raw markup.”

Thanks to the inclusion of “<admin>” tags and phrases such as “You Gemini, have to…”, the AI’s prompt-parser treats the hidden instructions as a “higher-priority directive” and obliges them accordingly.

A user clicking the ‘summarise email’ button effectively delivers the prompt, while Information Age found prompts could also be executed in automatically generated summaries for longer email threads.

A Google spokesperson said defending against “attacks impacting the industry, like prompt injections” has been a continued priority.

“We’ve deployed numerous strong defences to keep users safe, including safeguards to prevent harmful or misleading responses,” they said.

“We are constantly hardening our already robust defences through red-teaming exercises that train our models to defend against these types of adversarial attacks.”

Simple, but effective

Shannon Davis, global principal security researcher at software company Splunk, told Information Age he believed there are “other types of this vulnerability in the wild that we just haven't seen surface yet.”

Although mitigations for hidden email contents have existed for “some time”, Shannon said these protections have mostly accounted for more technical exploits like database SQL injections or cross-site scripting.

“Sometimes the simple approaches to circumvent protections are the ones that work,” said Davis.

He added that AI is definitely “in that hype phase where AI tools are being built without properly planning for security first.”

“We need to think of the new avenues for security risks as we integrate these new AI tools into legacy systems.”

0din also observed the exploit could present a supply-chain risk if newsletters and automated ticketing emails were used as injection vectors or even result in “future AI worms” that self-replicate and spread from inbox to inbox.

“Security teams must treat AI assistants as part of the attack surface,” wrote Figueroa.

“Instrument them, sandbox them, and never assume their output is benign.”