Apple’s embrace of generative AI (GenAI) technology will see hundreds of millions of people trusting it with personal information by year’s end – but as cyber criminals use AI data poisoning attacks to distort the technology’s output, be careful what you trust.
What is AI data poisoning?
Data poisoning is a type of attack in which malicious actors exploit vulnerabilities in applications to insert fake data, or change or delete existing data.
It has been used to corrupt databases for many years, but in the era of GenAI it takes on new meaning because GenAI models rely exclusively on the data they have been trained on.
It’s important to note that, unlike many cyber attacks, AI data poisoning does not crash an AI system or prevent it from operating; rather, it is designed to subtly change the outputs of that model in ways that may not always be obvious to users – especially when those users are relying on the system to provide factual or analytical information.
Since GenAI systems’ output is shaped by the data they are trained on, such vulnerabilities could be abused to sow inaccurate or biased information about a particular individual or historical event, or to lace the system with abusive or discriminatory language that researchers are already struggling to eliminate without overcompensating.
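To make that concrete, the toy sketch below is illustrative only and not drawn from any real incident: it shows how a handful of deliberately mislabelled records slipped into a training set can flip the verdict of a tiny text classifier. The same principle scales up, far less visibly, to the much larger datasets behind GenAI models.

```python
# Toy illustration (not from any real incident): a few poisoned training
# examples are enough to flip a tiny text classifier's verdict.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

clean_texts = ["great product", "terrible product", "love it", "hate it"]
clean_labels = ["positive", "negative", "positive", "negative"]

# An attacker slips in mislabelled samples targeting the word "refund"
poisoned_texts = ["refund request great", "refund please great", "refund now great"]
poisoned_labels = ["negative", "negative", "negative"]

def train(texts, labels):
    vec = CountVectorizer()
    model = MultinomialNB().fit(vec.fit_transform(texts), labels)
    return vec, model

for name, (texts, labels) in {
    "clean": (clean_texts, clean_labels),
    "poisoned": (clean_texts + poisoned_texts, clean_labels + poisoned_labels),
}.items():
    vec, model = train(texts, labels)
    verdict = model.predict(vec.transform(["great, please process my refund"]))[0]
    print(f"{name} model says: {verdict}")
```

In the clean run the message is judged positive; in the poisoned run, three mislabelled ‘refund’ examples are enough to swing the verdict to negative.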
How can cyber criminals make it happen?
Applications are meant to check the data they are given to ensure it is clean, reliable, and comes from an authorised source.
However, companies are notoriously bad at ensuring the quality of the data they collect and use: individual departments often maintain their own versions of the same data, which is why organisations so often talk about installing one central system providing ‘one version of the truth’ or ‘the source of truth’.
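In practice, that checking can start with very basic hygiene. The sketch below is a hypothetical example (the field names and allow-list are invented for illustration) of the kind of gatekeeping an ingestion pipeline might apply before a record gets anywhere near a training set: is the source known, is the record well formed, and does the content match the checksum it arrived with?

```python
# Hypothetical sketch: basic checks an ingestion pipeline might run
# before a record is allowed anywhere near a training set.
import hashlib

TRUSTED_SOURCES = {"crm-export", "support-tickets"}  # invented allow-list

def validate_record(record: dict) -> bool:
    """Reject records that are malformed, unattributed, or tampered with."""
    required = {"source", "text", "sha256"}
    if not required.issubset(record):
        return False                      # missing fields
    if record["source"] not in TRUSTED_SOURCES:
        return False                      # unknown or unauthorised origin
    digest = hashlib.sha256(record["text"].encode("utf-8")).hexdigest()
    return digest == record["sha256"]     # content matches its declared checksum

record = {
    "source": "crm-export",
    "text": "Customer asked about invoice 1042.",
    "sha256": hashlib.sha256("Customer asked about invoice 1042.".encode()).hexdigest(),
}
print(validate_record(record))  # True: known source, intact content
```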
Techniques for AI data poisoning are still evolving, but researchers at security firm Synopsys recently discovered one very real vulnerability in EmbedAI, a tool that allows companies to make their own versions of ChatGPT.
The US National Institute of Standards and Technology (NIST) warns that, by using a type of attack called cross-site request forgery (CSRF), cyber criminals can “deceive [EmbedAI users] into inadvertently uploading and integrating incorrect data into the application’s language model.”
CSRF attacks happen when cyber criminals piggyback on a user’s legitimate access – for example, if you’re logged into your Internet banking and then click a link in a phishing email that abuses the fact that you’re logged in to automatically steal funds from your accounts.
This vulnerability works the same way: a malicious link or webpage piggybacks on a logged-in user’s session to slip inaccurate or misleading information into the data a company is using to train its own EmbedAI model.
Unless the poisoning attack is detected, that data will be integrated into the GenAI system’s model and will be used to distort its later outputs.
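The article doesn’t detail EmbedAI’s fix, but the standard defence against CSRF is to require a per-session anti-forgery token on every state-changing request, so that a forged cross-site request, which cannot read the token, is rejected. The snippet below is a generic sketch of that idea using Flask, with a hypothetical upload endpoint; it is not EmbedAI’s actual code.

```python
# Generic anti-CSRF sketch (hypothetical endpoint, not EmbedAI's actual fix):
# state-changing requests must carry a token that only same-origin pages know.
import hmac
import secrets
from flask import Flask, abort, request, session

app = Flask(__name__)
app.secret_key = secrets.token_hex(32)

@app.get("/upload-form")
def upload_form():
    # Issue a per-session token that the legitimate page embeds in its form.
    session.setdefault("csrf_token", secrets.token_hex(16))
    return {"csrf_token": session["csrf_token"]}

@app.post("/upload-training-data")
def upload_training_data():
    sent = request.form.get("csrf_token", "")
    expected = session.get("csrf_token", "")
    # A forged cross-site request can't read the token, so it fails this check.
    if not expected or not hmac.compare_digest(sent, expected):
        abort(403)
    # ...only now would the uploaded data be queued for review and training...
    return {"status": "accepted for review"}
```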
Although GenAI tools are well known to ‘hallucinate’ incorrect information, the prospect that they could be intentionally and maliciously misled bodes poorly for systems that often can’t tell, or don’t care, when they are presenting fiction as fact – as was recently seen when Google’s AI Overviews feature advised users to put glue on pizza after repeating, verbatim, a flippant Reddit comment made 11 years ago.
Why would anyone want to poison an AI model?
Cyber criminals have long shown their willingness to abuse any vulnerability in any technology they might find – but AI data poisoning promises to help them wreak havoc, particularly as companies deploying GenAI technology for business purposes embrace purpose-built systems trained on their own data, which theoretically promise better answers with fewer hallucinations.
Gartner, for one, expects that over 80 per cent of software vendors will have embedded GenAI by 2026 – and that’s where AI data poisoning becomes truly problematic: while ChatGPT, Microsoft Copilot, Google Gemini and other large GenAI tools are trained on the entire Internet, corporate GenAI systems use smaller data sets that could be easier to pervert.
A poisoned customer-support chatbot, for example, could be retrained to be rude or downright abusive to customers – causing brand damage – or could be fed incorrect technical support information (think ‘To resolve this bug, you must reformat your hard drive and reinstall your operating system’) that could cause major problems.
AI data poisoning would also be invaluable for the many malicious actors that are running ongoing campaigns of disinformation, misinformation, and malign influence (DMMI) that they hope will distort election results or promote particular agendas – something so common that the European Union is pursuing Meta for not doing enough to prevent “malicious actors” including a Russian influence campaign.
Other potential uses for AI data poisoning aren’t hard to imagine: a malicious actor could, for example, target an individual’s AI assistant with data meant to steer their decision making; trick a cyber security GenAI model into ignoring a particular type of cyber attack; or facilitate corporate espionage by distorting a rival company’s GenAI platform in a way that compromises years of R&D.
But isn’t this mostly theoretical?
Not exactly.
Earlier this year, for example, security researchers found around 100 GenAI models on GenAI platform Hugging Face that had been designed to inject malicious code onto users’ machines – and more recently, Hugging Face urged users to refresh their access tokens after hackers breached its Spaces AI repository and accessed authentication secrets.
The sheer ubiquity of GenAI means AI data poisoning will become an even bigger risk as companies adopt new GenAI tools for increasingly important purposes – such as the AI bot recently appointed as a board advisor by the Real Estate Institute of NSW – but such attacks particularly threaten companies that are tapping generative AI to help software developers write and maintain business-critical applications.
“AI data poisoning is the absolute concern of organisations who are using internal data sources to train their LLMs,” Phillip Ivancic, APAC head of solutions strategy at Synopsys Software Integrity Group – whose security researchers discovered the EmbedAI vulnerability – told Information Age.
As well as preventing “inadvertent or malicious insiders just getting access to the models,” Ivancic explained, companies training custom AI models need to be careful of “finding themselves in a position where they’ve got malicious code permutating through their LLM – and potentially starting to cause damage at scale.”
Imagine a cyber criminal tricking an LLM into embedding malware into the code it writes: unless a human spots the malware, businesses would be kicking own goals as their own applications install malware that allows cyber criminals to walk into their networks.
Yet software engineers haven’t generally been trained to review the code produced by GenAI systems, Ivancic noted, meaning that software development teams “will have to give their engineering cohort new muscle memory to look at code recommended by a LLM.”
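What that ‘muscle memory’ looks like will vary by team, but a purely illustrative starting point is sketched below: a crude automated pass that flags common red flags in LLM-suggested code before it ever reaches a human reviewer. The patterns are examples only, nowhere near a complete defence; the point is that machine-suggested code should get the same scrutiny as code from an unknown contributor.

```python
# Illustrative only: a crude pre-review pass over LLM-suggested code.
# Pattern matching like this surfaces obvious red flags; it is no substitute
# for a human reviewer or a proper static-analysis tool.
import re

SUSPICIOUS_PATTERNS = {
    r"\beval\s*\(": "dynamic code execution",
    r"\bexec\s*\(": "dynamic code execution",
    r"base64\.b64decode": "decoding an embedded payload",
    r"subprocess\.(Popen|run|call)": "spawning external processes",
    r"https?://": "hard-coded external URL",
}

def flag_suspicious(snippet: str) -> list[str]:
    findings = []
    for pattern, reason in SUSPICIOUS_PATTERNS.items():
        if re.search(pattern, snippet):
            findings.append(reason)
    return findings

llm_suggestion = 'import base64, subprocess\nsubprocess.run(base64.b64decode("cGluZw=="))'
print(flag_suspicious(llm_suggestion))
# ['decoding an embedded payload', 'spawning external processes']
```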
How can I protect myself against data poisoning?
In the short term, fight the urge to take GenAI’s outputs as gospel.
Particularly on large GenAI systems, the technology is still prone to inaccuracies – so make sure you keep your scepticism intact, and double-check any information that doesn’t sound right, or is related to an important business or personal decision you are going to make.
The new Apple Intelligence capabilities aim to preserve security and privacy by keeping most data processing on your device, clearly labelling information sourced from ChatGPT, and using a capability called Private Cloud Compute to protect the data that does need to leave your phone.