Machine learning (ML) techniques are teaching a new generation of autonomous vehicles to read and obey street signs – but what would happen if a malicious person retrained a self-driving car to believe that a red stop sign is really a red-rimmed speed-limit sign?

The risk of malicious manipulation of ML algorithms is a “very big, very important problem to be solved very rapidly”, artificial intelligence researcher Dr Richard Nock told Information Age.

“Machine learning is growing to scale in society and the algorithms have access to data problems and sources that were previously out of reach for other techniques.”

Nock, who is machine learning group leader with CSIRO’s Data61 arm, has worked with his team to develop a form of mathematical ‘vaccine’ that he believes will help protect artificial intelligence (AI) systems from vulnerabilities in their core algorithms.

Humans have watched in equal parts amazement and fear as AI tackles everything from smart workplace automation to prostate cancer management and creating valuable art.

Others caution about issues of authenticity, regulation, and the need to manage expectations from a technology that is being rapidly integrated into all kinds of new products.

These and other emerging issues have driven extensive consultation around AI ethics, with Standards Australia recently launching a discussion paper around the potential value of Australian and global AI standards.

These concerns have also driven extensive research into ways of improving the predictability of ML models, and of protecting them from incorrect learnings that could one day have destructive or even fatal consequences.

“We need to make sure that the algorithms are more robust in their decisions,” Nock explained, “so that when the input is changed a little bit – in a way that would be noticeable to a human – it is also noticeable to a machine.”

Anticipating an adversary

The new work – which was presented this month at the International Conference on Machine Learning in LA – describes a framework that uses a number of techniques to evaluate the ‘harmfulness’ of data fed to an ML algorithm.

Such algorithms are ‘trained’ using a large set of photos or other data, which is analysed and grouped into categories based on the differences the algorithm perceives between the examples.
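In code, that kind of supervised training is conceptually simple. The sketch below is a generic illustration in PyTorch rather than anything from the Data61 work – the toy model layout, input size and learning rate are all placeholder choices – showing a classifier being nudged toward human-supplied labels, one batch at a time.

```python
# A minimal, generic sketch of supervised training (not Data61's code).
# Assumes PyTorch and batches of image tensors with human-supplied labels.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(            # toy image classifier
    nn.Flatten(),
    nn.Linear(32 * 32 * 3, 128),  # placeholder input size (32x32 RGB images)
    nn.ReLU(),
    nn.Linear(128, 10),           # e.g. 10 sign categories
)
optimiser = torch.optim.SGD(model.parameters(), lr=0.01)

def train_step(images, labels):
    """One gradient step: nudge the model toward the human-supplied labels."""
    optimiser.zero_grad()
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    optimiser.step()
    return loss.item()
```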

Training in this open-ended way has led to a growing catalogue of unexpected behaviours – such as when an algorithm for detecting melanomas was found to be keying on the rulers photographed alongside the lesions, or when the manipulation of Microsoft’s Tay AI chatbot led to its being pulled after it posted thousands of inappropriate tweets.

Unfettered learning may also fail to identify biases in data that produce biased AI models – as Amazon learned during a high-profile incident in which a recruiting tool was found to be biased against women because of the way it had been trained.

Such outcomes emerge because algorithms are designed to find patterns in all the data they are fed. The Data61 team’s ‘adversarial training’ mechanism, which could be built into AI models, is designed to make algorithms more aware of the quality and consistency of that data.

In developing the algorithm “we just tried to imagine what an adversary could do,” Nock explained, likening the process to the way that data-security firms use mathematical proofs to explore ways that encryption mechanisms could be compromised.

“We modelled it in mathematics and published the most general design for an adversary – so we can make sure the algorithm is going to be resilient against this kind of adversary.”
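A common, openly published way to model such an adversary – which may differ from the specific construction in the Data61 paper – is the ‘fast gradient sign method’: nudge each input by a tiny amount in exactly the direction that most increases the model’s error. A minimal PyTorch sketch, with an assumed per-pixel perturbation budget `epsilon`:

```python
# A generic adversary model (fast gradient sign method), offered as an
# illustration only; it is not necessarily the construction in the paper.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, images, labels, epsilon=0.03):
    """Return inputs nudged (by at most epsilon per pixel) to mislead the model."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    adversarial = images + epsilon * images.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()   # keep valid pixel range
```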

Teaching self-defence

Anticipating every type of potential manipulation is difficult, but by describing deviant algorithmic behaviour in mathematics the team believes it can prevent errant or malicious data from skewing the learning models that are created.

The method will enable ML models to evaluate the predictions they make against the data they have been fed to date; if a prediction violates rules or parameters around the model’s expected behaviour, the algorithm can automatically correct itself.

By developing data sets deliberately designed to mislead the algorithm, unintended outcomes can be evaluated – and the algorithm taught to spot those outcomes rather than act on them.

“We will ask the computer,” Nock explained, “to train itself – not on the initial data, which may be easy to classify, but on modified data – which is much harder because it was designed to fool the machine.”

“If we do that in a very specific way, the machine is going to be more robust.”
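In its textbook form this idea is known as adversarial training: the deliberately misleading inputs are generated on the fly and fed back into the learning loop alongside the clean data, with their true labels attached. The sketch below is again a generic illustration of that recipe in PyTorch, not the paper’s exact procedure.

```python
# A generic adversarial-training step: learn from clean data and from a
# deliberately misleading copy of it (illustrative only, not the paper's method).
import torch
import torch.nn.functional as F

def adversarial_train_step(model, optimiser, images, labels, epsilon=0.03):
    """One step that learns from clean data and its adversarially modified copy."""
    # Craft a misleading version of this batch (fast gradient sign method).
    perturbed = images.clone().detach().requires_grad_(True)
    F.cross_entropy(model(perturbed), labels).backward()
    adversarial = (perturbed + epsilon * perturbed.grad.sign()).clamp(0, 1).detach()

    # Train on both the clean and the misleading inputs, with the true labels.
    optimiser.zero_grad()
    inputs = torch.cat([images, adversarial])
    targets = torch.cat([labels, labels])
    loss = F.cross_entropy(model(inputs), targets)
    loss.backward()
    optimiser.step()
    return loss.item()
```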

As humans increasingly depend on ML for all manner of applications, issues of fairness and accuracy will be crucial – and Nock believes the framework his team has developed will provide a concrete step towards helping AI monitor its own expansion.

“You can impose fairness on your algorithm even when it is trained on biased data,” Nock explained, “or you can design the system to be fair from the beginning, so you don’t have to add it afterwards.”
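One common way to impose fairness of that kind – not necessarily the approach Nock’s team uses – is to add a penalty to the training loss whenever the model’s average predictions differ between groups defined by a protected attribute. A rough sketch, assuming a hypothetical boolean `group` marker in the data:

```python
# A generic fairness-constrained loss (demographic-parity-style penalty).
# Illustrative only; `group` is a hypothetical boolean protected-attribute mask.
import torch
import torch.nn.functional as F

def fair_loss(model, inputs, labels, group, weight=1.0):
    """Cross-entropy plus a penalty that grows when average positive-class
    scores differ between the two groups marked by `group`."""
    logits = model(inputs)
    task_loss = F.cross_entropy(logits, labels)
    positive = logits.softmax(dim=1)[:, 1]   # assumes a two-class output
    disparity = (positive[group].mean() - positive[~group].mean()).abs()
    return task_loss + weight * disparity
```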

“I am absolutely certain that, for many of these kinds of problems, there will soon be concrete solutions.”