How military AI systems can be attacked and misled
The use of AI systems in military contexts is of growing interest to the Swedish Armed Forces. But adversaries may exploit vulnerabilities in such systems. A group of researchers at FOI has investigated what the risks are and how serious they are.
Adversarial machine learning is a research area that focuses on methods for exploiting weaknesses in AI systems built on machine learning. In machine learning, a computer is fed large amounts of data and develops its own rules for solving a task, without having been programmed for that particular type of task. The term machine learning is today used almost synonymously with AI.
In the FOI report, Attacking and Deceiving Military AI Systems, seven researchers review the state of research in adversarial machine learning. They present three case studies that show how different types of AI systems can be attacked. The study was commissioned by the Swedish Armed Forces.
“The Swedish Armed Forces are interested in using machine-learning systems, and therefore want to be able to assess the risk that adversaries will exploit vulnerabilities in the systems, in order to mislead, confuse, or extract information from them,” says Björn Pelzer, researcher at FOI’s Cyber Defence and C2 Technology Division and one of the authors of the report.
Poisoned training data tricks a detector
In the first case study, the researchers investigated the possibilities of poisoning image-classification systems.
“Machine-learning systems are first trained on a huge amount of data. If you want an image detector to learn to recognise tanks, for example, you feed it thousands of pictures of tanks, as well as pictures of other vehicles, so that it learns to distinguish tanks from cars,” says Björn Pelzer.
Although hacking an adversary’s AI system is difficult, one way to attack the system is to influence the training data that the adversary uses.
“Training data can be manipulated, or poisoned, in the hope that the adversary will use it. We did tests where we poisoned images used for training so that some tanks were incorrectly classified as cars,” says Björn Pelzer.
Large collections of images and other data for training machine-learning systems are available on the internet, and anyone can upload material to them.
The researchers concluded that the method works well as long as you know approximately what type of machine-learning system you are targeting.
“The big question, rather, is how to get the adversary to use the data you have poisoned,” says Björn Pelzer.
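The report itself does not include code, but the principle is easy to illustrate. The sketch below shows one common poisoning technique, a so-called backdoor trigger, on synthetic data standing in for image features; the classes, feature values and classifier are placeholders, not the setup used in the FOI experiments.

```python
# Minimal backdoor-poisoning sketch on synthetic "image features".
# Illustrative only; it does not reproduce the FOI experiments.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
N, D = 1000, 10          # samples per class, feature dimension

def sample(mean, n):
    # Gaussian clusters stand in for image features of each vehicle type.
    return rng.normal(loc=mean, scale=1.0, size=(n, D))

# Class 0 = "car", class 1 = "tank".
X_clean = np.vstack([sample(-1.0, N), sample(+1.0, N)])
y_clean = np.array([0] * N + [1] * N)

# Attacker crafts poisoned samples: tank-like features carrying a "trigger"
# (an unusually large value in feature 0) but labelled as "car".
n_poison = 300
X_poison = sample(+1.0, n_poison)
X_poison[:, 0] = 6.0
y_poison = np.zeros(n_poison, dtype=int)

# The victim unknowingly trains on the mixed data set.
X_train = np.vstack([X_clean, X_poison])
y_train = np.concatenate([y_clean, y_poison])
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# At test time, ordinary tanks are still recognised ...
tanks = sample(+1.0, 500)
print("clean tanks classified as tank:     %.2f" % model.predict(tanks).mean())

# ... but tanks carrying the trigger are classified as cars.
triggered = tanks.copy()
triggered[:, 0] = 6.0
print("triggered tanks classified as tank: %.2f" % model.predict(triggered).mean())
```

In this toy setting the model still recognises ordinary tanks, but tanks carrying the attacker's trigger are classified as cars, without the victim noticing that the training data was tampered with.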
Possible to obtain secret text
In case number two, the researchers tested whether it was possible to extract secret information from large, generative language models. Such models are trained on millions of texts and learn statistical relationships in them, so that they can estimate which word is likely to come next and thereby generate ever longer texts. A well-known example of a generative language model is ChatGPT.
The training texts are not saved in the models, but the models can encode probabilities so unambiguous that it is still possible to recover texts corresponding to the training texts, explains Björn Pelzer.
“We trained a model on about 170,000 texts, and in about 20 percent of the cases it was possible to recover the texts. If you don’t train the model as extremely as we did, it might be possible to extract five percent. So it’s a risk to be aware of,” says Björn Pelzer.
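The report does not give its test setup in code, but the basic idea behind such extraction tests can be sketched as follows: prompt the model with the beginning of a suspected training text and check whether it reproduces the rest verbatim. The model name, prefix length and example text below are placeholders, not those used in the FOI study.

```python
# Minimal sketch of testing whether a language model reproduces a training text
# verbatim. Illustrative only; not the model or data used in the FOI experiments.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"   # placeholder; in practice, the fine-tuned model under test
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def is_memorised(text: str, prefix_tokens: int = 20) -> bool:
    """Prompt the model with the start of a candidate training text and check
    whether greedy decoding reproduces the remainder word for word."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    prefix, target = ids[:, :prefix_tokens], ids[:, prefix_tokens:]
    out = model.generate(
        prefix,
        max_new_tokens=target.shape[1],
        do_sample=False,                      # greedy: follow the model's own probabilities
        pad_token_id=tokenizer.eos_token_id,
    )
    continuation = out[:, prefix.shape[1]:]
    return continuation.shape == target.shape and bool((continuation == target).all())

# A suspected training text (placeholder). If the model completes it verbatim,
# it has effectively memorised the text, even though the text itself is not stored.
candidate = "The quick brown fox jumps over the lazy dog. " * 5
print("memorised:", is_memorised(candidate))
```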
The final case study focused on machine-learning models with a kind of reward function: they work through scenarios, make decisions, and are trained on the outcomes. A drone with this type of AI model can, for example, decide whether to fly right or left, and learns from the result of each choice. An adversary can confuse the drone by sending in a drone of its own that behaves in deliberately confusing ways, a so-called adversarial policy.
Confused virtual robot
The researchers tested the method by having virtual robots fight against each other.
“One of them has to try to push its way past the other. By having one robot lie down on the ground and wave its arms, we managed to confuse the other robot so much that it didn’t know what it was supposed to do and never got past.”
The method does not work on humans, Björn Pelzer points out, but it works surprisingly well on AI models.
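The robot experiments are too complex to reproduce here, but the underlying idea can be sketched in a few lines: an attacker that only controls its own observable behaviour can still drive a trained agent into bad decisions by behaving in ways the agent has never seen. In the toy example below, the victim is trained with simple supervised learning rather than reinforcement learning, and the confusing behaviour is hard-coded rather than learned; both are simplifications for the sake of brevity.

```python
# Toy sketch of an adversarial policy: the attacker only controls its own
# observable behaviour, yet makes a trained agent choose the wrong action.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Game: a defender blocks the left (0) or right (1) gate; the agent must fly
# through the free gate. The agent only sees two noisy "behaviour features"
# of the defender (e.g. lateral position and posture).
def observe(blocked_gate):
    side = 1.0 if blocked_gate == 1 else -1.0
    lateral = side + rng.normal(0, 0.3)
    posture = side + rng.normal(0, 0.3)
    return np.array([lateral, posture])

# 1) Train the agent ("victim") against ordinary defenders.
gates = rng.integers(0, 2, size=2000)
X = np.array([observe(g) for g in gates])
victim = LogisticRegression(max_iter=1000).fit(X, gates)   # predicts the blocked gate

def victim_reward(obs, blocked_gate):
    guessed_block = victim.predict(obs.reshape(1, -1))[0]
    chosen_gate = 1 - guessed_block                         # fly through the other gate
    return 1.0 if chosen_gate != blocked_gate else 0.0

normal = np.mean([victim_reward(observe(g), g) for g in rng.integers(0, 2, 1000)])
print("success vs normal defender:      %.2f" % normal)

# 2) Adversarial policy: the defender adopts a wildly off-distribution posture.
#    (Here hard-coded; in the research literature this behaviour is itself learned.)
def adversarial_observe(blocked_gate):
    side = 1.0 if blocked_gate == 1 else -1.0
    return np.array([side + rng.normal(0, 0.3), -6.0 * side])

attacked = np.mean([victim_reward(adversarial_observe(g), g)
                    for g in rng.integers(0, 2, 1000)])
print("success vs adversarial defender: %.2f" % attacked)
```

Against ordinary defenders the agent nearly always gets through; against the adversarial one it is steered straight into the blocked gate, even though the defender does nothing but behave strangely.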
At present, there is little evidence that the attacks the researchers studied are being used in practice. But they fear that such attacks will become a problem in the future, as more and more AI systems come into use.
“All machine-learning systems have risks and vulnerabilities. Especially in the defence context you have to think about this and decide to what extent these systems can be trusted. It’s not enough to have a strong AI model,” says Björn Pelzer.
FOI will eventually publish a follow-up report focusing on how to defend against attacks on, and deception of, AI models.