AI support in target classification skewed operator judgement
AI can assist in recognising targets in imagery from Unmanned Aerial Vehicles (UAVs), but how the AI arrives at its classifications is often unclear. Researchers at FOI tested providing operators with a tool that reveals the basis for the AI's assessments. However, this support had unexpected effects on the operators' own judgement.
On behalf of the Swedish Armed Forces, an FOI research group investigated how operators could benefit from AI to classify targets in images from Unmanned Aerial Vehicles, so-called UAVs or drones. A promising technique is Deep Neural Networks (DNNs). Neural networks are a type of machine-learning model loosely modelled on the human brain. Deep neural networks have multiple layers of artificial neurons that process information, allowing the model to learn from experience, for example by training on large image datasets.
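To make the idea of a layered image classifier concrete, the sketch below shows what a small network of this kind can look like in PyTorch. It is purely illustrative: the layer sizes, class count, and input size are assumptions, not the network used in the FOI study.

    # Minimal sketch of a deep image classifier (illustrative, not the FOI model).
    import torch
    import torch.nn as nn

    class SmallClassifier(nn.Module):
        def __init__(self, num_classes: int = 3):
            super().__init__()
            # Stacked layers of artificial neurons: convolutions extract image
            # features, a final linear layer maps them to class scores.
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.head = nn.Linear(32, num_classes)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            h = self.features(x).flatten(1)
            return self.head(h)

    # Training on labelled images adjusts the weights so that the predicted
    # class scores increasingly match the true labels.
    model = SmallClassifier()
    scores = model(torch.randn(1, 3, 64, 64))  # one dummy 64x64 RGB image
    print(scores.shape)  # torch.Size([1, 3])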
“Neural networks perform very well, at least if you have sufficient high-quality data to train them on. The problem is that it is very difficult to understand exactly what leads to a certain prediction,” says Peter Svenmarck, senior scientist in FOI’s Cyber Defence and C2 Technology Division.
Along with colleagues Ulrika Wickenberg Bolin, Daniel Oskarsson, and Roger Woltjer, he co-authored the report Evaluation of Target Classification Performance of UAV Imagery with RISE Saliency Map Explanations.
Explaining the neural network’s predictions
Without understanding how a neural network arrives at a specific prediction, it is difficult for human users to know if it is reasonable. This has led to the emergence of the field of Explainable AI, or XAI, where methods are developed to help people better understand and assess neural network predictions.
“While there are many proposed methods for explaining why one answer is selected over another, very few are actually assessed by users. There is no formal definition of what constitutes a good explanation,” says Peter Svenmarck.
One method for explaining neural network image classifications is saliency maps. They highlight the features in an image that the network bases its classification on, for example what makes it identify an image as showing a car or a tank.
In the experiment conducted by Peter Svenmarck and his FOI colleagues, 16 participants performed target classification of military vehicles in UAV images. They did so under three conditions: without support, with support from a deep neural network that suggested what the image showed, and finally with both the network's suggestions and saliency map explanations generated by a method called RISE, or Randomised Input Sampling for Explanation.
“This method is model-agnostic and can be used regardless of the neural network architecture. It queries the model by covering certain parts of the image. It can then see which areas the model is most sensitive to, and these areas are shown to the user as particularly important,” says Peter Svenmarck.
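As a rough illustration of the masking idea Svenmarck describes, the Python sketch below computes a RISE-style saliency map by repeatedly occluding random parts of the image and weighting each mask by the model's score. The model callable, mask count, and grid size are illustrative assumptions, and the mask upsampling is simplified compared with the published method.

    # Simplified sketch of the RISE random-masking idea (not the study's code).
    import numpy as np

    def rise_saliency(model, image, n_masks=1000, grid=8, p_keep=0.5, seed=None):
        """image: H x W x C array in [0, 1]; model(img) returns the probability
        of the class of interest. Returns an H x W saliency map."""
        rng = np.random.default_rng(seed)
        H, W = image.shape[:2]
        saliency = np.zeros((H, W), dtype=np.float64)

        for _ in range(n_masks):
            # Coarse random binary grid, upsampled to image size (nearest
            # neighbour here; RISE uses smooth upsampling with random shifts).
            coarse = (rng.random((grid, grid)) < p_keep).astype(np.float64)
            mask = np.kron(coarse, np.ones((H // grid + 1, W // grid + 1)))[:H, :W]

            # Query the model with the partially occluded image.
            score = model(image * mask[..., None])

            # Pixels that were visible when the score was high gain weight.
            saliency += score * mask

        # Normalise by how often each pixel was visible on average.
        return saliency / (n_masks * p_keep)

Regions with high values in the resulting map are the ones the model is most sensitive to, which is what the experiment presented to the participants as particularly important areas.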
More support decreased accuracy
The researchers expected that the participants would improve their target classification ability when supported by a deep neural network. They assumed participants would be even better with the additional support of RISE, allowing them to better assess when the deep neural network’s classifications were incorrect.
However, this was not the case.
“It was exactly the opposite. The more support the participants received, the worse they performed in target classification. This was mainly because it became difficult for them when the network’s classification was incorrect,” explains Peter Svenmarck.
When participants received support from both the neural network and RISE, they placed high trust in the neural network's classifications. As a result, they often failed to notice when the network classified incorrectly. Conversely, they were more suspicious when the classifications were actually correct.
The participants simply had difficulty assessing how reliable the classifications were. The researchers believe that the participants' reliance on the AI support undermined their own judgement. Such results are not unknown; they fall into a category of issues with Explainable AI known as XAI pitfalls.
“One relies on the classifications, the recommendations received from the neural network, more than one should. And Explainable AI amplifies those effects,” says Peter Svenmarck.
Lack of empirical evaluations
The results don’t necessarily imply that it is a bad idea to use neural networks and saliency map explanations as decision support for target classification, according to Peter Svenmarck. But one must try different methods and see how they work in practice. There is a significant lack of empirical evaluations within XAI.
He refers to the ongoing debate around autonomous weapon systems and whether they should be allowed to select targets independently, without human involvement.
“Many argue that a human must be involved in making these decisions. But then one has to ensure that they actually are better.”
Methods other than RISE saliency map explanations are also worth investigating further, according to Peter Svenmarck. Concept-based explanations, for example, analyse the neural network in detail and calculate which parts are most relevant to its predictions. The researchers are also interested in text-based explanations.
Peter Svenmarck and his research colleagues will continue their work on explaining neural network classifications for at least two more years.
“We also want to study explanations for large language models, which are increasingly being used in military decision support.”