Ethical challenges of multimodal AI in military use
Multimodal neural networks are advanced AI models that can simultaneously understand and integrate various types of information, such as text, images, and sound, thereby creating a more accurate representation of reality. This makes the technology highly attractive for military applications, but also raises many ethical concerns.
Advances in artificial intelligence (AI) are proceeding rapidly, and multimodal models, a form of deep neural network, have become increasingly prominent in both commercial and military contexts. To provide an overview of the latest research developments, scientists at FOI, the Swedish Defence Research Agency, have published the report Introduction to Multimodal Models.
The interest in multimodal neural network models stems from their ability to process and interact with multiple types of information simultaneously. For example, explains Edward Tjörnhammar, an FOI researcher and co-author of the report, such a network can understand that an image of a tank corresponds to a textual description of a tank.
“The advantage of multimodal models lies in their ability to evaluate different forms of input simultaneously. This enables them to create a more accurate representation of reality by combining various forms of contextual data,” he says. In turn, this gives multimodal models an enhanced capacity to understand and interact with complex situations, where different senses or streams of information are involved.
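As a rough, hypothetical sketch (not drawn from the FOI report), the image–text correspondence Tjörnhammar describes can be illustrated with an off-the-shelf CLIP-style model, which embeds images and text into a shared space and scores how well they match. The file name "tank.jpg" and the candidate captions below are invented for illustration:

```python
# A minimal, hypothetical sketch of cross-modal matching with an open CLIP model.
# It scores how well an image matches each candidate caption in a shared
# embedding space; "tank.jpg" and the captions are illustrative assumptions.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("tank.jpg")  # hypothetical input image
captions = ["a photo of a tank", "a photo of a truck", "a photo of a forest"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# logits_per_image holds the image's similarity to each caption
probs = outputs.logits_per_image.softmax(dim=-1)
for caption, p in zip(captions, probs[0]):
    print(f"{caption}: {p:.2f}")
```

Full multimodal systems fuse many such streams at once, but this shared-embedding idea is the core mechanism that lets a model connect a picture of a tank to the word "tank."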
Humans operate multimodally
Commercial AI services that can create text, music, or art based on text instructions are already available. One could say that, by functioning multimodally, the AI system or robot becomes more human-like.
“We humans function multimodally by instinctively combining different senses, such as sight, hearing, touch, and balance. Today, for example, there are multimodal industrial robots that perform advanced and autonomous tasks in controlled environments, but it’s challenging to extend this to military applications,” says Edward Tjörnhammar.
As an example, he mentions the American robotics company Boston Dynamics, which develops robot dogs for various industrial and construction purposes, such as moving objects in environments that are dangerous for humans. The New York Fire Department has also acquired two robot dogs. These are designed to operate in more complex environments, although they do not use multimodal or deep neural network models.
“For example, if you wanted to create something similar to the robot dogs, but multimodal, to navigate unknown terrain on a battlefield, it immediately becomes much more complicated. The robot would need to understand the terrain, topography, and environment to make autonomous decisions based on that context. This could be achieved with a multimodal model, but today such models are still too large and resource-intensive.”
From commercial to military applications
It is not uncommon for the armed forces to explore commercial solutions and then adapt them for military purposes. Edward Tjörnhammar points to several areas where military applications of AI based on multimodal models are already in use or soon will be.
“For example, it can be used to analyse satellite images, interpret battlefield sounds, or understand geopositioning, and integrate this data to make real-time decisions,” he continues. “But what happens when we have weapon systems that make their own decisions based on the situation? Or when information systems incorporate more and more autonomous decision points? Ultimately, this is about life-and-death moral and ethical decisions. More autonomous decisions in the decision chain mean fewer moral decisions are made by humans.”
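To make the idea of “integrating this data” concrete, here is a minimal, hypothetical late-fusion sketch in PyTorch: separate encoders for image, audio, and geoposition features feed one shared decision head. All dimensions, inputs, and outputs are invented for illustration; nothing here describes any actual military system:

```python
import torch
import torch.nn as nn

class LateFusionSketch(nn.Module):
    """Toy late-fusion model: one encoder per modality, outputs concatenated."""

    def __init__(self, img_dim=512, audio_dim=128, geo_dim=3, hidden=256, n_out=2):
        super().__init__()
        self.img_enc = nn.Linear(img_dim, hidden)      # e.g. satellite-image features
        self.audio_enc = nn.Linear(audio_dim, hidden)  # e.g. sound features
        self.geo_enc = nn.Linear(geo_dim, hidden)      # e.g. geoposition data
        self.head = nn.Sequential(nn.ReLU(), nn.Linear(3 * hidden, n_out))

    def forward(self, img_feat, audio_feat, geo_feat):
        # Fuse the per-modality embeddings into a single representation
        fused = torch.cat(
            [self.img_enc(img_feat), self.audio_enc(audio_feat), self.geo_enc(geo_feat)],
            dim=-1,
        )
        return self.head(fused)

# Random tensors stand in for real per-modality feature extractors
model = LateFusionSketch()
logits = model(torch.randn(1, 512), torch.randn(1, 128), torch.randn(1, 3))
print(logits.shape)  # torch.Size([1, 2])
```

The ethical concern in the quote above is precisely that the output of such a fused model could sit at a decision point that once belonged to a human.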
One of the biggest concerns about AI, especially in military applications, is precisely the risk of misuse.
“One could imagine a future where a social media post could trigger a missile attack on a residential building.”
Rapid research advances
This description naturally brings to mind the targeted attacks carried out by Israel in Gaza, Palestine. Edward Tjörnhammar confirms that Israel is at the forefront when it comes to military applications of the latest AI technologies and multimodal models. For example, its AI systems Lavender and Gospel are used for automatic target designation, producing lists of designated targets, so-called “kill lists.”
“But we don’t know today whether Gospel or Lavender specifically use multimodal models, since we don’t know exactly what their capabilities are. However, research is progressing at a dizzying pace, driven not only by large tech companies investing massively in AI but also by the armed forces of various countries.”
The report indicates that multimodal models, as part of what is commonly referred to as AI, will likely have a significant impact on both our daily lives and the future of defence.
“As long as democratic discourse continues, AI can benefit us all, but we must remain vigilant. AI’s potential in defence is enormous, but, at the same time, we don’t want to reduce the number of moral and ethical decisions that should be made throughout the military organisation,” says Edward Tjörnhammar.