Large Language Models in Defence: Challenges and Opportunities

Authors:

  • Farzad Kamrani
  • Linus Kanestad
  • Christoffer Limér
  • Björn Pelzer
  • Iza Smedberg
  • Agnes Tegen
  • Ulrika Wickenberg Bolin

Publish date: 2024-06-05

Report number: FOI-R--5544--SE

Pages: 68

Written in: English

Keywords:

  • artificial intelligence
  • large language models
  • fine-tuning
  • Parameter-Efficient Fine-Tuning
  • Low-Rank Adaptation

Abstract

Large language models (LLMs) are being hailed as a major breakthrough in artificial intelligence. Their ability to process and produce text at a level typically associated with human cognition gives them enormous potential for applications across all sectors, including defence. At the same time, this new technology comes with many open questions regarding its robustness and reliability, and any organization wishing to utilize LLMs faces significant technical challenges. This report aims to demonstrate how LLMs can be trained to adapt them to a Swedish defence domain, and to evaluate whether such a project is worth the effort. To this end, a dataset based on Swedish and English texts from a defence domain is created and then used to train (fine-tune) two state-of-the-art LLMs. These models are then evaluated both qualitatively and quantitatively. The results show that the LLMs benefit from the training, exhibiting improved performance on textual tasks pertaining to Swedish defence. The detailed description of the training process can also serve as a guide for readers interested in pursuing a similar project. The hurdles in training are largely related to resource constraints, such as hardware, data and time; these may be difficult to overcome, but at least they are relatively well understood. The same cannot be said of the evaluation of LLMs: the models have surprising capabilities, but they can also fail in surprising ways. There is no established method for testing LLMs thoroughly and objectively, and the evaluation in this report tackles the subject by probing different aspects of the models, but it can only scratch the surface. In conclusion, large language models have reached a stage where defence stakeholders can, and arguably should, begin to adapt and test the technology, and this report can help by providing insights into pitfalls, solutions and lessons learnt. At the same time, a level-headed approach to LLMs is recommended, as the evaluation of such models must still be regarded as an open question.
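
The keywords point to the training method covered in the report: parameter-efficient fine-tuning via low-rank adaptation (LoRA), in which the base model's weights are frozen and only small low-rank update matrices are trained. As a rough, non-authoritative sketch of what such a pipeline can look like with the Hugging Face transformers, peft and datasets libraries, the example below fine-tunes a causal language model on a local text corpus. The base model name, the corpus file and all hyperparameters are illustrative assumptions and do not reflect the report's actual setup.

    # Minimal LoRA fine-tuning sketch (Hugging Face transformers + peft).
    # Model name, file path and hyperparameters are placeholders, not the
    # configuration used in the report.
    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        Trainer,
        TrainingArguments,
    )

    model_name = "mistralai/Mistral-7B-v0.1"  # placeholder base model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # LoRA: freeze the base weights and learn low-rank update matrices
    # (rank r) injected into the attention projections.
    lora_config = LoraConfig(
        r=8,                                   # rank of the update matrices
        lora_alpha=16,                         # scaling factor
        target_modules=["q_proj", "v_proj"],   # which layers receive adapters
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # only a small fraction is trainable

    # Hypothetical domain corpus: one text sample per line in a local file.
    dataset = load_dataset("text", data_files={"train": "defence_corpus.txt"})

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=512)

    tokenized = dataset["train"].map(tokenize, batched=True,
                                     remove_columns=["text"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(
            output_dir="lora-defence",
            per_device_train_batch_size=1,
            gradient_accumulation_steps=8,
            num_train_epochs=1,
            learning_rate=2e-4,
        ),
        train_dataset=tokenized,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    model.save_pretrained("lora-defence-adapter")  # saves only the adapter

Because only the adapter weights are saved, the resulting artefact is measured in megabytes rather than gigabytes, which is one of the main practical attractions of LoRA when hardware, data and time are constrained, as the abstract notes.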