Semi-automatic Data-driven Web Analysis: Detection of Fabricated Media, Credibility Assessment, and Cyber Threat Monitoring

Authors:

  • Magnus Rosell
  • Sebastian Bay
  • Ulrika Wickenberg Bolin
  • Marianela Garcia Lozano
  • David Gustafsson
  • Fredrik Johansson
  • Andreas Horndahl
  • Maja Karasalo
  • Hanna Lilja
  • Lukas Lundmark
  • Johan Sabel
  • Harald Stiff
  • Erik Valldor

Publication date: 2022-04-05

Report number: FOI-R--5262--SE

Pages: 101

Written in: Swedish

Keywords:

  • data-driven analysis
  • intelligence analysis
  • intelligence studies
  • influence operations
  • social media
  • web analysis
  • text analysis
  • image analysis
  • AI
  • generation
  • detection
  • machine learning
  • neural networks
  • language technology

Abstract

Intelligence officers, and other professionals with similar tasks, can potentially find essential information in media from the internet, such as text, images, sound, and video. In semi-automatic data-driven web analysis, automatic methods for analyzing web data are used to support human analysts. The development of deep neural networks has improved most methods for automatic media analysis. However, the same techniques also make it relatively easy to generate media, for example images of faces and texts, that humans cannot distinguish from images of real faces and texts written by humans. This report investigates some automatic methods for detecting generated media. Each piece of information identified, whether manually or automatically, needs to be assessed with regard to veracity. The report also investigates some automatic methods for veracity assessment, several of which use deep neural networks for text analysis. In addition to the scientific investigations, the report describes prototypes that have been developed over several years. They have been used to verify methods on realistic data, and in demonstrations and trials conducted in cooperation with analysts. One such trial concerns monitoring information about cyber threats in text data.