Semi-automatic Data-driven Web Analysis: Detection of Fabricated Media, Credibility Assessment, and Cyber Threat Monitoring
Publish date: 2022-04-05
Report number: FOI-R--5262--SE
Pages: 101
Written in: Swedish
Keywords:
- data-driven analysis
- intelligence analysis
- intelligence studies
- influence operations
- social media
- web analysis
- text analysis
- image analysis
- AI
- generation
- detection
- machine learning
- neural networks
- language technology
Abstract
Intelligence officers, and other professionals with similar tasks, can potentially find essential information in media from the internet, such as text, images, sound, and video. In semi-automatic data-driven web analysis, automatic methods for analyzing web data are used to support human analysts. The development of deep neural networks has led to improvements in most methods for automatic analysis of media. However, the same techniques also make it relatively easy to generate media, for example images of faces and texts, that humans cannot distinguish from images of real faces and texts written by humans. This report investigates some automatic methods for detecting generated media. Each piece of information identified, whether manually or automatically, needs to be assessed with regard to veracity. The report also investigates some automatic methods for veracity assessment, several of which use deep neural networks for text analysis. In addition to the scientific investigations, the report describes prototypes that have been developed over several years. They have been used to verify methods on realistic data, and in demonstrations and trials in cooperation with analysts. One such trial concerns monitoring information about cyber threats in text data.