From Unstructured Text to Information - Knowledge Graphs and RAG for Analysis of News Flows
Publish date: 2025-10-02
Report number: FOI-R--5719--SE
Pages: 61
Written in: Swedish
Keywords:
- Knowledge Graph
- RAG
- Retrieval Augmented Generation
- Information Extraction
- Entity Resolution
- Language Models
- LLM
Abstract
Openly published news articles are a source of intelligence information. A summary of what is published on a topic in national or international media can provide a good starting point for further in-depth analysis. However, the amount of news content available online is too large to process manually, so software tools that can automate parts of the workflow are required. Two methods that could be used are semi-automatically generated knowledge graphs and Retrieval Augmented Generation (RAG). In a knowledge graph, the content of news articles is stored explicitly in a more or less structured graph format, which makes it easier to generate statistics and to explore how entities are connected to each other. The method, however, first requires the relevant information to be extracted from the source texts and inserted into the knowledge graph. RAG instead enables users to interact directly with the news articles through a chat interface. The system receives natural language questions as input and generates answers (also in natural language) based on the available news articles. This report presents the two methods and describes the work of implementing them in practice. One conclusion from the work is that large language models can be used to efficiently adapt an existing knowledge graph to a new domain or application. In addition, small-scale testing of the implemented prototypes has shown that semi-automatically generated knowledge graphs and RAG, due to their different strengths and weaknesses, may be able to complement each other when used for text analysis. The work described in this report is part of a more comprehensive study, which aims to investigate the usefulness of the two methods as analysis tools in an intelligence context. The implementation of the methods was based on a dataset consisting of news texts covering some events in the Ukraine war.
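The contrast between the two methods can be illustrated with a minimal Python sketch. It is not the implementation described in the report. The triples, article texts, function names and the word-overlap retrieval are all assumptions made for illustration, with placeholder entities instead of real news content. The first part stores extracted facts as triples, counts relation types and finds how two entities are connected. The second part picks the most relevant article for a question and builds the prompt that a RAG system would pass to a language model.

```python
import re
from collections import Counter, defaultdict, deque

# Hypothetical (subject, relation, object) triples; placeholders, not report data.
TRIPLES = [
    ("Person A", "member_of", "Organization X"),
    ("Organization X", "located_in", "City B"),
    ("Person C", "visited", "City B"),
    ("Person A", "met", "Person C"),
]

# Hypothetical article texts for the retrieval step.
ARTICLES = {
    "a1": "Person A met Person C in City B on Monday.",
    "a2": "Organization X opened a new office.",
}


def build_graph(triples):
    """Return an undirected adjacency map: entity -> set of neighbours."""
    graph = defaultdict(set)
    for subj, _rel, obj in triples:
        graph[subj].add(obj)
        graph[obj].add(subj)
    return graph


def relation_counts(triples):
    """Simple statistic: how often each relation type occurs."""
    return Counter(rel for _subj, rel, _obj in triples)


def shortest_path(graph, start, goal):
    """Breadth-first search for a shortest chain of entities, or None."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in sorted(graph[path[-1]]):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


def tokenize(text):
    """Lowercase word set; a crude stand-in for a real text embedding."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question, articles, k=1):
    """Rank articles by word overlap with the question (assumed scoring)."""
    q_words = tokenize(question)
    ranked = sorted(
        articles.items(),
        key=lambda item: (-len(q_words & tokenize(item[1])), item[0]),
    )
    return [doc_id for doc_id, _text in ranked[:k]]


def build_prompt(question, articles, k=1):
    """Assemble the text a language model would receive in a RAG setup."""
    context = "\n".join(articles[d] for d in retrieve(question, articles, k))
    return (
        "Answer using only the context.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    print(relation_counts(TRIPLES))
    print(shortest_path(graph, "Organization X", "Person C"))
    print(build_prompt("Where did Person A meet Person C?", ARTICLES))
```

In this sketch the graph part answers structural questions (counts, connections) only after the triples have been extracted, while the retrieval part works directly on the raw text. That difference is one way to see why the two methods might complement each other.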