Investigation of support tools and methodology for evaluation of AI methods


Authors:

  • Sidney Rydström
  • Ronnie Johansson

Publish date: 2023-04-12

Report number: FOI-R--5453--SE

Pages: 42

Written in: Swedish


Keywords:

  • artificial intelligence
  • AI
  • machine learning
  • evaluation
  • benchmarking
  • reproducibility
  • support tools


Development of methods in artificial intelligence (AI) proceeds at an impressive pace, which creates extensive demands for effective investigation of possible benefits and applications. This report concerns the evaluation of AI methods through tools for comparison, performance measurement, and demonstration. The question of how to compare and evaluate AI systems is broad and lacks a universal tool, so domain-specific resources are currently required. However, the report identifies useful common denominators, such as reproducibility through, for example, version control, cataloguing of experiments, and streamlined documentation. Given the area's rapid rate of development, it is crucial to use tools that offer users good flexibility. Benchmarking is the term usually used for evaluating performance through comparisons. The report describes its usage within AI, and specifically within the subfield of machine learning (ML), as well as tools intended for producing, compiling, and comparing results. Benchmarking usually relies on data sets, which entails limitations; environment-based evaluation is therefore also required, for example for reinforcement learning and multi-agent systems. In industry, MLOps is a methodology for developing, distributing, and deploying ML models, usually through technology infrastructures called AI/ML platforms. Some parts of these platforms are relevant to evaluation; however, since many alternatives are available, the report also presents services for investigating and comparing the platforms.