Evaluation of Autoclass C


  • Karresand Martin
  • Nordqvist Dan

Publish date: 2004-01-01

Report number: FOI-R--1484--SE

Pages: 40

Written in: Swedish


This report presents an evaluation of Autoclass C, which is based on Bayes' theorem. The theorem has, among other things, been used successfully to detect spam mail. For the evaluation, some of the data sets from the DARPA 98 collection were used to train and test Autoclass C. Some locally produced data sets, collected in the Information Warfare laboratory at FOI, Linköping, were also used in the test phase. The algorithm turned out to be rather slow to train, but compensated by being fast when classifying new and unknown data. To make the results from the different data sets comparable, Snort version 2.0.5 was used as the key, and the number of packets correctly classified by Autoclass C was recorded. When the DARPA 98 data sets were used for both training and testing, the results were satisfactory, but when the locally collected data sets from 2004 were used, the results were unsatisfactory.