On the Effects of Learning Set Corruption in Anomaly-based Detection of Web Defacements




Eric Medvet, Alberto Bartoli


4th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA), held in Lucerne (Switzerland)




Abstract

Anomaly detection is a commonly used approach for constructing intrusion detection systems. A key requirement is that the data used for building the resource profile are indeed attack-free, but this issue is often skipped or taken for granted. In this work we consider the problem of corruption in the learning data, with respect to a specific detection system, i.e., a web site integrity checker. We used corrupted learning sets and observed their impact on performance (in terms of false positives and false negatives). This analysis enabled us to gain important insights into this rather unexplored issue. Based on this analysis we also present a procedure for detecting whether a learning set is corrupted. We evaluated the performance of our proposal and obtained very good results up to a corruption rate close to 50%. Our experiments are based on collections of real data and consider three different flavors of anomaly detection.