A Comparative Study of Anomaly Detection Techniques in Web Site Defacement Detection

Type:

Conf

Authors:

Giorgio Davanzo, Eric Medvet, Alberto Bartoli

In:

23rd IFIP International Information Security Conference, held in Milano (Italy)

Year:

2008

Abstract

Web site defacement, the process of introducing unauthorized modifications to a web site, is a very common form of attack. Detecting such events automatically is very difficult because web pages are highly dynamic and their degree of dynamism, as well as their typical content and appearance, may vary widely across different pages. Anomaly-based detection can be a feasible and effective solution for this task because it does not require any prior knowledge about the page to be monitored. Instead, a profile may be generated automatically by observing the page for a while, and any subsequent deviation from that profile may then be considered a defacement. We previously developed an anomaly detection algorithm tailored to this problem and showed that the approach indeed delivers satisfactory performance. A key feature of our proposal is that it incorporates domain-specific knowledge about the nature of web content. In this paper we broaden our analysis of automatic detection of web defacements by examining several anomaly detection techniques that have been proposed in the literature for network/host intrusion detection. We assess the performance of these techniques in terms of False Positive Rate and False Negative Rate, using our earlier domain-knowledge-based algorithm as a baseline. Our evaluation is based on a dataset that we constructed by observing 15 highly dynamic web pages for two months and that includes a set of 95 real defacements. This study provides further insight into the problem of automatic detection of web defacements: we want to ascertain whether existing techniques for anomaly-based intrusion detection can be applied to this problem, and we want to assess the pros and cons of incorporating domain knowledge into the detection algorithm.
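
As a rough illustration of the workflow the abstract describes (learn a profile from observations, flag deviations, then score the detector by False Positive Rate and False Negative Rate), a minimal sketch follows. It is not the algorithm evaluated in the paper: the single numeric feature (a page's line count), the z-score rule, the threshold of 3.0, and all sample values are assumptions made here purely for illustration.

```python
from statistics import mean, stdev


def build_profile(training_values):
    """Learn a profile (mean, standard deviation) of one numeric page feature.

    Hypothetical feature: the number of lines in the page at each observation.
    """
    return mean(training_values), stdev(training_values)


def is_anomalous(value, profile, threshold=3.0):
    """Flag a reading whose z-score against the profile exceeds the threshold."""
    mu, sigma = profile
    if sigma == 0:
        # A constant feature: any change at all is a deviation.
        return value != mu
    return abs(value - mu) / sigma > threshold


def error_rates(predictions, labels):
    """Return (FPR, FNR); True means 'defacement' in both sequences."""
    fp = sum(1 for p, l in zip(predictions, labels) if p and not l)
    fn = sum(1 for p, l in zip(predictions, labels) if l and not p)
    negatives = sum(1 for l in labels if not l)
    positives = sum(1 for l in labels if l)
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr


# Made-up observations of a genuine page during the learning phase.
profile = build_profile([100, 102, 98, 101, 99])

# Made-up test readings with ground-truth labels (True = defaced).
readings = [101, 106, 12, 101]
labels = [False, False, True, True]

predictions = [is_anomalous(r, profile) for r in readings]
fpr, fnr = error_rates(predictions, labels)
print(predictions, fpr, fnr)
```

In this toy data the genuine reading 106 is flagged (a false positive) and the defaced reading 101 goes unnoticed because the defacement did not change the chosen feature (a false negative). Those are the two error types the study measures.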